There is something almost unsettling about predicting what lies beneath four kilometers of ocean water without ever going there. No drill. No core sample hauled up dripping from the dark. Just data, and an algorithm trained to recognize patterns in the traces left by earlier expeditions. It sounds bold. It probably is. Quietly, though, it’s working.
In recent years, a growing number of marine geotechnical researchers have been feeding machine learning models decades’ worth of ocean drilling data and asking them to infer seabed properties in areas no ship has ever sampled. A 2025 study by Jungmin Yun and colleagues, published in Scientific Reports, did exactly that.

The study built a data-driven framework to predict five key properties of deep-sea sediments: porosity, grain density, compressional wave velocity, thermal conductivity, and calcium carbonate content. These are not minor details. They govern how stable the seabed is, how heat and sound travel through it, and whether it can hold the anchors and foundations of offshore structures. XGBoost, a gradient boosting technique that has become something of a workhorse across scientific disciplines, consistently outperformed the other five machine learning algorithms tested in the study.
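For readers curious what “gradient boosting” means mechanically, here is a minimal sketch in plain Python. It is not XGBoost, which adds regularized split scoring, second-order gradient information, and a great deal of engineering, and it is not the study’s code. The input features (depth and compressional wave velocity), the synthetic porosity-depth curve, and the helper names `fit_stump` and `fit_boosted` are all hypothetical. The core idea carries over, though: each new small tree is fitted to the errors the previous trees left behind.

```python
import math
import random

# Minimal gradient boosting with depth-1 regression trees ("stumps").
# NOT XGBoost: no regularized split gain, no second-order gradients.

def fit_stump(xs, residuals):
    """Return the one-split predictor that best fits `residuals` in squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(row[j] for row in xs))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [r for row, r in zip(xs, residuals) if row[j] <= t]
            right = [r for row, r in zip(xs, residuals) if row[j] > t]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, t, lmean, rmean = best
    return lambda row: lmean if row[j] <= t else rmean

def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (the negative gradient of squared error)."""
    base = sum(ys) / len(ys)
    stumps, preds = [], [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, xs)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)

# Hypothetical synthetic data: porosity decays with depth, velocity rises with depth.
rng = random.Random(0)
xs, ys = [], []
for _ in range(100):
    depth = rng.uniform(0, 2000)                       # meters below seafloor
    velocity = 1500 + 0.5 * depth + rng.gauss(0, 20)   # compressional wave velocity, m/s
    xs.append([depth, velocity])
    ys.append(0.7 * math.exp(-depth / 800) + rng.gauss(0, 0.02))

model = fit_boosted(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
model_mse = sum((y - model(row)) ** 2 for row, y in zip(xs, ys)) / len(ys)
print(f"predict-the-mean MSE {baseline_mse:.5f}, boosted MSE {model_mse:.5f}")
```

The small learning rate means each stump corrects only a fraction of the remaining error, trading more rounds for a smoother, less overfit ensemble.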
What made the result remarkable wasn’t just the accuracy. It was also what the model learned to prioritize. Using a technique called SHAP (SHapley Additive exPlanations), the researchers identified which input variables drove each prediction. Compressional wave velocity and depth kept rising to the top. Once you sit with that, there’s a certain logic to it. The deeper the sediment, the more it has been compacted, heated, and altered by time and pressure. Wave velocity, meanwhile, is a kind of acoustic fingerprint, carrying information about the sediment’s density, its fluid saturation, and how tightly its grains are packed. The model found and quantified what experienced geologists would probably tell you anyway, but it did so faster.
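Shapley values, which SHAP builds on, credit each input with its average marginal contribution to a prediction across every possible ordering of the inputs. The brute-force sketch below computes them exactly for a tiny, made-up porosity model by swapping features between the sample of interest and a reference sample. `toy_porosity`, its coefficients, and both samples are hypothetical. The SHAP library itself uses much faster, model-specific algorithms and usually averages over a background dataset rather than a single reference point, so treat this as an illustration of the definition, not of the library.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature "absent" from a coalition takes its baseline value. The loop visits
    every subset of the other features, so cost grows as 2**n: tiny models only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_porosity(features):
    """Hypothetical porosity model; the carbonate input is deliberately ignored."""
    depth_km, vp_km_s, _carbonate = features
    return 0.7 - 0.15 * depth_km - 0.1 * (vp_km_s - 1.5) + 0.02 * depth_km * (vp_km_s - 1.5)

x = [2.0, 2.2, 0.4]          # deep, fast sample: depth (km), Vp (km/s), carbonate fraction
baseline = [0.5, 1.6, 0.3]   # shallow reference sample
phis = shapley_values(toy_porosity, x, baseline)
for name, phi in zip(["depth", "velocity", "carbonate"], phis):
    print(f"{name:>9}: {phi:+.3f}")
print("sum:", round(sum(phis), 3), "difference:", round(toy_porosity(x) - toy_porosity(baseline), 3))
```

For this sample, depth receives most of the credit for the lower predicted porosity, velocity a smaller share, and the unused carbonate input exactly zero. The three contributions always add up to the gap between the two predictions.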
It’s worth stepping back to see the larger context. Traditional deep-sea sampling, whether by piston coring, gravity coring, or full-scale ocean drilling programs, is extraordinarily expensive. A single research expedition can cost millions of dollars and return data from only a handful of locations. Machine learning doesn’t replace that work. What it does is fill in the gaps, which is a useful job in itself: it learns from what has been measured and extrapolates, with quantified uncertainty, to places the drills have not yet reached.
Around the same time, a separate study from China’s Second Institute of Oceanography tackled a related problem. Instead of sediment properties, it used LSTM, XGBoost, ARIMA, and SVR models trained on mooring data from the Western Pacific to predict near-seabed ocean currents. The backdrop was deep-sea mining, an industry moving steadily toward commercial operation and raising serious environmental concerns along the way. Accurate current forecasts are crucial because the sediment plumes stirred up by mining equipment drift with the water, and where they go determines which benthic ecosystems are affected. Getting the prediction right 96 hours in advance really matters.
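Of the four model families, ARIMA is the easiest to sketch. The example below is a plain autoregressive AR(2) model, which is ARIMA without its differencing and moving-average parts, fitted by least squares to a synthetic hourly current record containing a semidiurnal tide, then rolled forward 96 hours by feeding each prediction back in as input. The synthetic series, the model order, and the function names are assumptions for illustration; the study’s actual models and mooring data are not reproduced here.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar(series, p):
    """Plain AR(p) (no differencing, no MA terms): least squares for
    x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}; returns [c, a_1, ..., a_p]."""
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    targets = series[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def forecast(series, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as the next input."""
    history = list(series)
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[k] * history[-k] for k in range(1, p + 1))
        history.append(nxt)
        out.append(nxt)
    return out

TIDE_PERIOD_H = 12.42  # semidiurnal (M2) tidal period in hours

def synthetic_current(hours, noise=0.0, seed=0):
    """Hypothetical hourly near-bottom velocity component (m/s): mean flow + tide + noise."""
    rng = random.Random(seed)
    return [0.05 + 0.10 * math.sin(2 * math.pi * t / TIDE_PERIOD_H) + rng.gauss(0, noise)
            for t in range(hours)]

observed = synthetic_current(24 * 30, noise=0.01)  # 30 days of noisy hourly data
coef = fit_ar(observed, p=2)
pred = forecast(observed, coef, steps=96)           # 96 hours ahead
print("AR(2) coefficients:", [round(c, 3) for c in coef])
print("forecast at +24 h, +96 h:", round(pred[23], 3), round(pred[95], 3))
```

Because every step of a recursive forecast builds on earlier predictions, errors compound as the horizon grows, which is what makes a 96-hour forecast a demanding target.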
Physics-informed neural networks (PINNs) may be the next major step. Unlike purely data-driven models, PINNs build physical laws, such as those governing pore-water pressure and effective stress, directly into the learning process, which helps keep their outputs grounded in geological reality even when training data are sparse. The appeal is obvious. A model that can only learn from data will always be limited by how much data exists. One that also knows some physics can reason beyond it, at least a little.
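The core PINN idea can be shown without a neural network: add a second term to the training loss that penalizes violations of a physical law. In the sketch below, a two-term linear model stands in for the network, and the “physics” is assumed to be Athy’s exponential compaction law, with depth scaled by a hypothetical compaction length so that porosity obeys dφ/ds + φ = 0. This is a stand-in for the pore-water pressure and effective-stress relations mentioned above, not a model taken from any particular study, and the shallow measurements, their errors, and the penalty weight are all made up.

```python
import math

# Assumption: depth s is scaled by a hypothetical compaction length, so Athy's law
# phi(s) = PHI0 * exp(-s) satisfies the ODE  d(phi)/ds + phi = 0.
PHI0 = 0.6

def true_porosity(s):
    return PHI0 * math.exp(-s)

# Two-term linear model phi(s) = w0*exp(-s) + w1*exp(s), standing in for a network.
# Each entry is (basis function, its derivative), so the ODE residual is exact.
BASIS = [
    (lambda s: math.exp(-s), lambda s: -math.exp(-s)),
    (lambda s: math.exp(s), lambda s: math.exp(s)),
]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(depths, observed, physics_weight=0.0, collocation=()):
    """Minimize sum of data misfit^2 + physics_weight * sum of (phi' + phi)^2 at collocation points."""
    rows, targets = [], []
    for s, y in zip(depths, observed):
        rows.append([f(s) for f, _ in BASIS])
        targets.append(y)
    scale = math.sqrt(physics_weight)
    for s in collocation:
        rows.append([scale * (df(s) + f(s)) for f, df in BASIS])  # ODE residual row
        targets.append(0.0)
    k = len(BASIS)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)

def predict(weights, s):
    return sum(w * f(s) for w, (f, _) in zip(weights, BASIS))

# Synthetic shallow measurements with fixed +/-0.01 errors; physics checked on s = 0.0 .. 2.0.
depths = [0.1, 0.2, 0.3, 0.4]
observed = [true_porosity(s) + e for s, e in zip(depths, [0.01, -0.01, 0.01, -0.01])]
collocation = [k / 10 for k in range(21)]

data_only = fit(depths, observed)
physics_informed = fit(depths, observed, physics_weight=1.0, collocation=collocation)
data_pred = predict(data_only, 2.0)
physics_pred = predict(physics_informed, 2.0)
print(f"s=2.0  truth {true_porosity(2.0):.4f}  data-only {data_pred:.4f}  physics-informed {physics_pred:.4f}")
```

With only four shallow points, the data-only fit leans on the growing exp(s) term to chase the measurement errors and drifts far from the true curve at depth. The physics residual involves only that growing term, so the penalty pushes its weight close to zero and the deep prediction stays near the truth.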
Watching this field grow, I get the sense that the bottleneck in ocean science is about to shift from collecting data to interpreting it. Fifty years of drilling programs have left behind an enormous, underused archive. These models are starting to make sense of it.
