These features (and many more) are a key part of our framework for seismic interpretation, seismiQB: it also provides a rich library of augmentations (both mathematical and geological) and processing tools. You can learn more about it on our GitHub; we are also planning a separate article describing it in detail.
From the machine learning perspective, the task of horizon picking can be seen as either binary segmentation, multiclass segmentation, or regression. Each one of them has its pros and cons:
- binary segmentation directly models the horizon surface, which is convenient and easy to work with. Yet it outputs a mask that must then be converted into a depth map
- multiclass segmentation splits the whole volume into two parts, above and below the tracked reflection; it has the same mask-conversion problem as binary segmentation
- regression directly predicts the depth of the horizon at each point; unfortunately, such models are hard to train, as they are unstable and introduce a lot of jitter into the predictions
In this article, I will focus on the first approach, binary segmentation. Our research indicates that it gives the best results of the three, while the additional complexity of converting the mask into a depth map is negligible. This approach also allows us to utilize a wealth of papers on semantic segmentation and edge detection, ranging from a simple UNet to very sophisticated combinations of DeepLab, CPD, and HRNet-OCR.
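To show why the mask-to-depth-map conversion is cheap, here is a minimal numpy sketch: take the first above-threshold voxel along the depth axis for every trace. The shapes, the threshold value, and the `-1` sentinel for empty traces are my own illustrative choices, not the actual seismiQB implementation.

```python
import numpy as np

def mask_to_depth_map(mask, threshold=0.5):
    """Collapse a predicted probability volume into a 2-D depth map."""
    # mask: (n_ilines, n_xlines, depth) probabilities from the segmentation model
    binary = mask > threshold
    # index of the first voxel above threshold along the depth axis
    depth_map = binary.argmax(axis=-1)
    # mark traces where nothing was detected with -1
    return np.where(binary.any(axis=-1), depth_map, -1)

# toy example: a tiny 2x2 "cube" with 8 depth samples
probs = np.zeros((2, 2, 8))
probs[0, 0, 3] = 0.9   # horizon detected at depth 3
probs[1, 1, 5] = 0.7   # horizon detected at depth 5
depth_map = mask_to_depth_map(probs)
```

The result is a 2-D array of depths, one value per trace, which is exactly the depth-map representation of a horizon.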
The rest of this section dives deep into technical details (and even some code). Skip to the next chapter if you are not ready for it!
The vast majority of modern segmentation networks employ the encoder-decoder architecture: it is also known as the sandglass architecture (not to be confused with the sandglass block) and originates from the autoencoders widely studied in previous decades. When discussing this model design, it is impossible not to mention the UNet paper, which used it to great success in biomedical imaging.
UNet is a great example of a simple, yet effective and very popular neural network. At a high level of abstraction, it consists of:
- an encoder, which passes the tensor through multiple blocks, sequentially reducing its spatial dimensions. It also saves the outputs of its stages for later use
- an embedding, which thoroughly processes the encoder output: this is where various pyramid-like modules live
- a decoder, which restores the spatial shape of the tensor, applies yet more operations like Conv-BN-ReLU, and makes use of the tensors stored by the encoder
On this level of abstraction, one can construct hundreds of completely different neural networks. By changing the operational blocks from, say, vanilla Conv-BN-ReLU to a block with residual connections, or with dense connections, or even both, we can modify the behavior of our model to suit current needs. The same is true for the downsampling and upsampling strategies: it is easy to seamlessly swap, say, bilinear interpolation for a transposed convolution. This design also covers all of the DeepLab variations, allowing us to effortlessly implement recent papers in our research.
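To illustrate how interchangeable these pieces are, here is a toy PyTorch sketch of the encoder-embedding-decoder pattern (my own simplified version, not the actual seismiQB model): the `block` argument lets you swap vanilla Conv-BN-ReLU for a residual or dense block without touching the rest of the network.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # vanilla Conv-BN-ReLU; replace with a residual/dense block to change behavior
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class EncoderDecoder(nn.Module):
    def __init__(self, channels=(16, 32, 64), block=conv_block):
        super().__init__()
        self.enc, c_prev = nn.ModuleList(), 1
        for c in channels:                       # encoder: reduce spatial dims
            self.enc.append(block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool2d(2)
        self.embedding = block(c_prev, c_prev)   # bottleneck processing
        self.dec = nn.ModuleList()
        for c in reversed(channels):             # decoder: restore spatial dims
            self.dec.append(block(c_prev + c, c))
            c_prev = c
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.head = nn.Conv2d(c_prev, 1, 1)

    def forward(self, x):
        skips = []
        for blk in self.enc:
            x = blk(x)
            skips.append(x)                      # stored for the decoder
            x = self.pool(x)
        x = self.embedding(x)
        for blk, skip in zip(self.dec, reversed(skips)):
            x = blk(torch.cat([self.up(x), skip], dim=1))
        return self.head(x)

model = EncoderDecoder()
out = model(torch.randn(1, 1, 64, 64))           # (batch, channel, height, width)
```

Swapping `self.up` for `nn.ConvTranspose2d`, or `conv_block` for a residual block, changes the architecture while the skeleton stays the same.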
As with many other patterns in programming, our (arguably more abstract) view of the encoder-decoder concept allows for a clean and concise implementation. Our library, BatchFlow, lets us define such models with a clear and straightforward dictionary:
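A hypothetical sketch of what such a configuration dictionary might look like; the key names and values below are illustrative guesses at the style, not BatchFlow's exact API.

```python
# Illustrative encoder-decoder configuration in the spirit of a
# dictionary-driven model definition (hypothetical keys, not the real API)
config = {
    'body/encoder': {
        'num_stages': 4,
        'blocks': {'base': 'DefaultBlock', 'filters': [16, 32, 64, 128]},
    },
    'body/embedding': {'base': 'DefaultBlock'},     # or a pyramid-like module
    'body/decoder': {
        'num_stages': 4,
        'blocks': {'base': 'DefaultBlock', 'filters': [128, 64, 32, 16]},
        'upsample': 'bilinear',                     # or a transposed convolution
    },
}
```

The point of such a config is that each part (blocks, up/downsampling, embedding) is a named, swappable entry rather than hard-coded layers.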
By swapping the DefaultBlock (which implements exactly Conv-BN-ReLU) in the code above for another module, for example ResBlock, one can drastically change the model's performance. This method of model configuration lets us effortlessly explore a huge number of architectures to find the best one; this is how we obtained our best model for horizon detection. We will shortly release a separate article on our BatchFlow library, which explains all of its features for model creation, research, and data generation in detail. Stay tuned!
We took our time to thoroughly revisit the necessary concepts: metrics, data generation, model creation, and research. That said, we have yet to discuss the exact formulation of the horizon detection task from an ML standpoint; specifically, what should be used as the train and test datasets. The following options are available:
- train on multiple cubes, run inference on completely unseen data
- train on part of a cube, run inference on the rest of it
- train on a few slices of data, distributed evenly over the cube, run inference on the entire cube
Let’s discuss the positive and negative sides of each of them. The first one looks the most obvious and promising, yet it is very hard to apply in real life. By training on a separate set of cubes, we create a model that detects some of the horizons on unseen data. But in reality, geology experts need to track one very specific reflection, not the arbitrary ones (usually the easiest) returned by the model. Thus, such a model can be used to retrieve a huge number of labeled surfaces, some of which are useful, but it can’t be the sole model for horizon detection.
The other two options are very similar; the difference is that in the last approach the data to learn from consists of just a few slices, less than 0.5% of the total area. Using information from the same cube for train and test significantly boosts the quality; it also gives our model a prior on precisely which horizon to track, eliminating the ambiguity of the previous approach. To decide which slices to label, we’ve developed an algorithm for assessing the geological hardness of the field: in short, we want more human labeling in the hard places, whereas more trivial locations don’t require as much effort and involvement from the specialists.
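Our actual hardness metric is not described here; purely as an illustration, a naive proxy could allocate the labeling budget to the inlines with the most variable signal. Everything in this sketch (the proxy, the shapes, the function names) is hypothetical.

```python
import numpy as np

def pick_slices(cube, budget=4):
    # cube: (n_ilines, n_xlines, depth) seismic amplitudes
    # naive hardness proxy (hypothetical, not our actual algorithm):
    # inlines with higher amplitude variability are considered "harder"
    hardness = cube.std(axis=(1, 2))
    # spend the labeling budget on the hardest inlines
    return np.sort(np.argsort(hardness)[-budget:])

rng = np.random.default_rng(0)
cube = rng.normal(size=(10, 8, 16))
cube[[2, 7]] *= 5.0          # make two inlines much more variable
picked = pick_slices(cube, budget=2)
```

A real hardness measure would of course look at geological structure rather than raw variance, but the budget-allocation idea is the same: label where the model will struggle most.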
That is exactly how we propose to detect horizons; in short, the whole pipeline looks like this:
That’s it! The only step of the list we have yet to cover is the assembly of predictions: the procedure of combining individual patches into the horizon, which is also part of our seismiQB library.
This short pipeline allows us to track reflection surfaces with high quality and fidelity and takes less than an hour even on the largest fields.
Note that it requires only a few slices to start the process: by carefully choosing the task setting, we avoided (or at least mitigated) the data dependency and use only a handful of labels to guide the model across the entire field.
As we can see from the previous image, the produced horizon covers a lot of the cube area… but not all of it. There are holes in the predicted surface, and our neural network design does not guarantee their absence. Such holes are unacceptable in the resulting horizon and must somehow be stitched.
To do so, we apply one more neural network. The differences from the previous model are:
- it uses the entire already-labeled surface as the training dataset, not just the carcass
- its final layer is constrained to always output something (via a softmax along the depth axis)
This model closely resembles, and is in the spirit of, the student-teacher neural networks trending over the last few years; it allows us to create horizons not only of high quality and fidelity, but also with 99%+ coverage. Training an additional model and running inference with it adds overhead to the detection pipeline, but we mitigate it by carefully choosing the places where the extension model is applied (only where it is actually needed), resulting in less than 30 minutes for the whole postprocessing.
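The softmax-along-depth constraint can be illustrated with a soft-argmax: because every trace gets a full probability distribution over depths, the model always outputs some depth, so holes are impossible by construction. A minimal numpy sketch, with illustrative shapes of my own choosing:

```python
import numpy as np

def expected_depth(logits):
    # logits: (n_traces, depth) raw network scores for one slice
    # softmax along the depth axis turns every trace into a probability
    # distribution over depths, so *some* depth is always produced
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    depths = np.arange(logits.shape[-1])
    # soft-argmax: probability-weighted average depth per trace
    return (probs * depths).sum(axis=-1)

# an uncertain trace (uniform scores) and a confident one (sharp peak at depth 3)
logits = np.zeros((2, 11))
logits[1, 3] = 50.0
d = expected_depth(logits)
```

Note the trade-off this constraint implies: even a completely uncertain trace yields a depth (here, the middle of the range), which is exactly why this model is applied only where stitching is needed.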
To make sure that our technology is bullet-proof, we applied it to an enormous dataset: just shy of a hundred horizons on twenty-odd fields developed over the last 20 years. Roughly, for each of them we did the following:
- create a carcass from the seismic cube: its total area is less than 0.5% of the field area
- keep only the carcass part of the hand-labeled horizon
- train the detection neural network on the carcass, run inference on the whole cube
- train the postprocessing neural network on the prediction from the previous step, run inference to stitch all the holes
- compare the original (hand-labeled) surface to the one produced by our models
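The comparison in the last step can be sketched as a pair of simple metrics, coverage and mean absolute depth difference between the two depth maps. The metric names and the `-1` sentinel for untracked points are illustrative, not our exact evaluation code.

```python
import numpy as np

def compare_horizons(hand, model, fill=-1):
    # hand, model: 2-D depth maps; `fill` marks points where nothing is tracked
    tracked = model != fill
    both = (hand != fill) & tracked
    return {
        'coverage': tracked.mean(),        # fraction of points the model tracked
        'mean_abs_diff': np.abs(hand[both] - model[both]).mean(),
    }

hand = np.array([[10, 12], [11, 13]])
model = np.array([[10, 13], [11, -1]])     # one depth mismatch, one hole
report = compare_horizons(hand, model)
```

In practice the depth differences would also be inspected spatially, since an average can hide localized tracking errors.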
Each pair of cube and horizon is essentially a separate dataset, and when performing research on such a large number of datasets it is hard to analyze the results. Reporting average values of key metrics is of little use, as it hides the specifics of each surface; on the other hand, manually inspecting all of them is practically impossible.
We take the middle ground: we choose a pre-defined set of complex horizons and assess their quality in detail. Each of the model-generated surfaces is compared to the hand-labeled one with a number of metrics, and in the vast majority of cases it is easy to notice the improvements in the detected horizons. This conclusion is confirmed by our geology experts, who carefully examined each slice of the ML tracking.