Downhole drilling presents a compelling case for self-supervised pretraining: surface telemetry streams continuously at 1 Hz across multiple sensors, yet ground-truth downhole measurements remain episodic and expensive to acquire. This asymmetry between abundant unlabeled telemetry and scarce labels renders standard supervised learning inefficient, motivating reconstruction-based pretraining paradigms. The authors conduct the first rigorous empirical investigation of masked autoencoder (MAE) pretraining for this domain, leveraging 3.5 million timesteps from Utah FORGE geothermal wells to predict Total Mud Volume, a critical operational metric.
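The reconstruction objective is conceptually simple: segment the multichannel telemetry into patches, hide a fraction of them, and train an encoder-decoder to rebuild the hidden portions. The following is a minimal sketch assuming a PyTorch-style implementation; the channel count, patch length, layer sizes, and the simplification of zeroing masked patches (rather than dropping them from the encoder, as a full MAE would) are illustrative and not the paper's configuration.

```python
# Minimal sketch of reconstruction-based (MAE-style) pretraining on multichannel
# 1 Hz telemetry. Shapes, patch length, and layer sizes are illustrative only.
import torch
import torch.nn as nn


class TimeSeriesMAE(nn.Module):
    def __init__(self, n_channels=8, patch_len=16, d_model=64, depth=2, mask_ratio=0.5):
        super().__init__()
        self.patch_len = patch_len
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(n_channels * patch_len, d_model)    # patch -> latent
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Linear(d_model, n_channels * patch_len)  # latent -> patch

    def forward(self, x):
        # x: (batch, time, channels); time must be a multiple of patch_len
        b, t, c = x.shape
        patches = x.reshape(b, t // self.patch_len, self.patch_len * c)
        mask = torch.rand(b, patches.shape[1], device=x.device) < self.mask_ratio
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)     # zero out masked patches
        recon = self.decoder(self.encoder(self.embed(visible)))
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2 * mask.unsqueeze(-1)).sum() / mask.sum().clamp(min=1)


# Example: one pretraining step on a random batch of surface telemetry windows.
model = TimeSeriesMAE()
loss = model(torch.randn(4, 256, 8))
loss.backward()
```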
The experimental design employs a full-factorial ablation across 72 MAE configurations, systematically varying architectural hyperparameters: encoder depth, latent dimension, masking ratio (10-90%), and patch length. Downstream task performance is benchmarked against LSTM and GRU baselines trained end-to-end on labeled data. The methodology isolates individual design choices through correlation analysis, revealing latent dimensionality as the dominant factor (Pearson r = -0.59 against downstream test error, measured as mean absolute error), while masking ratio exhibits negligible correlation, a counterintuitive result attributed to the high temporal autocorrelation inherent in 1 Hz drilling data.
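The sweep and factor analysis can be pictured as below. The grid values and the evaluate() stub are hypothetical placeholders standing in for the pretrain-then-finetune pipeline; only the correlation bookkeeping mirrors the paper's per-hyperparameter analysis.

```python
# Sketch of a full-factorial sweep and per-hyperparameter correlation analysis.
# Grid values and evaluate() are placeholders, not the paper's exact settings.
from itertools import product
import numpy as np

grid = {
    "depth":      [2, 4, 6],
    "latent_dim": [32, 64, 128],
    "mask_ratio": [0.1, 0.5, 0.9],
    "patch_len":  [8, 16],
}

def evaluate(cfg):
    # Placeholder: pretrain an MAE with cfg, fine-tune on Total Mud Volume,
    # and return the downstream test-set mean absolute error.
    rng = np.random.default_rng(hash(tuple(cfg.values())) % 2**32)
    return 1.0 / cfg["latent_dim"] + 0.01 * rng.standard_normal()

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
errors = np.array([evaluate(c) for c in configs])

# Pearson correlation between each hyperparameter and downstream test error,
# mirroring the factor analysis (e.g. latent_dim vs. error).
for name in grid:
    values = np.array([c[name] for c in configs], dtype=float)
    r = np.corrcoef(values, errors)[0, 1]
    print(f"{name:>11s}: r = {r:+.2f}")
```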
Optimal MAE configurations achieve a 19.8% relative error reduction over the supervised GRU baseline, though they trail the best LSTM variant by 6.4%. This performance gap suggests that while self-supervised pretraining effectively leverages unlabeled surface data, sequence modeling capacity remains paramount. The findings establish MAE as a viable alternative when labeled data is severely constrained, with practical implications for operational scenarios where downhole instrumentation is sparse or unreliable.
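For concreteness, the relative comparisons reduce to simple arithmetic on test-set errors. The absolute values in the sketch below are hypothetical; only the percentages come from the text, and the reading of "trails by 6.4%" is one plausible interpretation.

```python
# Hypothetical absolute errors; only the percentage relationships are from the text.
mae_gru = 1.000                        # supervised GRU baseline (arbitrary units)
mae_best_mae = mae_gru * (1 - 0.198)   # best MAE config: 19.8% relative reduction
print((mae_gru - mae_best_mae) / mae_gru)              # 0.198

# One plausible reading of "trails the best LSTM by 6.4%":
mae_best_lstm = mae_best_mae / (1 + 0.064)             # LSTM error ~6.4% lower
print((mae_best_mae - mae_best_lstm) / mae_best_lstm)  # ~0.064
```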