Architecture: Physics-Informed Hybrid ML Models

Dec 16, 2025 | Machine Learning | Architecture | 12 min read

This entry documents our hybrid physics-ML architecture, inspired by Raissi et al. (2019) on Physics-Informed Neural Networks. The core insight: physics constraints prevent unreasonable predictions while ML corrects systematic biases.

The Hybrid Formula

y_final = y_physics + f_ML(x, y_physics)

Here y_physics is the physics-based baseline prediction (dispersion relation, shoaling, refraction), x is the input feature vector, and f_ML is the ML residual correction model, which learns the baseline's systematic biases from historical data.
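The formula can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production model: `physics_baseline` is a hypothetical stand-in (a fixed attenuation factor), the residual model is a one-variable least-squares line rather than a tree ensemble, and the extra features x are dropped for brevity.

```python
from statistics import mean

def physics_baseline(offshore_hs_m: float) -> float:
    """Hypothetical physics stand-in: a fixed 20% height loss en route."""
    return 0.8 * offshore_hs_m

def fit_residual_model(y_physics, y_observed):
    """Fit residual = a + b * y_physics by ordinary least squares."""
    residuals = [obs - phys for obs, phys in zip(y_observed, y_physics)]
    mx, mr = mean(y_physics), mean(residuals)
    sxx = sum((x - mx) ** 2 for x in y_physics)
    b = sum((x - mx) * (r - mr) for x, r in zip(y_physics, residuals)) / sxx
    a = mr - b * mx
    return lambda y_phys: a + b * y_phys

def predict_hybrid(offshore_hs_m: float, f_ml) -> float:
    """y_final = y_physics + f_ML(y_physics)."""
    y_physics = physics_baseline(offshore_hs_m)
    return y_physics + f_ml(y_physics)

# Synthetic history: the "true" nearshore height is 0.9 * offshore + 0.1,
# so the physics stand-in is biased low and the residual model must fix it.
offshore = [1.0, 1.5, 2.0, 2.5, 3.0]
observed = [0.9 * h + 0.1 for h in offshore]
y_phys_hist = [physics_baseline(h) for h in offshore]

f_ml = fit_residual_model(y_phys_hist, observed)
y_final = predict_hybrid(2.2, f_ml)
```

Because the residual model only has to learn the gap between the physics output and the observations, it stays small when the physics is already close.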

Why Hybrid?

  • Physics constraints prevent unreasonable predictions (negative wave heights, impossible arrival times)
  • ML corrects systematic biases that pure physics models miss (local bathymetry effects, station-specific errors)
  • Better extrapolation to unseen conditions than pure ML
  • Interpretable decomposition: users can see physics vs. correction contributions
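One way to make the constraint and the decomposition concrete is sketched below. The `HybridPrediction` container and the clamping rule are illustrative assumptions; this entry does not specify how the constraints are actually enforced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HybridPrediction:
    physics_m: float     # physics baseline wave height, assumed >= 0
    correction_m: float  # ML residual correction actually applied

    @property
    def final_m(self) -> float:
        return self.physics_m + self.correction_m

def combine(physics_m: float, raw_correction_m: float) -> HybridPrediction:
    """Apply the ML correction, limited so the final height stays >= 0."""
    correction = max(raw_correction_m, -physics_m)
    return HybridPrediction(physics_m, correction)

ok = combine(1.2, -0.3)       # correction applied unchanged
clamped = combine(0.4, -0.9)  # correction limited to -0.4
```

Keeping both terms on the result object lets a user interface show the physics contribution and the correction side by side.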

Model Components

Our ensemble consists of five specialized components, each targeting a specific prediction task:

  • Binary Classification (XGBoost): Surfability prediction with 300 estimators, max_depth=6
  • Wave Height Regression (Random Forest): Residual correction targeting RMSE < 0.15m
  • Wave Period Regression (Random Forest): Period prediction targeting RMSE < 0.5s
  • Quality Score (Ordinal Classifier): 5-class surf quality from Flat to Excellent
  • Uncertainty Estimation: Ensemble variance from tree prediction spread
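The uncertainty component can be illustrated with a small sketch. It assumes the per-tree predictions are already available (with scikit-learn, for example, by calling `predict` on each entry of a fitted forest's `estimators_`); the numbers below are made up for illustration.

```python
from statistics import mean, pstdev

def ensemble_mean_and_spread(per_tree_predictions):
    """Return (point prediction, spread) for one sample.

    The point prediction is the mean over trees; the spread is the
    population standard deviation of the individual tree outputs.
    """
    return mean(per_tree_predictions), pstdev(per_tree_predictions)

# Hypothetical wave-height predictions (m) from five trees for one forecast
tree_preds = [1.10, 1.25, 1.18, 1.31, 1.16]
pred, sigma = ensemble_mean_and_spread(tree_preds)
```

Tree spread measures disagreement within the ensemble, so it is best read as a relative confidence signal rather than a calibrated interval.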

Feature Engineering: 83 Hybrid Features

Features span eight categories, combining physical parameters with derived ML features:

  • Source Wave (8): Hs, Tp, direction, steepness, spectrum width
  • Propagation (9): Distance, bearing, alignment, travel time
  • Bathymetry (4): Mean/min depth, gradient, shallow crossings
  • Physics (5): Attenuation coefficient, period survival, exposure index
  • Local Conditions (4): Shore wind speed and direction
  • Tidal (12): Height, phase, spring/neap, currents, storm surge
  • Temporal (3): Hour, day of week, month
  • Spectral Partitions (28): Multi-modal sea state features (NEW)
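A few of the listed features can be derived from standard deep-water linear wave theory, as sketched below. The function names are hypothetical, and the exact definitions used in the feature pipeline (for example, whether alignment is a cosine) are assumptions.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def deep_water_wavelength(tp_s: float) -> float:
    """Deep-water dispersion relation: L = g * T^2 / (2 * pi)."""
    return G * tp_s ** 2 / (2 * math.pi)

def steepness(hs_m: float, tp_s: float) -> float:
    """Wave steepness as height over deep-water wavelength."""
    return hs_m / deep_water_wavelength(tp_s)

def travel_time_hours(distance_km: float, tp_s: float) -> float:
    """Travel time at the deep-water group velocity cg = g * T / (4 * pi)."""
    cg = G * tp_s / (4 * math.pi)
    return distance_km * 1000 / cg / 3600

def alignment(swell_dir_deg: float, bearing_deg: float) -> float:
    """Cosine of the angle between swell direction and source-to-spot
    bearing (assumes both use the same convention): 1 aligned, -1 opposed."""
    return math.cos(math.radians(swell_dir_deg - bearing_deg))

features = {
    "steepness": steepness(2.0, 10.0),
    "travel_time_h": travel_time_hours(500.0, 10.0),
    "alignment": alignment(270.0, 250.0),
}
```

Longer-period swell travels faster in this approximation, which is why travel time depends on Tp as well as distance.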

Optimal Hyperparameters

n_estimators: 300
max_depth: 6
physics_weight: 0.3
min_samples_split: 5
min_samples_leaf: 2
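As a sketch of how these values might be carried in code, the block below separates the tree settings from `physics_weight`. The split is an assumption: the four tree parameters match keyword arguments of scikit-learn's `RandomForestRegressor`, while how `physics_weight` enters the model is not described in this entry.

```python
HYPERPARAMS = {
    "n_estimators": 300,
    "max_depth": 6,
    "physics_weight": 0.3,
    "min_samples_split": 5,
    "min_samples_leaf": 2,
}

# Settings that a tree ensemble constructor would accept directly,
# e.g. RandomForestRegressor(**tree_params) if scikit-learn is the backend.
TREE_PARAM_NAMES = (
    "n_estimators",
    "max_depth",
    "min_samples_split",
    "min_samples_leaf",
)

tree_params = {name: HYPERPARAMS[name] for name in TREE_PARAM_NAMES}

# Kept separate because it belongs to the hybrid blend, not to the trees.
physics_weight = HYPERPARAMS["physics_weight"]
```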

Cross-Validation Methodology

Time-series cross-validation prevents future-to-past data leakage:

  • Time-Split CV with expanding window
  • 24-hour gap period between train/test to prevent temporal leakage
  • Lead-time stratified analysis: 6h, 12h, 18h, 24h, 36h, 48h horizons
  • Performance tracked per buoy station for regional calibration
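The expanding-window split with a gap can be sketched as follows. This is an illustrative implementation, not the project's own splitter; the function name, the equal-sized test folds, and the hourly synthetic timestamps are assumptions.

```python
from datetime import datetime, timedelta

def expanding_window_splits(timestamps, n_splits, gap=timedelta(hours=24)):
    """Yield (train_idx, test_idx) pairs for time-ordered data.

    timestamps must be sorted ascending. Each training set starts at
    index 0 and grows with every fold; samples within `gap` of the
    first test timestamp are dropped from training.
    """
    n = len(timestamps)
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold
        test_end = n if k == n_splits else test_start + fold
        cutoff = timestamps[test_start] - gap
        train_idx = [i for i in range(test_start) if timestamps[i] <= cutoff]
        test_idx = list(range(test_start, test_end))
        yield train_idx, test_idx

# Ten days of hourly samples, split into three expanding folds
start = datetime(2025, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(240)]
splits = list(expanding_window_splits(hourly, n_splits=3))
```

scikit-learn's `TimeSeriesSplit` offers a similar `gap` option, but counted in samples rather than hours, so a time-based gap like the one above is easier to reason about when buoy records are irregularly spaced.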

References

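  • Raissi, M., Perdikaris, P., & Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
  • Karniadakis, G.E. et al. (2021). Physics-informed machine learning. Nature Reviews Physics, 3, 422-440.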
  • Gneiting, T. & Raftery, A.E. (2007). Probabilistic Forecasting. Annual Review of Statistics.
Science Journal | PelagicLabs