HOSVD + GPR: Parametric Surrogate for Turbulent Jet Flames

Parametric interpolation of DLR turbulent jet diffusion flame using HOSVD + GPR

Introduction

Fast predictions of thermochemical variables is one of the necessary steps in making a functional digital twin for reactive flowsystems. High fidelity CFD simulations are at the core of the study and design of combustion systems, but are often too slow and expensive to generate a full spectrum of the behaviour of the system for a wide spectrum of operating conditions.

This page shows a method developed by out group able to learn from a handful of high fidelity simulation and predict the behaviour of the system for unseen conditions. The framework is to use Higher Order Singular Value Decomposition to reduce the dimensionality of the simulations and Gaussian Process Regression to interpolate the behaviour for new combinations of operating parameters.

This work is the first application of this technique to reactive flows, a far more complex case than classical fluid dynamics due to the steep gradients of various quantities and localized structures typical of specific species such as the radicals.

Dataset

The dataset used for this page was generated within the group and both a full description and how to generated from scratch can be found at: LINK. The data describes a turbulent diffusive jet flame DLR, a standard case for non premixed combustion. Fuel is a Methane - Hydrogen blend injected into a coaxial air flow. Simulaiton and RANS k-$\epsilon$ with EDC model. Numerical results are compared with experimental results, and despite approximating the system with axial symmetry there is good agreement.

A total of 100 CFD cases were generated by varying two parameters: the fuel-jet Reynolds number and the hydrogen mass fraction in the fuel. The parameter space is summarized in the table below:

Parameter Range Steps
Reynolds number (Re) 11,000 – 20,000 10
H₂ mass fraction (X_H₂) 4% – 22% 10

The resulting flow fields — including velocity components, temperature, pressure, and species mass fractions — are organized in a tensor of the form:

\[\mathcal{T} \in \mathbb{R}^{N_{\mathrm{Re}} \times N_{X_{\mathrm{H_2}}} \times N_{\mathrm{variables}} \times N_{x} \times N_{y}}\]

where each dimension corresponds respectively to the Reynolds number samples, hydrogen mass fraction samples, physical variables, and spatial grid points.

An example of the dataset is shown below for Re = 16,000 and H₂ mass fraction of 8%. The figure displays the spatial distribution of temperature (T), methane (CH₄), oxygen (O₂), carbon monoxide (CO), carbon dioxide (CO₂), and water vapor (H₂O) mass fractions over the axial (z) and radial (r) coordinates. The flame structure is clearly visible: peak temperatures and combustion products are concentrated along the reaction zone near the jet axis, while reactants (CH₄ and O₂) are consumed as the flame develops downstream.

CFD snapshot for Re=16000 and H₂ mass fraction 8%

The whole dataset can be dowloaded from our Server at: LINK

Preprocessing

For combustion phoenomena preprocessing is extremely important. Quantities span several orders of magnitude, so without proper scaling both the decomposition and the machine learning models would only track the quantity with the highest order of magnitude. For this problem, standard scaling is applied independently to each variable. The mean $\mu$ and standard deviation $\sigma$ are computed over all training cases and spatial points, and the data is then centered and scaled to unit variance. For a given variable k data is rescaled according to:

\[\tilde{\mathcal{T}}_{i,j,p,q,k} = \frac{\mathcal{T}_{i,j,p,q,k} - \mu_k}{\sigma_k}\]

For what concern the input parameters (Re and $X_{\mathrm{H_2}}$) are normalized to the range $[0, 1]$ via min-max scaling before being fed into the regression model.

Methods

The algorithm works in two stages:

  1. Data is decomposed using HOSVD
  2. Gaussian Process Regression is used to interpolate

The intuitive idea is that HOSVD decomposes data as linear combination of eigenvectors. This means that instead of requiring all “pixels” the data is rewritten as a weighted sum of only a few pictures. The GPR is used to find out how much of each of these pictures is contained in new points that has not been used to build the basis.

Higher Order Singular Value Decomposition (HOSVD)

HOSVD is a generalization of SVD to tensorial data. Rather than flattening the data into a matrix, it extracts dominant patterns along each dimension of the tensor simultaneously. Applied to the dataset tensor:

\[\mathcal{T} \in \mathbb{R}^{N_{\mathrm{Re}} \times N_{X_{\mathrm{H_2}}} \times N_{\mathrm{variables}} \times N_{x} \times N_{y}}\]

HOSVD decomposes it into a core tensor $\mathcal{G}$ and a set of orthogonal factor matrices $\mathbf{U}^{(n)}$, one per dimension:

\[\mathcal{T} = \mathcal{G} \times_{\mathrm{Re}} \mathbf{U}^{\mathrm{Re}} \times_{X_{\mathrm{H_2}}} \mathbf{U}^{X_{\mathrm{H_2}}} \times_{\mathrm{var}} \mathbf{U}^{\mathrm{var}} \times_{x} \mathbf{U}^{x} \times_{y} \mathbf{U}^{y}\]

The factor matrices along the parameter dimensions ($\mathbf{U}^{\mathrm{Re}}$ and $\mathbf{U}^{X_{\mathrm{H_2}}}$) play the role of coefficient matrices, encoding how the flow fields vary with each operating condition independently.

The information about both the singular values and modes is contained for each dimension in teh core tensor $\mathcal{G}$. By contracting the core tensor along the species axis and mapping the resulting dominant patterns back to physical space through the spatial factor matrices $U_x$ and $U_y$. The figure below shows the leading HOSVD spatial modes extracted for the temperature field.

HOSVD spatial modes for temperature

Gaussian Process Regression (GPR)

GPR is a non-parametric Bayesian regression algorithm. Given training points, it updates a prior distribution with the observed data and produces predictions in the form of a Gaussian distribution — yielding both a mean prediction and an associated uncertainty. The covariance between points is measured by a kernel function; here the absolute exponential kernel is used:

\[k(x, x') = \sigma_f^2 \exp\left(-\frac{|x - x'|}{\ell}\right)\]

For a new unseen point $x^*$, the predicted mean and variance are:

\[\mu_{*} = \mathbf{k}_{*}^{\top} \mathbf{K}^{-1} \mathbf{y}, \qquad \sigma_{*}^2 = k(x^{*}, x^{*}) - \mathbf{k}_{*}^{\top} \mathbf{K}^{-1} \mathbf{k}_{*}\]

HOSVD + GPR Framework

A key advantage of HOSVD is that its separable structure allows the two operating parameters (Re and $X_{\mathrm{H_2}}$) to be treated independently. Two separate 1D GPR models are trained, one per parameter axis:

\[\hat{\mathbf{u}}_{\mathrm{Re}} = f_{\mathrm{Re}}\!\left(\mathrm{Re}\right), \qquad \hat{\mathbf{u}}_{X_{\mathrm{H_2}}} = f_{X_{\mathrm{H_2}}}\!\left(X_{\mathrm{H_2}}\right)\]

The interpolated field for a new operating condition is then reconstructed by contracting the core tensor with the predicted factor vectors and rescaling:

\[\hat{\mathcal{T}} = \mathcal{G} \times_{\mathrm{Re}} \hat{\mathbf{u}}_{\mathrm{Re}} \times_{X_{\mathrm{H_2}}} \hat{\mathbf{u}}_{X_{\mathrm{H_2}}} \times_{\mathrm{var}} \mathbf{U}^{\mathrm{var}} \times_{x} \mathbf{U}^{x} \times_{y} \mathbf{U}^{y}\] \[\hat{\mathbf{m}} = \hat{\mathcal{T}} \cdot \boldsymbol{\sigma} + \boldsymbol{\mu}\]

Results

For each operating parameter, the corresponding HOSVD coefficients are interpolated independently with a 1D GPR. The plot below shows the GPR fit (mean ± 2σ) along the Reynolds number axis for the first four modes: black dots are the training coefficients, the gold diamond is the true coefficient at an unseen Re = 13,000, and the red marker is the GPR prediction with its uncertainty.

GPR interpolation of HOSVD coefficients along the Re axis

Combining the interpolated coefficients with the core tensor and rescaling gives the reconstructed field for the unseen condition. The bar chart below reports the relative $L_2$ reconstruction error per species for the test case Re = 13,000, $X_{\mathrm{H_2}}$ = 8%, together with the overall mean error $\bar{e}$.

Reconstruction error per species for Re=13000, mf=0.08

Contributors

Isacco Faglioni

References

[1] Aversano, et al., Digital twin of a combustion furnace operating in flameless conditions: reduced-order model development from CFD simulations, Proc. Combust. Inst. 38(4):5373–5381, 2021.

[2] Procacci, et al., Stochastic reduced-order modeling for the forecast of noisy dynamical systems, Proc. Combust. Inst. 41:105981, 2025.

[3] Bergmann, Meier, Wolff, and Stricker, Application of spontaneous Raman and Rayleigh scattering and 2D LIF for the characterization of a turbulent CH₄/H₂/N₂ jet diffusion flame, Appl. Phys. B: Lasers Opt. 66:489–502, 1998. DOI: 10.1007/s003400050424.

[4] De Lathauwer, et al., A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21(4):1253–1278, 2000.

[5] Williams, C. K. I., Rasmussen, C. E., Gaussian processes for regression, Adv. Neural Inf. Process. Syst. 8, 1995.

[6] Barragán, et al., HOSVD-SR: A Physics-Based Deep Learning Framework for Super-Resolution in Fluid Dynamics, arXiv:2504.17994, 2025.

[7] Sengupta, et al., Hybrid machine learning models based on physical patterns to accelerate CFD simulations: a short guide on autoregressive models, arXiv:2504.06774, 2025.

[8] Isacco et al., Cypher 3rd Meeting, Istanbul, Turkey, Poster-Oral presentation, 2026.

[9] Isacco et al., Princeton-Combustion Institute Summer School on Combustion and the Environment, Princeton, United States, Poster presentation, 2026.