Simple Curve Fitting with the Gauss-Newton Algorithm (2024)

$\newcommand{\mvec}[1]{\mathbf{#1}}\newcommand{\gvec}[1]{\boldsymbol{#1}}\definecolor{eqcol2}{RGB}{114,0,172}\definecolor{eqcol1}{RGB}{172,0,114}\definecolor{eqcol3}{RGB}{0,172,114}\definecolor{eqcol4}{RGB}{230,190,120}$In the description of the physically based rendering implementation of Unreal Engine 4 [1], Karis states that instead of using the classical Fresnel formula \begin{align*} F(\mvec{v},\mvec{h}) = F_0 + (1-F_0)(1 - \mvec{v}\cdot\mvec{h})^5 \end{align*} he is using the approximation \begin{align*} F(\mvec{v},\mvec{h}) = F_0 + (1-F_0)2^{ (-5.55473(\mvec{v}\cdot\mvec{h}) - 6.98316)(\mvec{v}\cdot\mvec{h}) } \end{align*} Note that $-5.55473$ and $-6.98316$ are oddly precise numbers. It is unlikely that Karis found them by guessing. Instead, it is more likely that they were found by some numerical algorithm. In this article, we shall describe one method that could have been used for finding these constants.
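For reference, a direct translation of the two expressions above into C++ might look like the following minimal sketch (the function and parameter names are made up for illustration, and $F_0$ is treated as a scalar for simplicity):

```cpp
#include <cmath>

// Classical Fresnel term from the article: F0 + (1 - F0) * (1 - v.h)^5
float fresnelClassical(float F0, float vDotH) {
    return F0 + (1.0f - F0) * std::pow(1.0f - vDotH, 5.0f);
}

// Karis's approximation: F0 + (1 - F0) * 2^((-5.55473*v.h - 6.98316)*v.h)
float fresnelKaris(float F0, float vDotH) {
    return F0 + (1.0f - F0) * std::exp2((-5.55473f * vDotH - 6.98316f) * vDotH);
}
```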

First, note that $\mvec{v}$ and $\mvec{h}$ are in practice always normalized before they are used in the Fresnel formula. But then we must have $0 \le \mvec{v}\cdot\mvec{h} \le 1$, by the definition of the dot product. Consequently, if we want to find an approximation for $(1 - \mvec{v}\cdot\mvec{h})^5$, it is sufficient to find an approximation for $(1 - x)^5$, where $0 \le x \le 1$, since this is the only interval that is of interest when we are using the Fresnel formula. Within this interval, Karis finds the approximation $2^{ (-5.55473x - 6.98316)x}$. Indeed, plotting the two functions reveals that this approximation is only a good fit within the interval $0 \le x \le 1$.

We shall call this interval $0 \le x \le 1$ the domain of our curve fitting problem.

We shall use the root mean square error (RMSE) to measure the quality of an approximation. The root mean square error between some original function $y(x)$ and its approximation $\hat{y}(x)$ is given by\begin{align*}RMSE = \sqrt{ \frac{\sum_{i=1}^N (y(x_i) - \hat{y}(x_i))^2}{N}}.\end{align*}That is, we define $N$ sample points at which we evaluate $y$ and $\hat{y}$. These sample points are uniformly distributed on the domain. For instance, if $N=5$, then the sample points are $x_1=0.00$, $x_2=0.25$, $x_3=0.50$, $x_4=0.75$, $x_5=1.00$. At every such sample point, we evaluate the squared difference between the original function and the approximation: $(y(x_i) - \hat{y}(x_i))^2$. If all these squared differences were 0, then $\hat{y}$ would be a very close fit to $y$, assuming that the number of sample points $N$ is large. The higher $N$ is, the better we can measure the similarity of $y$ and $\hat{y}$.

Finally, the sum of the squared differences is divided by the number of samples $N$ and then the square root is taken, and the result of this is the RMSE. These last two steps are done for the purpose of normalization.

The approximation of Karis has an RMSE of $0.002238$ with respect to the original Fresnel function $(1-x)^5$. For properly chosen values of $A$ and $B$, the curve of $2^{(Ax + B)x}$ is close to $(1-x)^5$. For instance, the assignments $A=-5$ and $B=-7$ result in an RMSE of $0.003689$. However, this is clearly worse than the approximation of Karis, where $A=-5.55473$ and $B=-6.98316$. What we need is an algorithm that allows us to find values of these two parameters that result in a low RMSE. One such algorithm shall be described in the following section.
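To make the RMSE measure concrete, here is a small C++ sketch that evaluates it for $(1-x)^5$ and $2^{(Ax+B)x}$ on $N$ uniformly spaced sample points. The choice $N=100$ is an assumption; with a different number of sample points the values will differ slightly from the ones quoted above.

```cpp
#include <cmath>
#include <cstdio>

// RMSE between y(x) = (1 - x)^5 and yhat(x) = 2^((A*x + B)*x)
// over N sample points uniformly spaced on [0, 1].
double rmse(double A, double B, int N) {
    double sumSq = 0.0;
    for (int i = 0; i < N; ++i) {
        double x    = double(i) / double(N - 1);   // x_1 = 0, ..., x_N = 1
        double y    = std::pow(1.0 - x, 5.0);      // original function
        double yhat = std::exp2((A * x + B) * x);  // approximation
        sumSq += (y - yhat) * (y - yhat);
    }
    return std::sqrt(sumSq / N);
}

int main() {
    std::printf("A=-5,       B=-7:        RMSE = %f\n", rmse(-5.0, -7.0, 100));
    std::printf("A=-5.55473, B=-6.98316:  RMSE = %f\n", rmse(-5.55473, -6.98316, 100));
}
```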

The Gauss-Newton Algorithm

As a simple example, let us say we have a function $y(x)$ we want to approximate by some approximation $\hat{y}(x,p)$. Here $p$ is the parameter of the approximation. $\hat{y}$ could be $px^2$ or $px$, for instance, and we require a value of $p$ that results in $\hat{y}$ having a low RMSE. We start with an initial guess for $p$, and then update $p$ by adding some update step $h$ to it. Ideally, adding $h$ to $p$ should make our approximation move closer to $y$. Now, the main issue is determining a good value for $h$. As a first step, we approximate the value of $\hat{y}(x,p+h)$ by a first-order Taylor series at $p$\begin{align*}\hat{y}(x,p+h) \approx \hat{y}(x,p) + (p+h-p)\frac{\partial\hat{y}}{\partial p}(x,p) = \hat{y}(x,p) + h\frac{\partial \hat{y}}{\partial p}(x,p)\end{align*}Define $S$ as the sum of the squared differences between $y$ and $\hat{y}$\begin{align*}S = \sum_{i=1}^N (y(x_i) - \hat{y}(x_i,p))^2\end{align*}If $S$ is small, then $\hat{y}$ is a good approximation (we are using $S$ instead of the RMSE, since it is easier to deal with algebraically). Our chief objective is to find a value of $h$ that moves $\hat{y}$ closer to $y$. To achieve this, we substitute our Taylor approximation for $\hat{y}(x,p+h)$ into the expression for $S$\begin{align*}S = \sum_{i=1}^N \left(y(x_i) - \left(\hat{y}(x_i,p) + h\frac{\partial\hat{y}}{\partial p}(x_i,p)\right)\right)^2 = \sum_{i=1}^N \left(y(x_i) - \hat{y}(x_i,p) - h\frac{\partial\hat{y}}{\partial p}(x_i,p)\right)^2\end{align*}We will find a value for $h$ that minimizes $S$, and this will be our update step. For this purpose, the partial derivative of $S$ with respect to $h$ is calculated\begin{align*}\frac{\partial S}{\partial h} = -2\sum_{i=1}^N \left[ \left(y(x_i) - \hat{y}(x_i,p) - h\frac{\partial \hat{y}}{\partial p}(x_i,p)\right)\frac{\partial \hat{y}}{\partial p}(x_i,p) \right]\end{align*}The value of $h$ for which $\frac{\partial S}{\partial h} = 0$ is either a local minimum or maximum, or a saddle point, depending on the sign of the second derivative:\begin{align*}\frac{\partial^2 S}{\partial h^2} = +2\sum_{i=1}^N \left[ \left(\frac{\partial \hat{y}}{\partial p}(x_i,p)\right)^2 \right]\end{align*}So we clearly have $\frac{\partial^2 S}{\partial h^2} \ge 0$, since a squared real number cannot be negative (we are not dealing with any complex numbers here!). If $\frac{\partial^2 S}{\partial h^2} > 0$, then the value of $h$ at which the first derivative vanishes is a local minimum, which is exactly what we want. However, it could occur that $\left(\frac{\partial \hat{y}}{\partial p}(x_i,p)\right)^2$ is zero at all sample points, in which case $\frac{\partial^2 S}{\partial h^2} = 0$ and we cannot conclude that this value of $h$ is a local minimum. However, this is a very unlikely event, and good results were obtained even when this possibility was ignored.

Therefore, to find a good value for $h$, we set the first partial derivative to zero and solve for $h$, in order to find the value of $h$ for which we have a local minimum:\begin{align*}-2\sum_{i=1}^N \left[ \left(y(x_i) - \hat{y}(x_i,p) - h\frac{\partial \hat{y}}{\partial p}(x_i,p)\right)\frac{\partial \hat{y}}{\partial p}(x_i,p) \right] = 0\end{align*}Rearranging the above yields\begin{equation}\left( \sum_{i=1}^N \left(\frac{\partial \hat{y}}{\partial p}(x_i,p) \right)^2\right)h = \sum_{i=1}^N \left( \left( y(x_i) - \hat{y}(x_i,p) \right) \frac{\partial \hat{y}}{\partial p}(x_i,p) \right) \label{eq:single}\end{equation}Therefore, to compute a value of $h$ that will move $\hat{y}(x,p)$ closer to $y(x)$, we simply solve for $h$ in equation $\eqref{eq:single}$. The new value of $p$ is set to $p+h$. Then $h$ is solved for again, $p$ is updated again, and this iterative scheme is repeated until the RMSE is small enough. This algorithm is called the Gauss-Newton algorithm.
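As an illustration, the following C++ sketch applies this iteration to the single-parameter example mentioned above, fitting $\hat{y}(x,p) = px^2$ to a target function. The target $(1-x)^5$ and the values $N=100$ and the initial guess $p=0$ are arbitrary choices made for the example; in this case $\partial\hat{y}/\partial p = x^2$.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const int N = 100;   // number of sample points on [0, 1]
    double p = 0.0;      // initial guess for the parameter

    for (int iter = 0; iter < 5; ++iter) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < N; ++i) {
            double x    = double(i) / double(N - 1);
            double y    = std::pow(1.0 - x, 5.0);   // function to approximate (example target)
            double yhat = p * x * x;                // current approximation p*x^2
            double dydp = x * x;                    // d(yhat)/dp
            num += (y - yhat) * dydp;               // right-hand side of the update equation
            den += dydp * dydp;                     // coefficient of h on the left-hand side
        }
        double h = num / den;                       // solve for the update step
        p += h;
        std::printf("iteration %d: p = %f\n", iter, p);
    }
}
```

Because $\hat{y}$ happens to be linear in $p$ here, the very first step already lands on the least-squares optimum; for a model such as $2^{(Ax+B)x}$, which is non-linear in its parameters, several iterations are needed.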

In practice, $\hat{y}$ will often depend on several parameters, say $m$ of them, rather than a single parameter, so that $p$ is instead a vector $\mvec{p}$ of dimensions $m \times 1$, and our approximation is written $\hat{y}(x, \mvec{p})$. The update step is also a vector $\mvec{h}$ of dimensions $m \times 1$. For every iteration, we find our update step by solving the matrix equation \begin{equation} [\mvec{J}^T \mvec{J}] \mvec{h} = \mvec{J}^T (\mvec{y} - \mvec{\hat{y}}) \label{eq:multi} \end{equation} The Jacobian matrix $\mvec{J}$ is a matrix with dimensions $N \times m$. It is defined as follows: in column $j$ of row $i$, we store the value $\frac{\partial \hat{y}}{\partial p_j}(x_i, \mvec{p})$. So the Jacobian simply stores the values of the partial derivatives at all the sample points. The vectors $\mvec{y}$ and $\mvec{\hat{y}}$ contain the values of $y(x)$ and $\hat{y}(x,\mvec{p})$ evaluated at all the sample points. Finally, note that $\eqref{eq:multi}$ is simply a generalization of $\eqref{eq:single}$. The derivation of this more general equation is slightly more involved, since it involves matrices and vectors, but it is based on exactly the same principles that we applied when deriving equation $\eqref{eq:single}$. The interested reader may peruse section 3 in [2] for a derivation of $\eqref{eq:multi}$.
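In code, one iteration of equation $\eqref{eq:multi}$ boils down to filling in $\mvec{J}$, $\mvec{y}$ and $\mvec{\hat{y}}$ and solving a small $m \times m$ linear system. A generic sketch using Eigen, the linear algebra library used in the next section, might look like this (the function name is a placeholder):

```cpp
#include <Eigen/Dense>

// One Gauss-Newton step: given the Jacobian J (N x m), the target values y (N x 1)
// and the current model values yhat (N x 1), solve [J^T J] h = J^T (y - yhat) for h.
Eigen::VectorXd gaussNewtonStep(const Eigen::MatrixXd& J,
                                const Eigen::VectorXd& y,
                                const Eigen::VectorXd& yhat) {
    Eigen::MatrixXd JtJ = J.transpose() * J;           // m x m normal matrix
    Eigen::VectorXd rhs = J.transpose() * (y - yhat);  // m x 1 right-hand side
    return JtJ.ldlt().solve(rhs);                      // update step h (m x 1)
}
```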

Gauss-Newton for Curve Fitting

With the Gauss-Newton algorithm in our hands, we can focus our attention on the initial problem: finding values of the parameters $A$ and $B$ such that $2^{(Ax + B)x}$ approximates $(1-x)^5$ with a small RMSE. We denote this approximation $\hat{y}(x,A,B) = 2^{(Ax + B)x}$. In order to utilize the Gauss-Newton algorithm, the partial derivatives must be found:\begin{align*}\frac{\partial\hat{y}}{\partial A}(x,A,B) &= x^2\ln(2) \cdot 2^{(Ax + B)x}\\\frac{\partial\hat{y}}{\partial B}(x,A,B) &= x\ln(2) \cdot 2^{(Ax + B)x}\end{align*}These two partial derivatives are used to form the linear system $\eqref{eq:multi}$. Once this system has been formed, it can be solved by using Eigen. By solving the system, the update step $\mvec{h}$ is obtained, and its components are added to $A$ and $B$ to improve the RMSE. By running the Gauss-Newton algorithm for a number of iterations, it is found that the parameter values $A=-5.55473$ and $B=-6.98316$ (these values are rounded) minimize the RMSE. These are exactly the values that Karis used, and now we finally know how he probably obtained them in the first place: by using the Gauss-Newton algorithm, or some similar algorithm.
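Putting the pieces together, the following is a minimal, self-contained sketch of this procedure (this is not the gist referenced below; the number of sample points, the iteration count, and the initial guess $A=-5$, $B=-7$ from the earlier RMSE example are arbitrary choices):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <cstdio>

int main() {
    const int N = 100;          // number of sample points on the domain [0, 1]
    double A = -5.0, B = -7.0;  // rough initial guess (taken from the earlier example)

    for (int iter = 0; iter < 20; ++iter) {
        Eigen::MatrixXd J(N, 2);  // Jacobian: one row per sample point, one column per parameter
        Eigen::VectorXd y(N), yhat(N);

        for (int i = 0; i < N; ++i) {
            double x = double(i) / double(N - 1);
            double e = std::exp2((A * x + B) * x);  // yhat(x, A, B) = 2^((A*x + B)*x)
            y(i)    = std::pow(1.0 - x, 5.0);       // original Fresnel term (1 - x)^5
            yhat(i) = e;
            J(i, 0) = x * x * std::log(2.0) * e;    // d(yhat)/dA
            J(i, 1) = x * std::log(2.0) * e;        // d(yhat)/dB
        }

        // Solve [J^T J] h = J^T (y - yhat) for the update step, then apply it.
        Eigen::Vector2d h = (J.transpose() * J).ldlt().solve(J.transpose() * (y - yhat));
        A += h(0);
        B += h(1);
    }

    std::printf("A = %f, B = %f\n", A, B);  // should approach roughly -5.55473 and -6.98316
}
```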

Source code that implements the Gauss-Newton algorithm to find the parameters $A=-5.55473$ and $B=-6.98316$ can be found in a gist.

References

[1] Brian Karis, "Real Shading in Unreal Engine 4". Link.

[2] Henri P. Gavin, "The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems". Link.
