Extreme values when the constraints are equations

We seek the extreme values of a function \(f(x_1, \dots, x_n)\) subject to the constraints

\[\begin{align} g_1(x_1, \dots, x_n) &= 0 \\ g_2(x_1, \dots, x_n) &= 0 \tag{1} \\ \dots \\ g_m(x_1, \dots, x_n) &= 0 \end{align}\]

It is required that the number of constraints be smaller than the number of variables, i.e. \(m \lt n\).

We construct the Lagrange function

\[ L\left( \vec{x}, \vec{\lambda} \right) = f \left( \vec{x} \right) + \lambda_1 \cdot g_1 \left( \vec{x} \right) + \lambda_2 \cdot g_2 \left( \vec{x} \right) + \dots + \lambda_m \cdot g_m \left( \vec{x} \right) \]

and search for the points where all \(m + n\) partial derivatives of \(L\left( \vec{x}, \vec{\lambda} \right)\) vanish:

\[\begin{align} \frac{\partial L}{\partial x_1} \left( \vec{x}, \vec{\lambda} \right) &= 0 \\ \frac{\partial L}{\partial x_2} \left( \vec{x}, \vec{\lambda} \right) &= 0 \\ \dots \\ \frac{\partial L}{\partial x_n} \left( \vec{x}, \vec{\lambda} \right) &= 0 \tag{2} \\ \\ \frac{\partial L}{\partial \lambda_1} \left( \vec{x}, \vec{\lambda} \right) &= 0 \\ \frac{\partial L}{\partial \lambda_2} \left( \vec{x}, \vec{\lambda} \right) &= 0 \\ \dots \\ \frac{\partial L}{\partial \lambda_m} \left( \vec{x}, \vec{\lambda} \right) &= 0 \end{align}\]

Note that the derivatives with respect to the \(\lambda_i\) are identical to the constraints. In order to find out whether an extreme value is a minimum or a maximum, the Hessian matrix, i.e. the matrix containing all second-order partial derivatives of \(f(\vec{x})\), has to be analyzed; see for instance the Wikipedia entry on the Hessian matrix, or read the next paragraph.


Example 1

We want to find the extreme values of \(f(x, y) = x^2 + y^2\) with the constraint \(x + y = 1\).

The constraint in the correct form is \(g_1(x, y) = x + y - 1 = 0\). The Lagrange function is then

\[ L (x, y, \lambda_1) = f(x, y) + \lambda_1 \cdot g_1(x, y) = x^2 + y^2 + \lambda_1 \cdot \left( x + y - 1 \right) \]

The partial derivatives with respect to \(x\), \(y\), and \(\lambda_1\) read:

\[\begin{align} \frac{\partial L}{\partial x} \left( x, y, \lambda_1 \right) &= 2 \cdot x + \lambda_1 = 0 \\ \frac{\partial L}{\partial y} \left( x, y, \lambda_1 \right) &= 2 \cdot y + \lambda_1 = 0 \tag{3} \\ \frac{\partial L}{\partial \lambda_1} \left( x, y, \lambda_1 \right) &= x + y - 1 = 0 \end{align}\]

The first two equations give \(x = y\); the third equation then yields \(2 \cdot x - 1 = 0\), i.e. \(x = y = 0.5\), with a function value of \(f(0.5, 0.5) = 0.5\).
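As a quick cross-check, the system (3) is linear in \(x\), \(y\), and \(\lambda_1\) and can therefore be solved directly in base R. A minimal sketch (the matrix rows correspond to the three equations):

```r
# Solve the linear system (3):  2x + lambda = 0,  2y + lambda = 0,  x + y = 1
A <- rbind(c(2, 0, 1),
           c(0, 2, 1),
           c(1, 1, 0))
b <- c(0, 0, 1)
solve(A, b)    # yields x = 0.5, y = 0.5, lambda_1 = -1
```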

Below is an interactive plot of the function \(f(x,y) = x^2 + y^2\) (gray), the constraint (green), and the identified extreme value (red). The plot can be rotated with the left mouse button and zoomed with the mouse wheel (this might not work in older browsers):


Minimum or maximum?

In order to find out if the detected extreme values \(\vec{x_0}\) are minima or maxima without plotting the graph, it is convenient to look at the eigenvalues of the Hessian. The following rules apply:

  • The function has a local minimum at \(\vec{x_0}\) if all eigenvalues of the Hessian are positive.
  • The function has a local maximum at \(\vec{x_0}\) if all eigenvalues of the Hessian are negative.
  • The function has a saddle point at \(\vec{x_0}\) if the Hessian has both positive and negative eigenvalues.

More information can be found in the Wikipedia entries on the Hessian matrix, on definiteness, and on saddle points.

In the two-dimensional case, the Hessian is:

\[H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y}\\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix} \]

This is a symmetric matrix because \(\dfrac{\partial^2 f}{\partial x \partial y} = \dfrac{\partial^2 f}{\partial y \partial x}\).

For example 1, we get:

\[H = \begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix} \]

(In this simple case, the Hessian does not depend on the coordinates. In general, the Hessian has to be evaluated at the coordinates identified as an extreme value, i.e. at the point \(\vec{x_0}\).)

The eigenvalues \(\alpha\) of \(\textbf{H}\) can be calculated from the characteristic equation \(\mid \textbf{H} - \alpha \, \textbf{I} \mid = 0\), where \(\textbf{I}\) is the identity matrix. For example 1, this yields:

\[ \begin{vmatrix} 2 - \alpha & 0\\ 0 & 2 - \alpha \end{vmatrix} = 0 \]


i.e. \(\left( 2 - \alpha\right)^2 = 0\) with the (double) solution \(\alpha = 2\), proving that \((x,y) = (0.5,0.5)\) is a minimum.
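The eigenvalues can also be computed numerically, for instance with R's built-in eigen() function:

```r
# Eigenvalues of the Hessian of example 1
H <- matrix(c(2, 0,
              0, 2), nrow = 2, byrow = TRUE)
eigen(H)$values    # both eigenvalues equal 2 > 0  =>  minimum
```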


Example 2

We are looking for extreme values of \(f(x, y) = 10 \cdot \sin{(r)} / r\) where \(r(x,y) = \sqrt{x^2+y^2}\). The search is to be confined to the rectangular region \(x \in (-10, 10)\) and \(y \in (-10, 10)\). The constraint is \(x = y\).

The Lagrange function is

\[ L (x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = 10 \cdot \sin{(r)} / r + \lambda \cdot \left( x - y \right) \]

We need the partial derivatives of \(r\) with respect to \(x\) and \(y\):

\[\begin{align} \frac{\partial r}{\partial x} &= \frac{1}{2} \cdot \left( x^2 + y^2 \right)^{-1/2} \cdot 2x = \frac{x}{r} \\ \frac{\partial r}{\partial y} &= \frac{y}{r} \end{align}\]

The partial derivatives of the Lagrange function are:

\[\begin{align} \frac{\partial L}{\partial x} &= 10 \cdot \left[ (-1)r^{-2} \cdot \sin{(r)} + \frac{1}{r} \cdot \cos{(r)} \right] \cdot \frac{\partial r}{\partial x} + \lambda \\ &= 10 \cdot \left[ \cos{(r)} - \frac{1}{r} \cdot \sin{(r)}\right] \cdot \frac{x}{r^2} + \lambda = 0 \\ \frac{\partial L}{\partial y} &= 10 \cdot \left[ \cos{(r)} - \frac{1}{r} \cdot \sin{(r)}\right] \cdot \frac{y}{r^2} - \lambda = 0 \\ \frac{\partial L}{\partial \lambda} &= x - y = 0 \end{align}\]

Using \(y = x\) and adding the first and second equations gives

\[ \left[ \frac{1}{r} \cdot \cos{(r)} - \frac{1}{r^2} \cdot \sin{(r)}\right] \cdot \frac{x}{r} = 0 \]

Hence, either \(x = 0\), which together with \(y = x\) corresponds to the origin, or the term in square brackets vanishes, which is equivalent to

\[ r = \tan{(r)} \]

This equation can be solved numerically, for instance with the nleqslv library in R. A plot of \(r - \tan{(r)}\) in the region of interest helps to find proper starting values (marked red):

Numerical analysis yields three different non-negative solutions for \(r\) in the rectangular region under consideration: \(r = 0\) (the origin) and the two positive roots \(r_1 \approx 4.493409\) and \(r_2 \approx 7.725252\) of \(r = \tan{(r)}\). The corresponding values for \(x\) and \(y\) can be calculated using \(x = y = \pm r/\sqrt{2}\):

\[\begin{align} x_0 &= y_0 = 0 \\ x_1 &= y_1 = -3.17732 \\ x_2 &= y_2 = +3.17732 \\ x_3 &= y_3 = -5.462578 \\ x_4 &= y_4 = +5.462578 \end{align}\]
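These numbers can be reproduced with a few lines of R. The following is a minimal sketch, assuming starting values of about 4.5 and 7.7 read off the plot of \(r - \tan{(r)}\):

```r
# Solve r = tan(r) numerically with the nleqslv package
library(nleqslv)

fn <- function(r) r - tan(r)

roots <- sapply(c(4.5, 7.7), function(r0) nleqslv(r0, fn)$x)
roots              # approx. 4.493409 and 7.725252
roots / sqrt(2)    # corresponding x = y values: approx. 3.177315 and 5.462578
```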

The following plots show the function (gray), the constraint (green) and the identified solutions (interactive plot, use left mouse button to rotate and mouse wheel to zoom):

Note that we have ignored extreme values at the border of the region of interest.


Extreme values when the constraints are inequalities

Identifying extreme values of functions works in a slightly different manner when the constraints are inequalities.

Let us assume that we seek the extreme values of a function \(f(x_1, \dots, x_n)\) with the following constraints:

\[\begin{align*} g_1(x_1, \dots, x_n) &\le 0 \\ g_2(x_1, \dots, x_n) &\le 0 \tag{4} \\ \dots \\ g_m(x_1, \dots, x_n) &\le 0 \end{align*}\]

It is important that we stick to the \(\le\) sign here.
Again, it is required that the number of constraints be smaller than the number of variables, i.e. \(m \lt n\).

The Lagrange function is the same as above:

\[ L\left( \vec{x}, \vec{\lambda} \right) = f \left( \vec{x} \right) + \lambda_1 \cdot g_1 \left( \vec{x} \right) + \lambda_2 \cdot g_2 \left( \vec{x} \right) + \dots + \lambda_m \cdot g_m \left( \vec{x} \right) \]

However, we must require that the \(\lambda_i\) are non-negative when the constraints are inequalities of the form \(g(x_1, \dots, x_n) \le 0\) (non-negativity condition).

Now, we require that the \(n\) partial derivatives with respect to the variables vanish:

\[\begin{align} \frac{\partial L}{\partial x_1} \left( \vec{x}, \vec{\lambda} \right) &= 0 \\ \frac{\partial L}{\partial x_2} \left( \vec{x}, \vec{\lambda} \right) &= 0 \tag{5} \\ \dots \\ \frac{\partial L}{\partial x_n} \left( \vec{x}, \vec{\lambda} \right) &= 0 \end{align}\]

Solving this set of equations yields solutions for \(\vec{x}\) that depend parametrically on the \(\lambda_i\), i.e. we obtain a relationship \(\vec{x} = h \left( \vec{\lambda} \right)\).

In addition to equations (5) for the partial derivatives of the Lagrangian and the non-negativity condition for the \(\vec{\lambda}\), the Karush-Kuhn-Tucker (KKT) complementary slackness conditions must be fulfilled:

\[\begin{align} \lambda_1 \cdot g_1(x_1, \dots, x_n) &= 0 \\ \lambda_2 \cdot g_2(x_1, \dots, x_n) &= 0 \tag{6} \\ \dots \\ \lambda_m \cdot g_m(x_1, \dots, x_n) &= 0 \\ \end{align}\]

That means that either the Lagrange multiplier \(\lambda_i\) or the constraint function \(g_i(x_1, \dots, x_n)\) must vanish.

This might require the evaluation of multiple scenarios when more than one constraint is involved. For instance, with two inequalities, the possible scenarios are:

\[\begin{align} \lambda_1 = 0 \quad &\text{and} \quad \lambda_2 = 0 \\ \lambda_1 = 0 \quad &\text{and} \quad \lambda_2 \ne 0 \\ \lambda_1 \ne 0 \quad &\text{and} \quad \lambda_2 = 0 \\ \lambda_1 \ne 0 \quad &\text{and} \quad \lambda_2 \ne 0 \end{align}\]

Keep in mind that for each \(\lambda_i \ne 0\), it must follow that \(g_i(x_1, \dots, x_n) = 0\), according to eqns. (6).
For all these scenarios, it must be checked whether the solution calculated from \(\vec{x} = h \left( \vec{\lambda} \right)\) is consistent with the inequality constraints.


Example 3

We want to find the extreme values of \(f(x, y) = x^2 + y^2\) with the constraint \(x + y \ge 1\).

The Lagrange function is

\[ L (x, y, \lambda_1) = f(x, y) + \lambda_1 \cdot g_1(x, y) = x^2 + y^2 + \lambda_1 \cdot \left( 1 - x - y \right) \]

Note that we have rewritten the constraint in the form \(g_1(x, y) = 1 - x - y \le 0\).

The partial derivatives with respect to \(x\) and \(y\) are:

\[\begin{align} \frac{\partial L}{\partial x} \left( x, y, \lambda_1 \right) &= 2 \cdot x - \lambda_1 = 0 \tag{7} \\ \frac{\partial L}{\partial y} \left( x, y, \lambda_1 \right) &= 2 \cdot y - \lambda_1 = 0 \end{align}\]

This immediately yields \(x = y = \lambda_1/2\), i.e. the above-mentioned relationship \(\vec{x} = h(\vec{\lambda})\).

Now, we look at the Karush-Kuhn-Tucker condition:

\[ \lambda_1 \cdot g_1 \left( x, y \right) = \lambda_1 \cdot \left( 1 - x - y \right) = 0 \tag{8} \]

According to eqn. (8), either \(\lambda_1\) or \(g_1 \left( x, y \right)\) must vanish.
If \(\lambda_1 = 0\), the non-negativity condition is automatically fulfilled, and we have to check whether \(g_1(x,y) \le 0\), using the relationship \(\vec{x} = h(\vec{\lambda})\).
On the other hand, if \(g_1(x,y) = 0\), the constraint is automatically fulfilled, and we have to check the non-negativity condition for \(\lambda_1\), again exploiting \(\vec{x} = h(\lambda_1)\), or more precisely the inverse relationship \(\lambda_1 = h^{-1}(\vec{x})\).

Case 1: \(\lambda_1 = 0\)

With this choice, the non-negativity condition for the Lagrange multiplier is trivially fulfilled. It remains to check whether the constraint on the variables is fulfilled, given this choice of \(\lambda_1\) and the relationship \(\vec{x} = h(\lambda_1)\).
We have \(x = y = \lambda_1/2 = 0\) for case 1. Inserting \((x,y) = (0,0)\) into the constraint gives \(1 - x - y = 1 \nleq 0\), i.e. the constraint is violated. We therefore reject the solution \(x = y = 0\).

Case 2: \(g_1 \left( x, y \right) = 0\)

The condition \(g_1(x,y) = 1 - x - y = 0\) automatically satisfies the constraint. Furthermore, we know from (7) that \(x = y\). Both equations together yield \(x = y = 1/2\). With \(\lambda_1 = 2 \cdot x = 1 \ge 0\), the non-negativity condition is also fulfilled. This leads to the final result that \(x = y = 1/2\) is the only valid solution of the optimization problem.
We also note that the constraint has an effect here: without it, the solution would have been \(x = y = 0\), which follows from requiring that the partial derivatives of \(f(x,y)\) vanish.
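The result can be cross-checked numerically, for instance with base R's constrOptim(), which minimizes a function subject to linear inequality constraints of the form \(\text{ui} \cdot \vec{x} - \text{ci} \ge 0\). A minimal sketch (the starting point must lie strictly inside the feasible region):

```r
# Minimize f(x,y) = x^2 + y^2 subject to x + y >= 1
f  <- function(v) v[1]^2 + v[2]^2
gr <- function(v) c(2 * v[1], 2 * v[2])

res <- constrOptim(theta = c(1, 1), f = f, grad = gr,
                   ui = matrix(c(1, 1), nrow = 1), ci = 1)
res$par    # approx. (0.5, 0.5)
```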


Example 4

It might be interesting to turn the constraint around. Let’s optimize

\(f(x, y) = x^2 + y^2\) with the constraint \(x + y \le 1\).

The Lagrange function changes slightly:

\[ L (x, y, \lambda_1) = f(x, y) + \lambda_1 \cdot g_1(x, y) = x^2 + y^2 + \lambda_1 \cdot \left( x + y - 1 \right) \]

The partial derivatives with respect to \(x\) and \(y\) are now:

\[\begin{align} \frac{\partial L}{\partial x} \left( x, y, \lambda_1 \right) &= 2 \cdot x + \lambda_1 = 0 \tag{9} \\ \frac{\partial L}{\partial y} \left( x, y, \lambda_1 \right) &= 2 \cdot y + \lambda_1 = 0 \end{align}\]

We obtain \(x = y = -\lambda_1/2\) from eqns. (9).

The KKT condition is:

\[ \lambda_1 \cdot g_1 \left( x, y \right) = \lambda_1 \cdot \left( x + y - 1 \right) = 0 \tag{10} \]

Case 1: \(\lambda_1 = 0\)

The non-negativity condition is trivially fulfilled. We have \(x = y = -\lambda_1/2 = 0\) for case 1. Inserting this potential solution \(\vec{x}\) into the constraint gives \(g_1(x,y) = x + y - 1 = -1 \le 0\), i.e. the constraint is satisfied. We therefore accept the solution \(x = y = 0\).

Case 2: \(g_1 \left( x, y \right) = 0\)

The condition \(g_1(x,y) = x + y - 1 = 0\) trivially satisfies the constraint. We know from (9) that \(x = y\). Both equations yield \(x = y = 1/2\). But \(\lambda_1 = - 2 \cdot x = -1 \lt 0\) violates the non-negativity condition, so this solution must be rejected.
Consequently, the final result is that \(x = y = 0\) is the only valid solution of this problem. This is the same solution that we would get without the constraint, i.e. the constraint has no effect here.
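The same numerical cross-check as in example 3 applies here; only the constraint matrix changes, since \(x + y \le 1\) is equivalent to \(-x - y \ge -1\):

```r
# Minimize f(x,y) = x^2 + y^2 subject to x + y <= 1  (i.e. -x - y >= -1)
f  <- function(v) v[1]^2 + v[2]^2
gr <- function(v) c(2 * v[1], 2 * v[2])

res <- constrOptim(theta = c(0.2, 0.2), f = f, grad = gr,
                   ui = matrix(c(-1, -1), nrow = 1), ci = -1)
res$par    # approx. (0, 0)
```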


Example 5

Let us investigate the function from example 2 with an inequality constraint.
We are looking for extreme values of \(f(x, y) = 10 \cdot \sin{(r)} / r\), where \(r(x,y) = \sqrt{x^2+y^2}\).
The search is to be confined to the rectangular region \(x \in (-10, 10)\) and \(y \in (-10, 10)\). Let the constraint be \(x + y \le -2\).

The Lagrange function is:

\[ L (x, y, \lambda) = f(x, y) + \lambda \cdot g(x, y) = 10 \cdot \frac{1}{r} \cdot \sin{(r)} + \lambda \cdot \left( x + y + 2 \right) \]

The partial derivatives of the Lagrangian with respect to the variables must vanish:

\[\begin{align} \frac{\partial L}{\partial x} &= 10 \cdot \left[ (-1)r^{-2} \cdot \sin{(r)} + \frac{1}{r} \cdot \cos{(r)} \right] \cdot \frac{\partial r}{\partial x} + \lambda \\ &= 10 \cdot \left[ \cos{(r)} - \frac{1}{r} \cdot \sin{(r)}\right] \cdot \frac{x}{r^2} + \lambda = 0 \tag{11} \\ \frac{\partial L}{\partial y} &= 10 \cdot \left[ \cos{(r)} - \frac{1}{r} \cdot \sin{(r)}\right] \cdot \frac{y}{r^2} + \lambda = 0 \end{align}\]

The KKT condition reads:

\[ \lambda \cdot \left( x + y + 2\right) = 0 \tag{12} \]

Case 1: \(\lambda = 0\)

In this case, the \(\lambda\) terms drop out of eqns. (11), and the solutions again follow from \(r = \tan{(r)}\), i.e. we have infinitely many candidate extreme values: the point \((x,y) = (0,0)\) and the circles with radii \(r_1 \approx 4.493409\) and \(r_2 \approx 7.725252\) (using the numerical results obtained in example 2). Note that extreme values on the edges of the region of interest are again ignored for simplicity.
The point \((x,y)=(0,0)\) can immediately be excluded from the set of candidates because it violates the constraint.
Now, we know that the constraint is only fulfilled for points on one side of the line \(x + y + 2 = 0\). It is therefore useful to find the intersection points of this line with the circles of radii \(r_1\) and \(r_2\).
In order to find the \((x,y)\) coordinates of the intersections of this line with the circle of radius \(r_i\), we solve:

\[\begin{align} x + y + 2 &= 0 \\ x^2 + y^2 &= r_i^2 \end{align}\]

This yields:

\[\begin{align} (x_{1,i}, y_{1,i}) &= (d_i-1, -d_i-1) \tag{13} \\ (x_{2,i}, y_{2,i}) &= (-d_i-1, d_i-1) \end{align}\]

where \(d_i = \sqrt{\frac{r_i^2}{2}-1}\).
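The intersection points can be computed in a few lines of R (a small sketch using the radii from example 2):

```r
# Intersections of the line x + y + 2 = 0 with the circles of radius r_i
r <- c(4.493409, 7.725252)         # radii r_1 and r_2 from example 2
d <- sqrt(r^2 / 2 - 1)
cbind(x1 =  d - 1, y1 = -d - 1,    # first intersection point of each circle
      x2 = -d - 1, y2 =  d - 1)    # second intersection point of each circle
```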

The drawing shows the intersection points in the \(x,y\)-plane:

The points on the two circles below the line \(y = -x-2\) satisfy the constraint and are therefore extreme values of the function under the constraint \(x + y \le -2\).


Case 2: \(g(x,y) = \left( x + y + 2\right) = 0\)

Inserting \(y = -x -2\) into eqns. (11) yields:

\[ \frac{x}{r^2} = \frac{-x-2}{r^2} \tag{14} \]

(Note that the term in square brackets in (11) cannot be zero, because \(\lambda\) is assumed to be non-zero.)

Eqn. (14) has the only solution \(x = y = -1\), which gives \(r=\sqrt{(-1)^2 + (-1)^2} = \sqrt{2}\).

We now check whether the non-negativity condition for \(\lambda\) is satisfied, using the relationship \(\lambda = h^{-1}(\vec{x})\) derived from eqns. (11):

\[\begin{align} \lambda &= -\frac{x}{r^2} \cdot 10\cdot \left[ \cos{(r)} - \frac{1}{r} \cdot \sin{(r)} \right] \\ &= -\frac{-1}{2} \cdot 10\cdot \left[ \cos{(\sqrt{2})} - \frac{1}{\sqrt{2}} \cdot \sin{(\sqrt{2})} \right] \tag{15} \\ &\approx 5 \cdot \left[ 0.156 - \frac{1}{\sqrt{2}} \cdot 0.988 \right] = 5 \cdot \left[ 0.156-0.698 \right] \lt 0 \end{align}\]

We conclude that this solution has to be rejected because the non-negativity condition is violated.
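The numerical value of \(\lambda\) in eqn. (15) can be checked quickly in R:

```r
# lambda from eqn. (15) with x = -1 and r = sqrt(2)
r      <- sqrt(2)
lambda <- -(-1) / r^2 * 10 * (cos(r) - sin(r) / r)
lambda    # approx. -2.71 < 0  =>  non-negativity condition violated
```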

Below is an interactive plot of the function (gray), the constraint (green), and the solution (red). The plot can be rotated with the left mouse button and zoomed with the mouse wheel (this might not work in older browsers):



Mail: Uwe Menzel

Website: matstat.org