Visual Optimization for Programmers

Optimization — finding the peak or valley of a curve where the slope is zero

Plain English first

Optimization means finding the best answer: the highest point, the lowest cost, the fastest path, the most efficient setting.

On a smooth hill, the very top is easy to find — it's where the ground flattens out. You can't go higher in any direction. That flat spot is a maximum.

A valley works the same way in reverse. The very bottom is where the ground stops falling and hasn't started rising yet. That's a minimum.

The key insight: at a maximum or minimum, the slope is zero.

The picture

f(x)
  |       *
  |      * *    ← slope = 0 at the peak (maximum)
  |     *   *
  |    *     *
  |  **       **
  |**           **
  |________________ x

Standard math notation

At a maximum or minimum:  f'(x) = 0

Where f'(x) is the derivative — the slope of f at point x.

Verbose Python with descriptive names

Brute-force search (no calculus needed)

def find_best_input_by_searching(
    function_to_optimize,
    search_start,
    search_end,
    number_of_search_steps,
    find_maximum=True
):
    """
    Find the input value that produces the best (highest or lowest)
    output by checking many evenly-spaced candidate inputs.

    This is the programmer's approach: try everything, keep the best.
    Works when you don't have a formula for the derivative.
    """
    step_size = (search_end - search_start) / number_of_search_steps

    best_input_so_far  = None
    best_output_so_far = None

    for step_index in range(number_of_search_steps + 1):
        candidate_input  = search_start + step_index * step_size
        candidate_output = function_to_optimize(candidate_input)

        if best_output_so_far is None:
            best_input_so_far  = candidate_input
            best_output_so_far = candidate_output

        elif find_maximum and candidate_output > best_output_so_far:
            best_input_so_far  = candidate_input
            best_output_so_far = candidate_output

        elif not find_maximum and candidate_output < best_output_so_far:
            best_input_so_far  = candidate_input
            best_output_so_far = candidate_output

    return best_input_so_far, best_output_so_far

Gradient descent (smarter, widely used in machine learning)

def find_minimum_by_walking_downhill(
    function_to_minimize,
    starting_input,
    learning_rate,
    number_of_steps
):
    """
    Start somewhere. Check whether moving left or right decreases the output.
    Step in the direction that goes downhill. Repeat.

    This is gradient descent — the engine behind most neural network training.
    """
    current_input = starting_input
    tiny_step     = 0.000001  # for estimating slope numerically

    for iteration in range(number_of_steps):
        # Estimate slope at current position
        output_here        = function_to_minimize(current_input)
        output_one_step_right = function_to_minimize(current_input + tiny_step)
        slope_at_current   = (output_one_step_right - output_here) / tiny_step

        # Move opposite to the slope (downhill)
        current_input = current_input - learning_rate * slope_at_current

    return current_input

Real uses

Game AI: find the best move, the most efficient path
Machine learning: minimize prediction error (loss function)
Engineering: maximize strength, minimize material cost
Finance: minimize risk for a given return

Common mistakes

Stopping at a local minimum (a valley in one area) instead of the global minimum (the deepest valley anywhere). Brute-force avoids this; gradient descent can get stuck.
Setting the learning rate too high — the algorithm overshoots and bounces around instead of converging.
Forgetting that f'(x) = 0 finds both maxima and minima (and saddle points). You have to check which one you found.

Results