
Discover Quantile Regression
September-09-2024
Regression models usually provide a single predicted value, which isn’t always the most helpful.
For example, imagine a model that collects data from various job roles and predicts the expected salary based on factors like job title, years of experience, and education level. The graph might look something like this:

A traditional regression model will give a single salary prediction based on the input factors.
But a single value, like $80k, isn’t very helpful, is it?
What would be more useful is getting a range or different quantiles, which can give you a better idea of the best-case and worst-case scenarios.

- 25th percentile → $65k. This means that 25% of employees in similar roles earn $65k or less.
- 50th percentile (the median) → $80k. This represents the middle point where half of the employees earn less and half earn more.
- 75th percentile → $95k. This means that 25% of employees earn $95k or more.
Using this approach makes sense because there’s always a range of values for the target variable. A single point estimate, like the mean, doesn’t fully capture this range.
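To make those numbers concrete, here's a quick, purely hypothetical illustration of how such percentiles are computed from a sample of salaries for one role (the values are constructed to match the figures above):

import numpy as np

# hypothetical salaries (in $k) for employees in similar roles
salaries = np.array([50, 65, 65, 80, 80, 95, 95, 110])

# 25th, 50th, and 75th percentiles of the sample
print(np.percentile(salaries, [25, 50, 75]))   # [65. 80. 95.]

These, however, are percentiles of a raw sample. What we actually want is to predict such percentiles conditioned on the input features (job title, experience, education, and so on).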

Quantile regression solves this.
What is Quantile Regression?
As the name implies, quantile regression estimates different quantiles of the response variable based on the input data.
While ordinary least squares (OLS) or linear regression predicts the average value of the dependent variable for given inputs, quantile regression can provide estimates for various quantiles, such as the 25th, 50th, and 75th percentiles.

How does it work?
The concept is quite simple and easy to understand.
Take a look at the example dataset below and the linear regression line fitted to it:

Based on this regression fit:
- Green points have a positive error (true value minus predicted value).
- Red points have a negative error (true value minus predicted value).
Here’s a trick we can use:
- To create the 75th percentile line (or any percentile above 50%), we can give more weight to the green points. This will shift the prediction line closer to the green points.

- Similarly, to create the 25th percentile line (or any percentile below 50%), we can give more weight to the red points. This will move the prediction line closer to the red points.

In other words, the standard loss used in linear regression looks like the graph below:

In the plot above, predictions that are equally far from the actual value are given the same loss value.
However, we can adjust this by using a parameter \( w \) in the loss function, so that the loss value can differ depending on whether the prediction is above or below the actual value.

If \( w > 0.5 \), we get the plot on the left; if \( w < 0.5 \), we get the plot on the right:
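To see why this shifts the line, take \( w = 0.75 \): an error of 2 units where the true value sits above the prediction costs \( 0.75 \times 2 = 1.5 \), while the same-sized error in the other direction costs only \( 0.25 \times 2 = 0.5 \). The optimizer therefore prefers a line that sits above most of the points, which is exactly the behavior we want from a 75th-percentile model.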

As a result, we can train several regression models, each corresponding to a different quantile parameter \( w \), to create a set of quantile regression models.
Specifically:
- For the 75th percentile model, train with \( w = 0.75 \).
- For the 50th percentile model (the median), train with \( w = 0.50 \).
- For the 25th percentile model, train with \( w = 0.25 \).
Once trained, you pass a new input through each quantile-specific model at inference time to get a prediction for every quantile level.

Simple, isn’t it?
Implementation from scratch
Look at the dummy dataset below, along with the fit from the ordinary least squares (OLS) regression:
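The exact dataset isn't important; any noisy linear data works. Here is one way to generate something similar (the numbers are illustrative, not the original data), which also defines the `X` and `Y` arrays used in the code that follows:

import numpy as np

# illustrative dummy dataset: a noisy linear relationship
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100)
Y = 2.5 * X + 5 + rng.normal(scale=3, size=X.shape)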

As mentioned earlier, the approach involves training several regression models, each for a different quantile we want to predict.
The loss for each model will be calculated using the following function:
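This is the standard pinball (quantile) loss for a chosen quantile level \( w \):

\[
L_w(y, \hat{y}) =
\begin{cases}
w \, (y - \hat{y}) & \text{if } y \ge \hat{y} \\
(1 - w) \, (\hat{y} - y) & \text{if } y < \hat{y}
\end{cases}
\]

The total loss is this quantity summed over all training points, which is exactly what the `find_loss` function below computes.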

Here’s a function that calculates this for a specific weight parameter \( w \) and the model weights \( \theta \) (the slope and intercept of the line):
import numpy as np

def find_loss(model_weights, w):
    # current prediction; X and Y are the NumPy arrays holding the dataset
    prediction = model_weights[0] * X + model_weights[1]
    # error
    error = Y - prediction
    # reweigh the error term: weight w when the prediction falls below the
    # true value, (1 - w) when it falls above
    weighted_error = np.where(error > 0,
                              w * np.abs(error),        # if true
                              (1 - w) * np.abs(error))  # if false
    # total loss
    return weighted_error.sum()
To find the optimal weights, we will use the `minimize` function from SciPy, as shown below:
from scipy.optimize import minimize

# starting guess for [slope, intercept]
initial_weights = np.array([0.0, 1.0])

def get_quantile_model(w):
    # optimize the weighted loss for this particular quantile level w
    model_weights = minimize(find_loss, initial_weights, args=(w,)).x
    return model_weights
In the code above, `minimize()` is a function from the `scipy.optimize` module that performs numerical optimization. It returns a result object whose `.x` attribute holds the parameter values that minimize the objective function (in this case, `find_loss`).
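For example, with the `X` and `Y` arrays defined earlier, we can fit one model per quantile and query all of them at inference time (the quantile levels and the test input below are illustrative):

# fit one model per quantile level
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
models = {w: get_quantile_model(w) for w in quantiles}

# predictions for a new input under each quantile model
x_new = 4.2
for w, (slope, intercept) in models.items():
    print(f"q={w:.2f}: {slope * x_new + intercept:.2f}")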
Once you run the function for 5 different values of the parameter \( w \), you get the following plots:

It’s clear that as the value of \( w \) increases, the line shifts upwards, reflecting higher quantiles.
In my experience, quantile regression models generally perform well with tree-based regression methods.
In fact, gradient-boosted libraries like LightGBM support a quantile objective out of the box.
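Here is a minimal sketch of how that might look with LightGBM's scikit-learn interface, reusing the `X` and `Y` arrays from above (the quantile levels are illustrative):

import lightgbm as lgb

# LightGBM expects a 2-D feature matrix
X_2d = X.reshape(-1, 1)

# one boosted model per target quantile, using the built-in quantile objective
quantile_models = {
    alpha: lgb.LGBMRegressor(objective="quantile", alpha=alpha).fit(X_2d, Y)
    for alpha in (0.25, 0.50, 0.75)
}

# per-quantile predictions for the training inputs
preds = {alpha: model.predict(X_2d) for alpha, model in quantile_models.items()}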