Custom Objective Functions
Often, the objective function is specific to the use case or business problem. Choosing the right objective to optimize requires thinking through the decisions or actions that will be taken using the model, and assigning a cost or benefit to taking them correctly or incorrectly, based on known outcomes in the training data.
Once you have determined the objective for your business, you can have EvalML optimize it by defining a custom objective function.
How to Create an Objective Function
To create a custom objective function, we must define two functions:
The “objective function”: this function takes the predictions, true labels, and any other information about the future and returns a score of how well the model performed.
The “decision function”: this function takes prediction probabilities that were output from the model and a threshold and returns a prediction.
To evaluate a particular model, EvalML automatically finds the best threshold to pass to the decision function to generate predictions and then scores the resulting predictions using the objective function. The score from the objective function determines which set of pipeline hyperparameters EvalML will try next.
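The interplay between the two functions can be sketched in plain Python. This is a minimal illustration, not EvalML's actual search procedure: it assumes a simple scan over a fixed list of candidate thresholds, a toy cost in which a missed positive costs 5 and a false alarm costs 1, and the helper name find_best_threshold is hypothetical.

```python
def decision_function(probabilities, threshold):
    """Turn predicted probabilities into boolean predictions."""
    return [p > threshold for p in probabilities]


def objective_function(y_predicted, y_true):
    """Toy cost (lower is better): missed positive costs 5, false alarm costs 1."""
    cost = 0
    for predicted, actual in zip(y_predicted, y_true):
        if actual and not predicted:
            cost += 5
        elif predicted and not actual:
            cost += 1
    return cost


def find_best_threshold(probabilities, y_true, candidates):
    """Hypothetical helper: return the (threshold, score) pair with the lowest score."""
    best_threshold, best_score = None, float("inf")
    for threshold in candidates:
        # Step 1: the decision function converts probabilities to predictions.
        predictions = decision_function(probabilities, threshold)
        # Step 2: the objective function scores those predictions.
        score = objective_function(predictions, y_true)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Made-up model outputs and labels, for illustration only.
probabilities = [0.1, 0.4, 0.35, 0.8]
y_true = [False, False, True, True]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best_threshold, best_score = find_best_threshold(probabilities, y_true, candidates)
print(best_threshold, best_score)
```

Because a missed positive is far more expensive than a false alarm in this toy cost, the scan settles on a low threshold that accepts one false alarm to avoid missing any positives. The resulting score is what a search over pipeline hyperparameters would then compare across models.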
To give a concrete example, let’s look at how the fraud detection objective function is built.
from evalml.objectives.objective_base import ObjectiveBase


class FraudCost(ObjectiveBase):
    """Score the percentage of money lost to fraud out of the total transaction amount processed"""
    name = "Fraud Cost"
    needs_fitting = True
    greater_is_better = False
    uses_extra_columns = True
    fit_needs_proba = True
    score_needs_proba = False

    def __init__(self, retry_percentage=.5, interchange_fee=.02,
                 fraud_payout_percentage=1.0, amount_col='amount', verbose=False):
        """Create instance of FraudCost

        Args:
            retry_percentage (float): what percentage of customers will retry a transaction if it
                is declined? Between 0 and 1. Defaults to .5
            interchange_fee (float): how much of each successful transaction do you collect?
                Between 0 and 1. Defaults to .02
            fraud_payout_percentage (float): what percentage of fraud will you be unable to collect?
                Between 0 and 1. Defaults to 1.0
            amount_col (str): name of column in data that contains the amount. Defaults to "amount"
            verbose (bool): passed through to ObjectiveBase. Defaults to False
        """
        self.retry_percentage = retry_percentage
        self.interchange_fee = interchange_fee
        self.fraud_payout_percentage = fraud_payout_percentage
        self.amount_col = amount_col
        super().__init__(verbose=verbose)

    def decision_function(self, y_predicted, extra_cols, threshold):
        """Determine if transaction is fraud given predicted probabilities,
        dataframe with transaction amount, and threshold"""
        # weight each fraud probability by the amount at stake
        transformed_probs = (y_predicted * extra_cols[self.amount_col])
        return transformed_probs > threshold

    def objective_function(self, y_predicted, y_true, extra_cols):
        """Calculate amount lost to fraud given predictions, true values, and dataframe
        with transaction amount"""
        # extract the transaction amounts from the amount column in the user's data
        transaction_amount = extra_cols[self.amount_col]

        # amount paid out if a fraudulent transaction is approved
        fraud_cost = transaction_amount * self.fraud_payout_percentage

        # interchange fees lost when a declined legitimate transaction is not retried
        interchange_cost = transaction_amount * (1 - self.retry_percentage) * self.interchange_fee

        # cost of missing fraudulent transactions (y_true and y_predicted are used as boolean masks)
        false_negatives = (y_true & ~y_predicted) * fraud_cost

        # cost of declining legitimate transactions
        false_positives = (~y_true & y_predicted) * interchange_cost

        loss = false_negatives.sum() + false_positives.sum()

        # express the loss as a fraction of the total amount processed
        loss_per_total_processed = loss / transaction_amount.sum()

        return loss_per_total_processed
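To check the arithmetic above by hand, the sketch below reproduces the same decision rule and loss calculation on four made-up transactions, using plain Python lists instead of EvalML and pandas. The function names flag_fraud and fraud_cost_ratio, the probabilities, the amounts, the labels, and the threshold of 50 are all hypothetical and chosen only for illustration; the default parameters mirror those of FraudCost.

```python
def flag_fraud(probabilities, amounts, threshold):
    """Mirror of decision_function: flag when probability * amount exceeds the threshold."""
    return [p * a > threshold for p, a in zip(probabilities, amounts)]


def fraud_cost_ratio(y_predicted, y_true, amounts,
                     retry_percentage=0.5, interchange_fee=0.02,
                     fraud_payout_percentage=1.0):
    """Mirror of objective_function: loss as a fraction of the total amount processed."""
    loss = 0.0
    for predicted, actual, amount in zip(y_predicted, y_true, amounts):
        if actual and not predicted:
            # False negative: fraud was approved, so the payout is lost.
            loss += amount * fraud_payout_percentage
        elif predicted and not actual:
            # False positive: a legitimate transaction was declined, so the
            # interchange fee is lost whenever the customer does not retry.
            loss += amount * (1 - retry_percentage) * interchange_fee
    return loss / sum(amounts)


# Made-up data: four transactions.
probabilities = [0.3, 0.6, 0.1, 0.9]
amounts = [100.0, 200.0, 300.0, 400.0]
y_true = [True, False, False, True]

y_predicted = flag_fraud(probabilities, amounts, threshold=50)
ratio = fraud_cost_ratio(y_predicted, y_true, amounts)
print(y_predicted)
print(ratio)  # roughly 0.102
```

Here the first transaction is fraud that goes unflagged (a loss of 100), and the second is a legitimate transaction that gets declined (a loss of 200 × 0.5 × 0.02 = 2), so the total loss of 102 on 1000 processed gives a ratio of about 0.102. Note that because the decision rule multiplies the probability by the amount, the threshold is expressed in units of money rather than as a probability.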