Intended usage
This dashboard is intended for internal use only.
If you have questions about metrics that can be used for external communication, please contact Data Science for the latest messaging.
Prediction assessment process
Basic
- For each recommendation metric
- Iterate over all customer tenants
- For each tenant, randomly select 25% of completed campaigns
- Withhold all offers within these campaigns and any identical offers in other campaigns from the training data. Rebuild all prediction models used by the current tenant
- Use the rebuilt models to predict the relative results for the withheld campaigns
- Measure pairwise accuracy for all withheld campaigns: any pair of offers from the same campaign that is directionally correct (e.g., offers A and B were both tested in the same campaign, offer A was predicted to perform better than offer B, and offer A did perform better than offer B) counts as one correct prediction. Any pair of offers that is directionally incorrect (e.g., offer A was predicted to perform better than offer B, but offer B performed better than offer A) counts as one incorrect prediction.
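The pairwise accuracy scoring described above can be sketched as follows (a minimal illustration; the function name and the `(predicted, actual)` input shape are assumptions, not the dashboard's actual implementation):

```python
import itertools

def pairwise_accuracy(offers):
    """Score directional agreement between predicted and actual offer results.

    `offers` is a list of (predicted, actual) result values for offers tested
    in the same campaign. A pair whose predicted ordering matches the actual
    ordering counts as correct; a pair whose ordering disagrees counts as
    incorrect; ties in either value are skipped (no direction to compare).
    Returns (correct, incorrect) counts.
    """
    correct = incorrect = 0
    for (pred_a, act_a), (pred_b, act_b) in itertools.combinations(offers, 2):
        if pred_a == pred_b or act_a == act_b:
            continue  # tied pair: no directional prediction to score
        if (pred_a > pred_b) == (act_a > act_b):
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect
```

For example, with offers A = (predicted 0.9, actual 0.8) and B = (predicted 0.4, actual 0.5), prediction and outcome agree that A beats B, so the pair counts as one correct prediction.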
Cold start
- For each recommendation metric
- Iterate over all customer tenants
- Turn on data sharing settings allowing all tenants to share data (not persisted and only impacts this analysis), subject to category restrictions
- Withhold all completed campaigns from the current tenant. Rebuild all prediction models used by the current tenant
- Use the rebuilt models to predict the relative results for all withheld campaigns
- Measure the pairwise accuracy for all withheld campaigns
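The cold start holdout amounts to a leave-one-tenant-out split: every campaign from the current tenant is withheld, and models are rebuilt only from the other tenants' shared data. A minimal sketch of that split (the function name and input shape are assumptions, and category restrictions on sharing are omitted):

```python
def cold_start_split(campaigns_by_tenant, held_out_tenant):
    """Leave-one-tenant-out split mirroring the Cold Start assessment.

    `campaigns_by_tenant` maps each tenant to its list of completed
    campaigns. All campaigns from `held_out_tenant` are withheld for
    evaluation; campaigns from every other tenant form the training set.
    Returns (withheld, training) campaign lists.
    """
    withheld = list(campaigns_by_tenant[held_out_tenant])
    training = [campaign
                for tenant, campaigns in campaigns_by_tenant.items()
                if tenant != held_out_tenant
                for campaign in campaigns]
    return withheld, training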
The distinction between cases where error is known and where error is approximated
For some recommendation metrics, the error of each measured value is known. For pairs of offers, this makes it possible to calculate an expected accuracy: given the values measured for both offers, the probability that offer A would beat offer B (or vice versa) if the experiment were repeated. This expected accuracy matters because it is the upper bound on prediction accuracy. Raising that bound requires gathering a larger sample or making test platform improvements.
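For a single pair, the expected accuracy can be sketched as follows, assuming for illustration that each measured value is normally distributed around its true value with a known standard error (the function name and the normal approximation are assumptions, not necessarily the dashboard's actual calculation):

```python
import math

def expected_accuracy(mean_a, se_a, mean_b, se_b):
    """Probability that offer A would again beat offer B on a repeat of the
    experiment, treating each measurement as normal with the given standard
    error (a modeling assumption for this sketch).

    The difference A - B is then normal with mean (mean_a - mean_b) and
    standard deviation sqrt(se_a^2 + se_b^2); the probability that the
    difference is positive is the standard normal CDF at the z-score.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

When the two offers measure identically, the expected accuracy is 0.5 (a coin flip); as the measured gap grows relative to the combined error, it approaches 1, which is why it bounds the achievable prediction accuracy.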
For other recommendation metrics, the error of each measured value is unknown (e.g., a retailer platform may report only rolled-up numbers without any accompanying error). For these cases, error is approximated for modeling purposes, but an exact expected accuracy cannot be reported.
Inputs
Platforms: restricts output by platform
Tenants: restricts output by tenant
Start Date: restricts output such that it is only based on campaigns whose end date is greater than or equal to the entered date
End Date: restricts output such that it is only based on campaigns whose end date is less than or equal to the entered date
Cold Start: if selected, limits output to cold start results. If not selected, limits results to basic results. See 'Prediction assessment process' section above.
Recommendation Metric: determines which recommendation metric to report on
Group: controls the level at which results are aggregated
Latest Results section
Aggregates data from the last time each campaign was assessed.
Overall Accuracy Over Time section
Plots the values from each time that prediction assessment was run. Note that these values may fluctuate because a different random sample of campaigns is chosen each time prediction assessment runs.
Detailed Results section
Contains exportable data from each time that a particular campaign is assessed.
Feedback and questions
Please contact Engineering or Data Science if you have any questions or feedback about this dashboard or the overall prediction process.