This article includes examples of how to set up alerts for:
- Volumes
- Conversion Rates
- Averages
- Failure Rates
- Distribution Changes
ALERTS ON VOLUMES
For example, alerts on revenue, number of items purchased, visitors, sessions, page views, impressions, etc.:
- The interesting direction is usually DOWN: a drop in revenue, visitors, sessions, etc.
- Absolute Min delta should be set based on the typical volume of the metrics; e.g., if the metrics are revenue per product, the min delta should be set so that alerts are still kept for the products that generate the most revenue.
- The influencing metric in this case would typically be the same metric, filtering on the volume BEFORE the anomaly (usually at a higher time scale); e.g., alert on a drop in the hourly revenue of a product only if the revenue in the DAY before the anomaly was higher than $100. This filters out products with daily revenue below $100.
- Time scale: the alert can be set on one time scale or on multiple time scales.
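The three volume settings above (DOWN direction, absolute min delta, and the same metric measured BEFORE the anomaly as the influencing metric) can be sketched as a single filter. This is a minimal illustration, not the product's API: the `Anomaly` structure, the function `should_alert_on_volume_drop`, and its parameters are hypothetical names, and the anomaly itself (expected vs. actual value) is assumed to come from the detection engine.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    """Hypothetical record of one detected anomaly on a volume metric."""
    metric: str
    expected: float  # baseline value the model predicted
    actual: float    # observed value


def should_alert_on_volume_drop(anomaly: Anomaly,
                                volume_before: float,
                                min_delta: float,
                                min_volume_before: float) -> bool:
    """Decide whether a volume anomaly (e.g. hourly revenue) should fire an alert."""
    drop = anomaly.expected - anomaly.actual
    if drop <= 0:            # only DOWN anomalies are interesting for volumes
        return False
    if drop < min_delta:     # absolute min delta: ignore small drops
        return False
    # Influencing metric: the same metric at a higher time scale, BEFORE the anomaly
    # (e.g. revenue in the day before must exceed $100).
    return volume_before > min_volume_before


# Hypothetical example values.
big_product = Anomaly("revenue.product_A", expected=500.0, actual=120.0)
print(should_alert_on_volume_drop(big_product, volume_before=2400.0,
                                  min_delta=50.0, min_volume_before=100.0))  # True

small_product = Anomaly("revenue.product_B", expected=8.0, actual=1.0)
print(should_alert_on_volume_drop(small_product, volume_before=60.0,
                                  min_delta=5.0, min_volume_before=100.0))  # False
```

The second product is dropped by the influencing-metric condition even though its drop passes the min delta, which is exactly the "filters out products with daily revenue below $100" behavior.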
ALERTS ON CONVERSION RATES
For example, alerts on click-through rate, shopping cart conversion rate, goal completion rates, game completions, etc. Here the business is measuring the completion rate of a process.
Note that alerts on the VOLUME of conversions are usually less powerful than alerts on the rates. Anomalous changes in the rates are more interesting because the VOLUME of conversions may increase or decrease together with the metrics that start the process (e.g., add-to-cart volume, game starts, impressions, etc.), even when the process itself is working normally.
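A small worked example makes the volume-versus-rate point concrete. The hourly numbers below are made up for illustration: when traffic halves, checkout VOLUME halves with it while the rate stays flat, whereas a broken checkout step shows up as a rate change at normal traffic.

```python
from typing import Optional


def conversion_rate(conversions: int, starts: int) -> Optional[float]:
    """Completion rate of a process; None when no process was started."""
    if starts == 0:
        return None
    return conversions / starts


# Hypothetical hourly numbers for a checkout funnel (starts = add-to-carts).
normal_hour = {"add_to_cart": 1000, "checkouts": 300}
quiet_hour = {"add_to_cart": 500, "checkouts": 150}    # traffic halves, checkouts halve too
broken_hour = {"add_to_cart": 1000, "checkouts": 120}  # normal traffic, checkout step failing

for name, hour in [("normal", normal_hour), ("quiet", quiet_hour), ("broken", broken_hour)]:
    print(name, hour["checkouts"], conversion_rate(hour["checkouts"], hour["add_to_cart"]))
# normal 300 0.3
# quiet 150 0.3
# broken 120 0.12
```

A volume alert would fire on the quiet hour (a 50% drop in checkouts) even though nothing is wrong, while the rate only moves in the broken hour.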
- The typical use case is to identify drops in conversion rates (DOWN anomalies).
- Alerts on UP anomalies could indicate fraud (or glitches). It may be better to split the two types of anomalies into two alerts (one for DOWN and one for UP), since UP anomalies probably need a different sensitivity.
- Min delta should be used to filter small changes in the rate: if the rate is truly normalized between 0 and 1, min delta can be a small number such as 0.01 to avoid getting alerts on small rate changes.
- Influencing metrics are a MUST: they should be the volume of the denominator of the rate being computed, e.g., number of add-to-carts, number of impressions, number of games started, etc.
- In the typical case of drops in conversions, the timing of the influencing metrics should be DURING the anomaly; e.g. Alert on drop in checkout conversion rate if the volume of add to cart was at least XX DURING the anomaly.
- If direction UP is interesting, the recommended setting is to add an influencing metric condition on the same volume metric, but BEFORE the anomaly.
- An alert on conversion rates in both directions would have the denominator metric as an influencing metric both DURING and BEFORE the anomaly.
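The conversion-rate rules above can be summarized in one sketch: a min delta on the normalized rate, a direction setting, and influencing conditions on the denominator volume (DURING for DOWN, BEFORE for UP, both when the alert covers both directions). The function name `rate_alert_fires`, its parameters, and the `min_volume` threshold used in the examples are hypothetical; the article leaves the actual minimum volume (the "XX") to the user.

```python
def rate_alert_fires(direction: str,
                     expected: float,
                     actual: float,
                     denom_during: float,
                     denom_before: float,
                     min_volume: float,
                     min_delta: float = 0.01) -> bool:
    """Hypothetical filter for a conversion-rate alert (rates normalized to 0..1).

    direction is the configured alert direction: "DOWN", "UP" or "BOTH".
    """
    delta = actual - expected
    if delta == 0 or abs(delta) < min_delta:   # ignore small rate changes
        return False
    if delta < 0 and direction not in ("DOWN", "BOTH"):
        return False
    if delta > 0 and direction not in ("UP", "BOTH"):
        return False
    # Influencing-metric conditions on the denominator volume:
    # DOWN alerts check the volume DURING the anomaly, UP alerts check it BEFORE,
    # and an alert on both directions carries both conditions.
    if direction in ("DOWN", "BOTH") and denom_during < min_volume:
        return False
    if direction in ("UP", "BOTH") and denom_before < min_volume:
        return False
    return True


# Checkout rate fell from 0.30 to 0.22 with 800 add-to-carts during the anomaly.
print(rate_alert_fires("DOWN", 0.30, 0.22, denom_during=800, denom_before=900, min_volume=200))  # True
# Same drop, but only 40 add-to-carts during the anomaly: suppressed.
print(rate_alert_fires("DOWN", 0.30, 0.22, denom_during=40, denom_before=900, min_volume=200))   # False
# A change of 0.005 is below the 0.01 min delta: suppressed.
print(rate_alert_fires("BOTH", 0.300, 0.305, denom_during=800, denom_before=900, min_volume=200))  # False
```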
ALERTS ON AVERAGES
For example, alerts on average response time, average time on page, average cart value, average pages per session, etc.
- Direction depends on the exact use case (typically an alert on average response time is UP, and on average time on page or cart value is DOWN).
- An influencing metric on the count used to compute the average is highly recommended, applied DURING the anomaly. For example, if we're measuring the average response time to load a page, alert only if the number of page loads used to compute the average was higher than some minimum.
- A possible alternative to the influencing metric is to exclude data points that do not have enough support for computing the average altogether, by creating a composite metric that drops them. For example, if the metric is average page load time and there are usually 100 page views, but in the last 5 minutes there were just 5, the average load time may be anomalously high, but it hardly matters because at most 5 users were affected. The influencing metric would suppress that alert, but the low-support data points may still shift the learned baseline, so it may be better for the system to learn the baseline without them. An example of such a composite is shown below: the volume metrics are filtered out if they are below 15 in the given time period, making the average NULL in those periods.
Note that such a composite is specific to a time scale: the threshold (15 in this example) is chosen based on the time scale of the alert.
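The effect of that composite can be sketched in plain Python: the average is computed from a total and a count, and it becomes NULL (`None` here) when the count falls below the threshold, so the baseline never sees the low-support point. This is only the logic, not the product's composite-metric syntax; `filtered_average` is a hypothetical name, and the default of 15 is the article's example value, which should be adjusted to the alert's time scale.

```python
from typing import List, Optional, Tuple


def filtered_average(total: float, count: float, min_count: float = 15) -> Optional[float]:
    """Average that is NULL (None) when too few samples support it."""
    if count < min_count:   # not enough support: drop the data point entirely
        return None
    return total / count


# Hypothetical 5-minute buckets of page-load time: (sum of load times in ms, page views).
buckets: List[Tuple[float, int]] = [(42000.0, 100), (38000.0, 95), (4500.0, 5)]
print([filtered_average(total, count) for total, count in buckets])
# [420.0, 400.0, None]
```

The last bucket would average 900 ms, which looks anomalous, but with only 5 page views it is excluded rather than alerted on or learned into the baseline.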
ALERTS ON FAILURE RATES
For example, alerts on error rates, app crash rates, connection drop rates, page bounce rates, etc.
- Direction UP is interesting.
- An influencing metric on the denominator volume is a MUST. As with conversion rates, it applies DURING the anomaly.
- Add a small min delta to filter very small changes in the rate.
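For failure rates the same pattern applies with the direction flipped: only UP anomalies count, a small min delta removes tiny changes, and the denominator volume is checked DURING the anomaly. The sketch below uses a hypothetical function name and illustrative thresholds.

```python
def failure_rate_alert_fires(expected: float,
                             actual: float,
                             denom_during: float,
                             min_volume: float,
                             min_delta: float) -> bool:
    """Hypothetical filter for a failure-rate alert (e.g. errors / requests)."""
    delta = actual - expected
    if delta <= 0:                     # only UP anomalies matter for failure rates
        return False
    if delta < min_delta:              # small min delta filters tiny rate changes
        return False
    return denom_during >= min_volume  # denominator volume DURING the anomaly


# 30 errors out of 1000 requests (3%) against an expected 1% error rate.
requests, errors = 1000, 30
print(failure_rate_alert_fires(expected=0.01, actual=errors / requests,
                               denom_during=requests, min_volume=200, min_delta=0.005))  # True

# 1 error out of 10 requests is a 10% rate, but too little traffic to matter.
print(failure_rate_alert_fires(expected=0.01, actual=1 / 10,
                               denom_during=10, min_volume=200, min_delta=0.005))  # False
```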
ALERTS ON DISTRIBUTION CHANGES
For example, alerts on changes in the distribution of multiple outcomes: the distribution of error types (% of error 1, % of error 2, etc.), of user types (% new, % paying, % freemium), or of products purchased (% shoes, % socks, % shorts, etc.).
- Use the asPercent function to express each outcome as a percentage of the total.
- Direction is UP and DOWN
- The distribution change helps identify changes of behavior when there are multiple outcomes - whether at the application level (errors) or at the product level (type of users, type of purchases). The changes are not always actionable, but can trigger further investigation: e.g. Why are shoes now more popular? Why did the % of freemium users increase?
- Influencing metrics should be the total of all outcomes (e.g. total number of errors, total number of users, total number of products purchased). It should be applied DURING the anomaly.
- The recommended absolute Min Delta is 1 (only a change of at least 1 percentage point will trigger the alert).
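The distribution settings can be illustrated with a small sketch: each outcome is turned into a percentage of the total (the role asPercent plays), the total of all outcomes DURING the anomaly acts as the influencing condition, and shares that move UP or DOWN by at least 1 percentage point are reported. `as_percent` and `shifted_outcomes` are hypothetical Python stand-ins, not the product's asPercent function, and comparing against a "baseline" count dictionary is a simplification of the baseline the system actually learns.

```python
from typing import Dict, List


def as_percent(counts: Dict[str, float]) -> Dict[str, float]:
    """Each outcome's share of the total, in percent (0-100)."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: 100.0 * v / total for k, v in counts.items()}


def shifted_outcomes(baseline: Dict[str, float],
                     during: Dict[str, float],
                     min_total: float,
                     min_delta: float = 1.0) -> List[str]:
    """Outcomes whose share moved UP or DOWN by at least min_delta percentage points."""
    if sum(during.values()) < min_total:   # influencing metric: total DURING the anomaly
        return []
    expected = as_percent(baseline)
    actual = as_percent(during)
    outcomes = set(expected) | set(actual)
    return sorted(k for k in outcomes
                  if abs(actual.get(k, 0.0) - expected.get(k, 0.0)) >= min_delta)


baseline = {"shoes": 400, "socks": 350, "shorts": 250}   # 40%, 35%, 25%
now = {"shoes": 520, "socks": 300, "shorts": 180}        # 52%, 30%, 18%
print(shifted_outcomes(baseline, now, min_total=500))     # ['shoes', 'shorts', 'socks']

slight = {"shoes": 405, "socks": 345, "shorts": 250}     # 40.5%, 34.5%, 25%
print(shifted_outcomes(baseline, slight, min_total=500))  # []
```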