This article explains how Stream Delay and Backfill Policy work in Anodot collectors, with a focus on database (RDB) collectors (e.g., BigQuery) and file-based collectors (e.g., AWS S3).
These mechanisms are critical for balancing data completeness and data freshness, especially when collecting data that may arrive late or be written incrementally. When configured correctly, Stream Delay and Backfill Policy ensure that Anodot collects the most accurate and reliable data possible, without introducing unnecessary alert latency.
The Core Challenge: Freshness vs. Completeness
Anodot is designed to analyze data as close to real time as possible. However, many data sources do not guarantee that data for a given time window is immediately complete.
This creates an inherent tradeoff:
- Fetch too early → you may collect partial or empty data.
- Wait too long → alerts and insights are delayed.
Stream Delay and Backfill Policy are the two controls that let you manage this tradeoff.
Stream Delay: When Do We Start Looking for Data?
Stream Delay defines how long the collector waits after an interval ends before attempting to collect data for that interval.
For example, with:
- Stream frequency: Hourly
- Stream delay: 15 minutes
The interval 10:00–10:59 will not be queried until 11:15.
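The timing in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Anodot's implementation: the function name `first_query_time` and the calendar date are assumptions made for the example.

```python
from datetime import datetime, timedelta


def first_query_time(interval_start: datetime,
                     frequency: timedelta,
                     stream_delay: timedelta) -> datetime:
    """Return the earliest time an interval is queried (hypothetical helper).

    The collector waits until the interval has ended, then waits for the
    stream delay on top of that.
    """
    interval_end = interval_start + frequency
    return interval_end + stream_delay


# Hourly stream with a 15-minute delay; the date itself is arbitrary.
start = datetime(2024, 6, 25, 10, 0)
query_at = first_query_time(start, timedelta(hours=1), timedelta(minutes=15))
print(query_at.strftime("%H:%M"))  # prints 11:15
```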
Why Stream Delay Matters
Many systems (data warehouses, ETL pipelines, object storage) write data late or in batches.
If the delay is too short:
- Queries may return partial data, compromising learning and prediction
If the delay is too long:
- Data completeness improves, yet alerts and detection are unnecessarily delayed
Best practice: set the delay to the minimum amount of time after which the data is reliably complete.
Backfill Policy: How Long Do We Keep Trying?
Backfill Policy defines how long the collector will continue to look for past intervals when no data is found.
In practice, the backfill policy will kick in when a query returns 0 rows (for database collectors) or when no matching files/records are found (for file collectors).
It is important to note what backfill policy does not do:
- It does not detect partial or incomplete data
- It does not retry intervals where some data was successfully collected
As soon as any data is returned for an interval, that interval is considered collected and the collector moves forward.
This behavior is critical because Anodot enforces strict chronological ordering:
- Once newer samples are successfully ingested, older samples for the same metric will not be accepted
If the collector moves forward past an interval with 0 data too quickly, a permanent data gap is created. The backfill policy defines how long the collector is allowed to "wait" and keep retrying an empty interval before giving up and moving forward.
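The decision described above can be summarized as a small function. This is a minimal sketch of the rules as stated in this article; the names `Action` and `decide` are hypothetical and do not correspond to any Anodot API.

```python
from enum import Enum


class Action(Enum):
    ADVANCE = "advance"   # interval counts as collected; move to the next one
    RETRY = "retry"       # interval was empty; try the same interval again
    GIVE_UP = "give_up"   # backfill window expired; skip and leave a gap


def decide(rows_returned: int, backfill_window_open: bool) -> Action:
    """Hypothetical summary of the collector's per-interval decision."""
    if rows_returned > 0:
        # Any data at all, even partial data, marks the interval as collected.
        return Action.ADVANCE
    if backfill_window_open:
        return Action.RETRY
    return Action.GIVE_UP


print(decide(1, True))    # Action.ADVANCE, even if the data is incomplete
print(decide(0, True))    # Action.RETRY
print(decide(0, False))   # Action.GIVE_UP
```

Note that a partially written interval takes the first branch, which is why Stream Delay, not Backfill Policy, is the control that protects against incomplete data.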
Database Collectors (e.g., BigQuery)
Retry Behavior
For database-based collectors, retries occur only when a query returns 0 rows.
The retry frequency depends on the stream interval:
- Daily streams: retry every 0–10 minutes
- Hourly streams: retry every 0–5 minutes
- Other intervals: retry every 1 minute up to 20% of the interval duration
  - Example: 2-hour interval → retries between 1 and 24 minutes
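The retry ranges listed above can be expressed as a lookup. The sketch below assumes each range is a lower and upper bound on the wait between retries; the function name `retry_wait_bounds` and the handling of very short intervals are assumptions for illustration.

```python
from datetime import timedelta


def retry_wait_bounds(interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return (min_wait, max_wait) between retries (hypothetical helper)."""
    if interval == timedelta(days=1):
        return timedelta(0), timedelta(minutes=10)
    if interval == timedelta(hours=1):
        return timedelta(0), timedelta(minutes=5)
    # Other intervals: from 1 minute up to 20% of the interval duration.
    lower = timedelta(minutes=1)
    # Assumption: the upper bound never drops below the 1-minute lower bound.
    upper = max(lower, interval / 5)
    return lower, upper


low, high = retry_wait_bounds(timedelta(hours=2))
print(low, high)  # prints 0:01:00 0:24:00
```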
Retries continue until:
- Data is found, or
- The backfill window expires
Example: Daily Stream with Backfill Policy = 3
- June 25: Query returns 0 rows
- The collector keeps retrying June 25
- It will continue retrying for 4 days total:
  - Current day (June 25)
  - Plus 3 additional days (backfill = 3)
If data appears by the end of June 28 (the fourth day), it is collected.
If not, the collector gives up on June 25 and moves on, creating a data gap for that day.
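The day count in this example can be checked with simple date arithmetic. The sketch follows the "4 days total" reading (the current day plus the backfill value); the helper name `last_retry_day` and the year are assumptions.

```python
from datetime import date, timedelta


def last_retry_day(interval_day: date, backfill_days: int) -> date:
    """Last day on which an empty daily interval is still retried (hypothetical)."""
    return interval_day + timedelta(days=backfill_days)


june_25 = date(2024, 6, 25)  # the year is arbitrary
last_day = last_retry_day(june_25, backfill_days=3)
total_days = (last_day - june_25).days + 1  # count the current day as well

print(last_day.strftime("%B %d"), total_days)  # prints June 28 4
```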
File Collectors (e.g., AWS S3)
File collectors behave differently because data arrives as files rather than query results.
Instead of retrying a fixed time window, file collectors use a sliding time window that advances according to the lagging policy.
How Lagging (Backfill) Works for Files
The collector continues retrying while gradually moving the time window forward. Once the lagging window is exceeded, older intervals are skipped permanently.
Example: Hourly Stream with Lagging Policy = 1
- 05:00 – Successful run
  - from: 04:00
  - to: 05:00
- 06:00 – Query returns 0 records
  - from: 05:00
  - to: 06:00
- 06:10–07:00 – Multiple retries
  - Still no data
  - from/to unchanged
- 07:00 – Next interval
  - from: 05:00
  - to: 07:00
- 07:00–08:00 – More retries
  - Still no data
- 08:00 – Lagging window exceeded
  - from moves to: 06:00
  - to moves to: 08:00
At this point, the interval 05:00–06:00 is permanently skipped and will never be collected.
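The from/to values in this walkthrough are consistent with a simple rule: the window ends at the current run time and starts at the end of the last successful run, but never spans more than (lagging policy + 1) intervals. The sketch below encodes that rule. It is inferred from the example above rather than taken from Anodot's implementation, and the name `file_window` is hypothetical.

```python
from datetime import datetime, timedelta


def file_window(now: datetime,
                last_success_to: datetime,
                interval: timedelta,
                lagging: int) -> tuple[datetime, datetime]:
    """Return the (from, to) window for a file collector run (hypothetical)."""
    # Assumed rule: the window may cover the current interval plus
    # `lagging` older intervals, and nothing earlier than that.
    max_span = interval * (lagging + 1)
    window_from = max(last_success_to, now - max_span)
    return window_from, now


hour = timedelta(hours=1)
last_success = datetime(2024, 6, 25, 5, 0)  # the 05:00 run succeeded

for h in (6, 7, 8):
    start, end = file_window(datetime(2024, 6, 25, h, 0), last_success, hour, lagging=1)
    print(f"{end:%H:%M} run -> from {start:%H:%M} to {end:%H:%M}")
```

At the 08:00 run the computed `from` is 06:00, so files for 05:00–06:00 fall outside the window, matching the permanent skip described above.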
Key Takeaway for File Collectors
- Retries continue indefinitely unless a fatal error occurs (e.g., missing permissions)
- Once the lagging window is exceeded, however, older intervals are dropped
- This prevents the collector from being stuck forever on missing files