The Apache Kafka agent streams records into Anodot and is one of our most commonly used agents. Supported Kafka versions: 0.10, 0.11, 1.0+, 2.0+.
This article provides an overview of the key elements of setting up your Kafka agent.
Kafka Agent FAQ
What are the core Kafka capabilities that make this agent effective?
The agent is effective because data can be streamed in real time. Kafka also integrates with many other systems, so instead of building a separate collector for each of them, we can use Kafka as a single source of data: the other systems simply publish their records to Kafka, as in the sketch below.
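For example, any upstream system can publish its measurements to a shared topic that the agent then consumes. This is a minimal sketch assuming the kafka-python client; the topic name "metrics" and the record fields are hypothetical illustrations, not the agent's required schema.

```python
import json

from kafka import KafkaProducer

# Producer for one of many upstream systems; all of them publish to the
# same Kafka cluster, so the Kafka agent becomes the single collection point.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical record layout: a measurement, its value, and dimensions.
producer.send("metrics", {
    "measurement": "cpu_usage",
    "value": 42.5,
    "host": "server-1",       # dimension
    "timestamp": 1600000000,
})
producer.flush()
```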
What guidelines should we follow to ensure an efficient workflow for Kafka topics?
To enable ordered processing of the Kafka records, you need to make sure that:
- The number of partitions is greater than or equal to the number of threads, so that each thread handles one or more whole partitions, resulting in ordered handling of the records.
- Producers write all records for a given combination of measurement and dimensions to the same partition, for example by using that combination as the record key (see the sketch after this list).
- You do not use the transformations feature because changing metrics after fetching them from Kafka may affect the ordering.
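To illustrate the second guideline, here is a minimal sketch of keyed production, again assuming the kafka-python client; the topic name, field names, and key format are hypothetical. Building the key from the measurement name plus its sorted dimensions means Kafka's default hash-based partitioner routes every record of a given series to the same partition.

```python
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def series_key(measurement, dimensions):
    # Deterministic key: the same measurement + dimensions always yields
    # the same key, and therefore the same partition under Kafka's
    # default hash-based partitioner.
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(dimensions.items())]
    return "|".join(parts)

record = {
    "measurement": "cpu_usage",
    "value": 42.5,
    "dimensions": {"host": "server-1", "region": "eu"},
}

producer.send(
    "metrics",
    key=series_key(record["measurement"], record["dimensions"]),
    value=record,
)
producer.flush()
```

Because Kafka only guarantees ordering within a partition, this keying, combined with the partitions-to-threads rule above, is what yields ordered handling end to end.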
When should we NOT use the Kafka agent?
- When topics cannot be configured as described in the previous answer, since data will likely fall out of order.
- When real-time processing is not required. In that case you can switch to other data sources that support aggregation queries, which can simplify processing and lower system resource usage.
Relevant sections in the Anodot wiki:
- Config file settings: see https://github.com/anodot/daria/wiki/Kafka
- Authentication: see https://github.com/anodot/daria/wiki/Kafka:-authentication
- Load distribution: see https://github.com/anodot/daria/wiki/Kafka:-distributing-the-load
- Filtering: see https://github.com/anodot/daria/wiki/Filtering
- Use cases: see https://github.com/anodot/daria/wiki/Kafka:-use-cases