Kafka Connect is a technology for linking Kafka with other services. One of the main challenges in an IT architecture is batch processing: the many pieces of code that transport or synchronize data from one application to another. A small failure can cause significant data and business losses while also requiring developer intervention. Many new architectures therefore focus on real-time solutions that are easy to maintain and fault tolerant.
Kafka – a key open-source product for real-time architecture
Kafka is the key open-source product that addresses all of these points. It can act as a message queue, but it also covers many other use cases, such as replacing batch processes, log monitoring and health checks, web activity tracking, analytics on real-time data, and processing large volumes of IoT data.
What is Kafka Connect
Kafka Connect is the member of the Kafka family responsible for integrating Kafka with various data sources and sinks. In this article, we describe the design and development approach for building such an integration.
There are already many enriched Kafka platforms on the market, such as Confluent and Debezium. Confluent bundles everything into a robust, all-in-one platform that can be easily integrated into any cloud-based architecture. Similarly, we can create or enrich open-source Kafka connectors to achieve our goals.
How to create and run a customized Kafka Connect
Here I will describe, step by step, how to create and run a customized Kafka Connect by enriching the open-source Debezium distributed platform. Debezium is a distributed platform that turns existing databases into event streams, so applications can quickly react to each row-level change in those databases. One of the reasons for picking it is its compatibility with Red Hat OpenShift, our private cloud platform.
First, let us take a brief technical look inside Kafka Connect. A typical Kafka Connect deployment has a REST API (to create, start, and stop connectors), tasks (the implementation of how data is copied between the external system and Kafka), filters/transformations, and converters (the code that translates data between Connect's internal format and the serialized form stored in Kafka). Together, this package connects an external system with Kafka.
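To make these components more concrete, here is a minimal, stdlib-only Java model of the source-side flow. A record produced by a task passes through a chain of transformations and is then serialized by a converter. The types and helpers (PipelineSketch, SimpleRecord, dropField, prefixTopic, convert) are hypothetical stand-ins, not Kafka Connect's real interfaces. The "key=value;" output is an arbitrary toy serialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Simplified model of the Kafka Connect source-side pipeline:
// task output -> transformation chain -> converter -> bytes written to Kafka.
// These types are illustrative stand-ins, NOT the real Kafka Connect interfaces.
class PipelineSketch {
    // A minimal record: target topic plus a flat field -> value payload.
    record SimpleRecord(String topic, Map<String, String> value) {}

    // Example transformation: drop one field (similar in spirit to the ReplaceField SMT).
    static Function<SimpleRecord, SimpleRecord> dropField(String field) {
        return r -> {
            Map<String, String> copy = new TreeMap<>(r.value());
            copy.remove(field);
            return new SimpleRecord(r.topic(), copy);
        };
    }

    // Example transformation: prefix the topic name.
    static Function<SimpleRecord, SimpleRecord> prefixTopic(String prefix) {
        return r -> new SimpleRecord(prefix + r.topic(), r.value());
    }

    // Applies each transformation in order, like a configured chain of transformations.
    static SimpleRecord applyTransforms(SimpleRecord r, List<Function<SimpleRecord, SimpleRecord>> chain) {
        for (Function<SimpleRecord, SimpleRecord> t : chain) {
            r = t.apply(r);
        }
        return r;
    }

    // Toy converter: serializes the value as UTF-8 "key=value;" pairs, sorted by key.
    static byte[] convert(SimpleRecord r) {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(r.value()).forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

In a real connector, transformations implement org.apache.kafka.connect.transforms.Transformation. Converters (for example JsonConverter) implement org.apache.kafka.connect.storage.Converter and are configured on the worker or overridden per connector.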
Debezium open-source distributed platform
The main folders of a Debezium Connect container are bin, config, connect, data, libs, and logs. The libs folder contains all the jars required for transformations and conversion, and the config folder contains the required configuration properties. So, keeping all the existing facilities as they are, connect is the only folder in which we need to put our new feature (for the new data sources or sinks). Connectors can be of two types: source and sink. We will now discuss the technical enrichment of the connector.
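Source and sink connectors differ mainly in the direction of the data flow. In Kafka Connect, a sink task receives records consumed from Kafka through put(). The framework calls flush() before committing the consumer offsets. The stdlib-only class below mirrors only those two method names. SinkTaskSketch, and the plain list standing in for the external system, are hypothetical. The real SinkTask methods take a collection of SinkRecord objects and a map of TopicPartition offsets.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stdlib-only model of a sink task. Method names mirror Kafka Connect's SinkTask
// (put, flush), but this is a simplified stand-in, not the real class.
class SinkTaskSketch {
    private final List<String> buffer = new ArrayList<>();
    // Hypothetical external system (e.g. a database table), modeled as a list.
    final List<String> externalSystem = new ArrayList<>();

    // Receives the next batch of records consumed from Kafka; buffering is a design choice here.
    void put(Collection<String> records) {
        buffer.addAll(records);
    }

    // Writes buffered records to the external system; Connect calls flush before committing offsets.
    void flush() {
        externalSystem.addAll(buffer);
        buffer.clear();
    }
}
```

Buffering in put() and writing in flush() keeps writes aligned with offset commits. A simpler sink could also write directly inside put().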
Consider the classic Java or .NET practice. If you write your application and its batch processes in Java, you use JDBC to connect to different data sources or databases through a generic configuration. Things are very similar in this case as well.
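To make the JDBC analogy concrete, here is a minimal sketch of a configuration class for a custom connector. One generic set of properties is parsed and validated in a single place: a connection URL, a target topic, and a poll interval with a default. The class name CustomConnectorConfig and the property names connection.url, topic, and poll.interval.ms are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical configuration class for a custom connector: like a JDBC setup,
// one generic set of properties describes the target system.
// Property names are illustrative assumptions, not a fixed standard.
final class CustomConnectorConfig {
    static final String CONNECTION_URL = "connection.url";
    static final String TOPIC = "topic";
    static final String POLL_INTERVAL_MS = "poll.interval.ms";
    static final long DEFAULT_POLL_INTERVAL_MS = 5000L;

    final String connectionUrl;
    final String topic;
    final long pollIntervalMs;

    CustomConnectorConfig(Map<String, String> props) {
        this.connectionUrl = required(props, CONNECTION_URL);
        this.topic = required(props, TOPIC);
        // Optional property: fall back to the default when absent.
        String interval = props.get(POLL_INTERVAL_MS);
        this.pollIntervalMs = interval == null ? DEFAULT_POLL_INTERVAL_MS : Long.parseLong(interval);
        if (pollIntervalMs <= 0) {
            throw new IllegalArgumentException(POLL_INTERVAL_MS + " must be positive");
        }
    }

    // Fails fast with a clear message when a mandatory property is missing or blank.
    private static String required(Map<String, String> props, String key) {
        String v = props.get(key);
        if (v == null || v.isBlank()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return v;
    }
}
```

In an actual connector, this role is usually played by a class built on Kafka's ConfigDef. The connector returns it from its config() method so the framework can validate the properties.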
- First, we create a new custom project and implement the logic: code for the configuration properties, a custom source connector, and a custom sink connector.
- We also need to pass the configuration properties to the task class (which the connector class declares). The task must implement the abstract methods start, stop, and version.
- Streaming data into Kafka mostly happens in the poll method, which Kafka Connect calls continually for each source task.
- Lastly, the Kafka Connect REST API includes an endpoint (PUT) for modifying the configuration. To make the connector dynamic, we create a separate thread that monitors for changes, and a new instance of this monitoring thread is created when the connector starts up.
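The four steps above can be modeled in plain Java. The sketch is stdlib-only so it runs without the Kafka libraries, and only its method names (start, taskConfigs, poll, stop, version) follow the real SourceConnector/SourceTask API. Everything else is hypothetical: the classes CustomSourceConnector, CustomSourceTask and ChangeMonitor, the "tables" property, the round-robin split, and the placeholder events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Stdlib-only model of a custom source connector. Method names mirror Kafka Connect's
// SourceConnector/SourceTask (start, taskConfigs, poll, stop, version), but these are
// simplified stand-ins, not the real classes. The "tables" property is hypothetical.
class ConnectorTaskSketch {
    static class CustomSourceConnector {
        private final Supplier<String> tableSource;    // hypothetical: reads the current table list
        private final Runnable requestReconfiguration; // stands in for context.requestTaskReconfiguration()
        private Map<String, String> props;
        private Thread monitorThread;
        ChangeMonitor monitor;
        CustomSourceConnector(Supplier<String> tableSource, Runnable requestReconfiguration) {
            this.tableSource = tableSource;
            this.requestReconfiguration = requestReconfiguration;
        }
        void start(Map<String, String> props) {
            this.props = props;
            // A new monitoring thread is created on every connector start-up.
            monitor = new ChangeMonitor(tableSource, requestReconfiguration, 1000L);
            monitorThread = new Thread(monitor, "table-change-monitor");
            monitorThread.setDaemon(true);
            monitorThread.start();
        }

        // Splits the comma-separated "tables" list round-robin across at most maxTasks configs.
        List<Map<String, String>> taskConfigs(int maxTasks) {
            String[] tables = props.get("tables").split(",");
            int n = Math.max(1, Math.min(maxTasks, tables.length));
            List<List<String>> groups = new ArrayList<>();
            for (int i = 0; i < n; i++) groups.add(new ArrayList<>());
            for (int i = 0; i < tables.length; i++) groups.get(i % n).add(tables[i]);
            List<Map<String, String>> configs = new ArrayList<>();
            for (List<String> group : groups) {
                Map<String, String> cfg = new HashMap<>(props);
                cfg.put("tables", String.join(",", group));
                configs.add(cfg);
            }
            return configs;
        }
        void stop() { if (monitorThread != null) monitorThread.interrupt(); }
        String version() { return "0.1.0"; }
    }

    static class CustomSourceTask {
        private List<String> tables;
        private volatile boolean running;
        void start(Map<String, String> props) {
            tables = List.of(props.get("tables").split(","));
            running = true;
        }
        // Called repeatedly by the framework; here it emits one placeholder event per table.
        List<String> poll() {
            if (!running) return List.of();
            List<String> batch = new ArrayList<>();
            for (String t : tables) batch.add("change-from-" + t);
            return batch;
        }

        void stop() { running = false; }
        String version() { return "0.1.0"; }
    }

    // Periodically re-reads the source and fires a callback when the value changes.
    static class ChangeMonitor implements Runnable {
        private final Supplier<String> source;
        private final Runnable onChange;
        private final long intervalMs;
        private String last;
        ChangeMonitor(Supplier<String> source, Runnable onChange, long intervalMs) {
            this.source = source;
            this.onChange = onChange;
            this.intervalMs = intervalMs;
            this.last = source.get();
        }
        // Synchronized so a manual call and the background loop cannot fire twice for one change.
        synchronized boolean checkOnce() {
            String now = source.get();
            if (now.equals(last)) return false;
            last = now;
            onChange.run();
            return true;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                checkOnce();
                try { Thread.sleep(intervalMs); } catch (InterruptedException e) { return; }
            }
        }
    }
}
```

In real Kafka Connect, the framework works as follows:
- It instantiates the task class returned by the connector's taskClass().
- It calls start() with one of the maps returned by taskConfigs().
- It then calls poll() in a loop. The real poll() returns List<SourceRecord> and may return null when there is no data.

The monitor's callback corresponds to ConnectorContext.requestTaskReconfiguration(), which makes the framework ask the connector for new task configurations.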
Once the project is complete, we build a jar from it and place the jar inside the connect folder of the Connect container. With Docker, this can be done by writing a short Dockerfile. The Dockerfile and the jars need to be in the same folder, and from that folder we run the docker build command to create a new, enriched custom Connect image.
The newly deployed Kafka Connect Debezium container will have the new features and capabilities to connect our required sources and sinks.
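Once the new container is running, a connector instance is registered through the Kafka Connect REST API. The sketch below only builds the requests with Java's java.net.http.HttpRequest and does not send them. It covers two endpoints:
- POST /connectors creates a connector from a JSON body containing name and config.
- PUT /connectors/{name}/config creates or updates a connector's configuration (the PUT endpoint from the steps above).

The host localhost:8083 (Connect's default REST port), the connector name custom-source, and the class com.example.CustomSourceConnector are assumptions. connector.class and tasks.max are standard connector properties.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Builds (but does not send) requests for the Kafka Connect REST API:
//   POST /connectors               -> create a connector from {"name": ..., "config": {...}}
//   PUT  /connectors/{name}/config -> create or update a connector's configuration
class ConnectRestRequests {

    // Naive JSON encoding for flat string maps, sorted by key; assumes values need no escaping.
    static String toJson(Map<String, String> m) {
        return new TreeMap<>(m).entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }

    static HttpRequest create(String baseUrl, String name, Map<String, String> config) {
        String body = "{\"name\":\"" + name + "\",\"config\":" + toJson(config) + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    static HttpRequest updateConfig(String baseUrl, String name, Map<String, String> config) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/connectors/" + name + "/config"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(toJson(config)))
            .build();
    }
}
```

A request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). A production client should use a proper JSON library, because toJson does not escape special characters.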
Any technically reachable data source or sink can be integrated using Kafka connectors. Moreover, the new implementation benefits from a minimal codebase, CDC (change data capture) features, real-time propagation of data changes, sequencing, fault tolerance, and more. Implementing and managing data pipelines (DataOps) also becomes much easier.
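The fault tolerance and sequencing mentioned here rely largely on offsets:
- In Kafka Connect, a source task attaches a source partition and a source offset to each record.
- The framework commits these offsets.
- After a restart, the task can read the last committed offset through its context's OffsetStorageReader and continue from there.

The stdlib-only model below uses an in-memory map as a hypothetical stand-in for that offset store. For simplicity it commits immediately after each poll, whereas real Connect commits source offsets periodically in the background.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only model of offset-based resumption for a source task.
// The in-memory map is a hypothetical stand-in for Connect's offset storage.
class OffsetResumeSketch {
    // Partition key (e.g. a table name) -> index of the last committed row.
    private final Map<String, Long> offsetStore = new HashMap<>();

    // Returns only rows after the last committed offset, then commits the new offset.
    // Because the offset is kept outside the poll loop, a restarted loop over the
    // same store resumes where the previous one stopped instead of re-reading everything.
    List<String> pollFrom(String partition, List<String> rows) {
        long start = offsetStore.getOrDefault(partition, -1L) + 1;
        List<String> batch = new ArrayList<>();
        for (long i = start; i < rows.size(); i++) {
            batch.add(rows.get((int) i));
        }
        if (!batch.isEmpty()) {
            offsetStore.put(partition, (long) rows.size() - 1);
        }
        return batch;
    }
}
```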
Businesses from the financial sector to smart retail, and from automotive to IoT devices, are all aiming for ‘real-time data implementations’. Whether it is fraudulent transaction detection in a bank or anomaly detection in IoT sensor data, there is no alternative to this one-step solution.
To conclude, there is a famous quote from the film Prometheus: “Big things have small beginnings”. Any business that requires digital, real-time data handling must implement Kafka Connect features to achieve ‘real-time profit’.
Written by Swarnava Chakraborty. Swarnava is a Technical Lead (consultancy and delivery) at Technaura Systems GmbH.