
Enterprise Data Streaming Architecture

A modern enterprise Data Streaming Architecture carries many kinds of communication: application queues, streaming data, control data, data transfers for backup, syncing and updates, IoT sensor data, system snapshots, and a never-ending list of other use cases. This article discusses some of the use cases common to most enterprises.

Data Streaming Architecture Diagram

 

Because these flows resemble a utility pipeline, such designs are sometimes called streaming data pipeline architectures.

[Figure: A common enterprise data streaming architecture]

Ingesting data from sources in a Data Streaming Architecture

Let’s begin with the most popular way of storing data: databases. They are among the most successful IT applications ever invented and are still widely used. Even today, most critical data is entrusted to a database because of its robustness.

But is there a way to safely capture the operations happening on a database without interfering with its regular work? The answer is Change Data Capture (CDC), which many systems use to copy, migrate, or analyze the changes happening in a system. A CDC feed is a continuous stream of change operations, and each operation eventually loses significance or validity, because a record with a particular primary key can receive multiple updates within a few seconds and only the latest one reflects its current state.
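To illustrate why older CDC events lose validity, here is a minimal Python sketch that replays a hypothetical change feed and keeps only the newest row image per primary key. The `ChangeEvent` type and the `latest_state` function are illustrative names, not part of any CDC tool. The idea is similar in spirit to Kafka log compaction (`cleanup.policy=compact`), which retains the latest value for each key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical CDC event: an operation on a row identified by its primary key."""
    key: int               # primary key of the changed row
    op: str                # "insert", "update" or "delete"
    value: Optional[dict]  # row image after the change; None for deletes


def latest_state(events: list[ChangeEvent]) -> dict[int, dict]:
    """Replay CDC events in order and keep only the newest row image per key.

    Earlier events for the same key are superseded by later ones.
    """
    state: dict[int, dict] = {}
    for event in events:
        if event.op == "delete":
            state.pop(event.key, None)  # a delete removes the row entirely
        else:
            state[event.key] = event.value  # a later image replaces earlier ones
    return state


events = [
    ChangeEvent(1, "insert", {"balance": 100}),
    ChangeEvent(1, "update", {"balance": 80}),
    ChangeEvent(2, "insert", {"balance": 50}),
    ChangeEvent(1, "update", {"balance": 95}),
    ChangeEvent(2, "delete", None),
]
print(latest_state(events))  # {1: {'balance': 95}}
```

Five events collapse into one current row: the two earlier updates to key 1 and the whole history of key 2 no longer matter.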

Apache Kafka comes to the rescue for handling the CDC data arriving from a source system. It ingests and serves this data with low latency and allows multiple consumers to read the same set of data.
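The following sketch, assuming a toy in-memory log rather than the real Kafka client API, shows how several consumers can read the same CDC records independently: each consumer group keeps its own offset, so one group reading the data does not remove it for another. `InMemoryLog`, `poll` and the group names are hypothetical.

```python
class InMemoryLog:
    """A toy append-only log standing in for one topic partition (not the Kafka API)."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.offsets: dict[str, int] = {}  # next offset to read, per consumer group

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        """Return unread records for a group and advance only that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


log = InMemoryLog()
for change in ["row1:update", "row2:insert", "row1:delete"]:
    log.append(change)

# Two independent consumers read the same changes without interfering.
print(log.poll("analytics"))                   # ['row1:update', 'row2:insert', 'row1:delete']
print(log.poll("replication", max_records=2))  # ['row1:update', 'row2:insert']
print(log.poll("replication"))                 # ['row1:delete']
print(log.poll("analytics"))                   # []
```

In Kafka itself, offsets are tracked per consumer group and per partition, and records stay available until the topic's retention or compaction policy removes them.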

File-based snapshots are used to copy data out of modern storage systems, either for archiving or for historical analysis. These snapshots can be streamed into Kafka at a fixed interval and ultimately pushed to the target system.
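As a sketch of the snapshot path, the function below converts one CSV snapshot into keyed records and stamps each with the snapshot time, assuming the snapshot has a primary-key column. The function name, the `snapshot_at` field and the CSV layout are assumptions for illustration, and the actual publishing to Kafka is left out.

```python
import csv
import io
from datetime import datetime, timezone


def snapshot_to_records(snapshot_csv: str, key_column: str,
                        taken_at: datetime) -> list[tuple[str, dict]]:
    """Turn one file-based snapshot into (key, value) records ready to publish.

    Each value is tagged with the snapshot time so downstream consumers can tell
    which interval the data belongs to. Names here are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(snapshot_csv))
    records = []
    for row in reader:
        value = dict(row)
        value["snapshot_at"] = taken_at.isoformat()
        records.append((row[key_column], value))
    return records


snapshot = "customer_id,city\nc1,Pune\nc2,Oslo\n"
records = snapshot_to_records(snapshot, "customer_id",
                              datetime(2024, 1, 1, tzinfo=timezone.utc))
for key, value in records:
    print(key, value)
```

A scheduler would call a function like this once per interval and hand the resulting records to a Kafka producer, keyed so that later snapshots of the same row supersede earlier ones.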

These snapshot topics can serve as lookup tables: in Kafka Streams workloads they can be read as a KTable and used for on-the-fly transformation within Kafka.
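The lookup pattern can be sketched in plain Python: a lookup topic is materialized into a table where the latest value per key wins, and each stream event is enriched with the table value for its key, as in an inner stream-table join. In Kafka Streams itself this is written in Java with `StreamsBuilder#table` and `KStream#join`. The Python functions, topic contents and the `country` field below are hypothetical, and the sketch builds the full table first, whereas Kafka Streams joins each event against the table state at the moment that event is processed.

```python
from typing import Optional


def build_table(changelog: list[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Materialize a lookup topic into a table: the latest value wins, None acts as a delete."""
    table: dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def enrich(stream: list[tuple[str, dict]], table: dict[str, str]) -> list[tuple[str, dict]]:
    """Inner stream-table join: enrich each event with the table value for its key.

    Events whose key is missing from the table are dropped, as in an inner join.
    """
    joined = []
    for key, event in stream:
        if key in table:
            joined.append((key, {**event, "country": table[key]}))
    return joined


country_topic = [("u1", "IN"), ("u2", "NO"), ("u1", "DE")]  # u1 moved; latest value wins
clicks = [("u1", {"page": "/home"}), ("u3", {"page": "/cart"}), ("u2", {"page": "/pay"})]

print(enrich(clicks, build_table(country_topic)))
# [('u1', {'page': '/home', 'country': 'DE'}), ('u2', {'page': '/pay', 'country': 'NO'})]
```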

Application messaging is another use case, in which an existing Kafka service can be reused to decouple applications, bridging applications that process data at different speeds. This is very similar to a conventional message queue implementation.
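A minimal sketch of this decoupling, using Python's `queue.Queue` and two threads as an in-process stand-in for a topic (an analogy, not Kafka): the producer publishes without waiting, while a deliberately slow consumer works through the messages at its own pace. `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading
import time

_DONE = object()  # sentinel telling the consumer that the producer has finished


def run_pipeline(messages, consumer_delay=0.01):
    """Decouple a fast producer from a slower consumer through a buffer.

    queue.Queue stands in for a topic here; it is an in-process analogy, not Kafka.
    """
    buffer = queue.Queue()  # unbounded, so the producer never waits for the consumer
    received = []

    def producer():
        for msg in messages:
            buffer.put(msg)
        buffer.put(_DONE)

    def consumer():
        while True:
            msg = buffer.get()
            if msg is _DONE:
                break
            time.sleep(consumer_delay)  # simulate slow downstream processing
            received.append(msg)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(run_pipeline([f"order-{i}" for i in range(5)]))
# ['order-0', 'order-1', 'order-2', 'order-3', 'order-4']
```

Unlike this in-memory queue, Kafka persists messages, so the consumer could also be stopped and restarted later without losing its place.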

 

External systems (Structured and unstructured)
IoT (Internet of Things) is the technology that makes conventional equipment smart and responsive, using sensors that can generate periodic events or alerts every few milliseconds. These events are huge in number; individually they are mostly insignificant, but cumulatively they create a holistic tracking infrastructure. Processing them requires a low-latency system, i.e. a streaming platform, and Kafka plays this role like a pro. The data in Kafka can then be consumed by sensor data processing systems to drive an interactive dashboard or a similar visualization.
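To show how many individually insignificant readings become useful dashboard data, here is a sketch of a tumbling-window average per sensor. The event shape `(sensor_id, timestamp_ms, reading)` and the function name are assumptions. Stream processors such as Kafka Streams provide windowed aggregations for this, but the code below is plain Python.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_ms):
    """Average sensor readings per (sensor, window) over fixed, non-overlapping windows.

    events: iterable of (sensor_id, timestamp_ms, reading). Many tiny readings become
    one summary point per window, the kind of value a dashboard would plot.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sensor_id, ts_ms, reading in events:
        window_start = ts_ms - (ts_ms % window_ms)  # align to the window boundary
        sums[(sensor_id, window_start)] += reading
        counts[(sensor_id, window_start)] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


readings = [
    ("temp-1", 0, 20.0), ("temp-1", 400, 22.0), ("temp-1", 1200, 30.0),
    ("temp-2", 100, 5.0), ("temp-2", 900, 7.0),
]
print(tumbling_window_avg(readings, window_ms=1000))
# {('temp-1', 0): 21.0, ('temp-1', 1000): 30.0, ('temp-2', 0): 6.0}
```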

Apache NiFi can be used to pull external data, such as social network feeds, into Kafka. These datasets can then be stored in an archival system for further analysis to derive useful customer insights.

 

Data consumption from Kafka

There are two main types of Kafka consumers: real-time ingestion systems and batch processing systems (also called archival systems). How we classify a workload depends on the chronological validity of its data, that is, how long the data stays relevant.

Transactional systems, live feeds, sensor data and application queues are some use cases classified as real-time processing, because their data loses significance once its validity period expires.
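The notion of time validity can be made concrete with a small filter, assuming each event carries an event timestamp in milliseconds and the consumer has a fixed validity period. A real-time consumer would act only on events that are still valid. The `ts_ms` field, the `fresh_events` function and the `validity_ms` parameter are hypothetical.

```python
def fresh_events(events, now_ms, validity_ms):
    """Return events whose age is within the validity period; older ones have expired."""
    return [e for e in events if now_ms - e["ts_ms"] <= validity_ms]


events = [
    {"id": "a", "ts_ms": 1_000},
    {"id": "b", "ts_ms": 9_500},
    {"id": "c", "ts_ms": 9_900},
]
print([e["id"] for e in fresh_events(events, now_ms=10_000, validity_ms=1_000)])  # ['b', 'c']
```

Event `a` is 9 seconds old and falls outside a 1-second validity period, so a real-time consumer skips it, while an archival consumer would still keep it.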

Batch processing systems, on the other hand, accumulate data over time: the greater the volume of the data (or the wider its time range), the more precise the analytic insights derived from it. Because batch workloads run over these accumulated datasets, data delivery is not urgent.
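By contrast, a batch consumer can afford to wait. The sketch below, with a hypothetical `BatchArchiver` class and an in-memory list standing in for the archival store, accumulates records and writes them out in fixed-size batches, flushing any remainder at the end of a run.

```python
class BatchArchiver:
    """Accumulate records and write them to an archive in batches (hypothetical sketch).

    Delivery is not urgent, so records wait until a full batch is ready; flush()
    writes whatever remains, e.g. at the end of a scheduled run.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.pending: list = []
        self.archive: list = []  # stands in for files or tables in an archival store

    def add(self, record) -> None:
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.archive.append(list(self.pending))  # one write per batch
            self.pending.clear()


archiver = BatchArchiver(batch_size=3)
for i in range(7):
    archiver.add(i)
archiver.flush()
print(archiver.archive)  # [[0, 1, 2], [3, 4, 5], [6]]
```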
