In this article, we shed some light on real-time versus batch processing systems. The distinction matters because the processing model shapes the SLAs of a modern system. In layman’s terms, SLAs define the “acceptance criteria” for a system. Most modern systems are built around a few expectations, or defined parameters, against which they are benchmarked.
Real time processing vs batch processing
We will get to the core of this by understanding each of them independently.
Real time systems or streaming
Let’s visualize this with an example. Take Netflix, which comes to mind whenever we hear “streaming”. It is a streaming platform where movies and TV shows are readily available to watch. We call it streaming because the content is available for viewing almost immediately: you click on a show and it starts playing (pretty amazing, right?). While the show plays, the Netflix system streams the video to your Android TV or phone in real time, with no noticeable lag as long as bandwidth allows.
Now let’s take another example, and then we can summarize streaming. Take your LinkedIn, Instagram or Facebook feed. Each object in it, whether a story, a post or an update, is created by an individual user; we will call these objects events. On a typical social networking platform, these events are published immediately to the author’s followers, and every intended user receives them in chronological order, in real time. This is a use case of event stream processing.
Summarizing the above examples, real-time processing is anything that happens almost instantaneously, without noticeable lag. A data streaming platform is any platform that implements this principle, for instance Apache Kafka, which is optimized for low-latency processing and delivery.
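The feed example can be sketched in a few lines of Python. This is only an illustration: the `FeedService` class, its `publish` method and the follower map are hypothetical names made up for this sketch, and a real platform would be far more involved. The point it shows is that each event is pushed to followers the moment it is published, in arrival order.

```python
from collections import defaultdict


class FeedService:
    """Toy in-memory feed (hypothetical, for illustration only)."""

    def __init__(self, followers):
        # followers maps an author to the users who follow that author.
        self.followers = followers
        # feeds maps a user to the list of events delivered so far.
        self.feeds = defaultdict(list)

    def publish(self, author, text):
        # Deliver the event right away; nothing waits for a scheduled run.
        for follower in self.followers.get(author, ()):
            self.feeds[follower].append(f"{author}: {text}")


service = FeedService({"alice": ["bob", "carol"], "bob": ["carol"]})
service.publish("alice", "new post")
service.publish("bob", "new story")
print(service.feeds["carol"])
```

Here carol follows both authors, so her feed receives both events in the order they were published.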
Batch processing systems
Creating a daily “to do” list, executing the tasks serially and striking them off one by one as they are carried out is an example of batch processing. Here you pool together tasks that are similar and that would take more time if each were addressed individually.
For example, a mail delivery system batches mail with similar addresses by PIN code (postal code). This increases throughput and overall efficiency. Another important aspect is the priority and relevance of the deliverables.
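As a rough sketch of the mail example, batching amounts to grouping items by a shared key. The record layout (`id`, `pin_code`) and the function name below are assumptions made for this illustration.

```python
from collections import defaultdict


def batch_by_pin_code(mails):
    """Group mail IDs by PIN code so each batch can be delivered together."""
    batches = defaultdict(list)
    for mail in mails:
        batches[mail["pin_code"]].append(mail["id"])
    return dict(batches)


# Hypothetical sample data.
mails = [
    {"id": "m1", "pin_code": "560001"},
    {"id": "m2", "pin_code": "110001"},
    {"id": "m3", "pin_code": "560001"},
]
batches = batch_by_pin_code(mails)
print(batches)
```

One trip per PIN code then covers every item in that batch, which is where the throughput gain comes from.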
Entities in a batch processing system do not lose their significance if they are delayed, whereas in stream processing fast delivery is the key.
A credit card processing system cannot sustain lag in delivery: transaction sessions would expire, and a longer expected delay would introduce security risk (a wider window for an attacker to intercept the transaction).
Bank wire transfers are a type of batch processing: we initiate and authenticate a transaction, the transaction lands in a queue, and it is finally settled in hourly batches.
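The difference in delay can be made concrete with a small calculation. The sketch below makes two simplifying assumptions: processing itself takes no time, and a batch runs as soon as its last item arrives. The function names and arrival times are made up for this illustration.

```python
def stream_latencies(arrival_times):
    """Streaming: each item is handled the moment it arrives."""
    return [0 for _ in arrival_times]


def batch_latencies(arrival_times, batch_size):
    """Batch: items wait until the last item of their batch has arrived."""
    latencies = []
    for start in range(0, len(arrival_times), batch_size):
        batch = arrival_times[start:start + batch_size]
        flush_time = batch[-1]  # assumed: the batch runs at its last arrival
        latencies.extend(flush_time - t for t in batch)
    return latencies


arrivals = [0, 1, 2, 3, 10, 11]  # arrival times in seconds (hypothetical)
print(stream_latencies(arrivals))
print(batch_latencies(arrivals, batch_size=4))
```

With a batch size of 4, the first item waits 3 seconds before anything happens to it. That wait is harmless for a to-do list, but it is exactly what a latency-sensitive system cannot afford.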
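The queue-then-settle flow can be sketched as follows. This is an illustrative model only, not how any real bank works: the transaction fields (`authenticated_at`, `amount`) are hypothetical, and amounts are kept as integers in minor units to avoid floating-point rounding.

```python
from collections import defaultdict
from datetime import datetime


def settlement_batches(queue):
    """Total the queued amounts per hour of authentication."""
    totals = defaultdict(int)
    for txn in queue:
        # Truncate the timestamp to the start of its hour.
        hour = txn["authenticated_at"].replace(minute=0, second=0, microsecond=0)
        totals[hour] += txn["amount"]
    return dict(sorted(totals.items()))


# Hypothetical queue of authenticated transactions.
queue = [
    {"authenticated_at": datetime(2024, 1, 1, 9, 5), "amount": 100},
    {"authenticated_at": datetime(2024, 1, 1, 9, 40), "amount": 250},
    {"authenticated_at": datetime(2024, 1, 1, 10, 15), "amount": 75},
]
for hour, total in settlement_batches(queue).items():
    print(hour, total)
```

The two transactions authenticated between 09:00 and 10:00 settle together as one batch, and the third settles with the next hourly batch.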
Real time vs batch processing in Big data
With the advent of big data, streaming was introduced to capture fast-changing data sets: change data capture (CDC) in databases, where changes are captured as events almost everywhere; analytics on unstructured data sets from social platforms; IoT sensor data; and so on.
This information needs to be committed to the data lake almost instantaneously and processed by visualization systems to render graphical dashboards. This requires streaming data integration, and tools such as Apache Kafka and Apache NiFi assist in this feat. Many cloud-native services, such as Amazon Kinesis, are also used for real-time processing in big data.
Even though big data systems deal intensively with batch processing, a majority of enterprise sources deliver change data rather than full snapshots. These sources require a processing platform optimized for streaming.
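To illustrate change data versus a full snapshot, here is a minimal sketch that applies a stream of change events to a keyed table. The event shape (`op`, `key`, `row`) is an assumption made for this example and does not follow the format of any particular CDC tool.

```python
def apply_changes(table, changes):
    """Return a copy of the table with the change events applied in order."""
    result = dict(table)
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            result[key] = change["row"]
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical current state and incoming change events.
table = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"op": "update", "key": 1, "row": {"name": "a2"}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"name": "c"}},
]
print(apply_changes(table, changes))
```

Only the three changed rows travel over the wire, whereas a snapshot-based load would resend the whole table each time.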