
Install Apache Kafka on AWS | Self managed

As more organisations adopt the cloud, one option is to install Apache Kafka on AWS EC2 instances and manage it yourself. This kind of setup is highly customisable and can be tailored to meet specific needs, though it comes with its own risks and benefits.

 

Apache Kafka can be installed on AWS in many ways; the approach described below is well suited for critical workloads and follows the recommendations of the Kafka maintainers.

 

Here we implement a Kafka cluster with a minimal design, suitable for conducting POCs with representative workloads. Cloud concepts such as scalability, high availability, and fault tolerance can be layered onto this setup using appropriate cloud design patterns.

 

Please follow the steps below to install self-managed Apache Kafka (Confluent Platform) on AWS EC2 Linux instances.

  1. Create a VPC with public and private subnets and a NAT Gateway for internet access.
  2. From the EC2 console, deploy three instances running Ubuntu Server 18.04 LTS. Place each instance in its own private subnet, in a different Availability Zone, and assign the default security group.
  3. Configure the following inbound rules for that security group in the VPC console (an equivalent AWS CLI sketch follows this list):
    • Custom TCP: ports 2888–3888 (ZooKeeper peer and leader-election traffic) from all sources.
    • SSH (port 22) restricted to your own IP address.
    • Custom TCP: port 2181 (ZooKeeper client connections) from all sources.
    • Custom TCP: port 9092 (Kafka broker listener) from all sources.
    • All traffic from the same security group ID, so the cluster instances can communicate freely with each other. (Opening 2181 and 9092 to all sources is tolerable here only because the instances sit in private subnets; tighten these rules for anything beyond a POC.)
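The same rules can be scripted with the AWS CLI. This is a minimal sketch, assuming your CLI credentials and region are already configured; vpc-0abc and sg-0abc are placeholders (use the group ID returned by create-security-group), and 203.0.113.10/32 stands in for your own IP address.

# Create a security group for the cluster (vpc-0abc is a placeholder)
aws ec2 create-security-group --group-name kafka-sg \
  --description "Kafka/ZooKeeper POC cluster" --vpc-id vpc-0abc

# ZooKeeper peer and leader-election ports
aws ec2 authorize-security-group-ingress --group-id sg-0abc \
  --protocol tcp --port 2888-3888 --cidr 0.0.0.0/0

# ZooKeeper client port and Kafka broker listener
aws ec2 authorize-security-group-ingress --group-id sg-0abc \
  --protocol tcp --port 2181 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0abc \
  --protocol tcp --port 9092 --cidr 0.0.0.0/0

# SSH restricted to your own IP
aws ec2 authorize-security-group-ingress --group-id sg-0abc \
  --protocol tcp --port 22 --cidr 203.0.113.10/32

# All traffic allowed between members of this security group
aws ec2 authorize-security-group-ingress --group-id sg-0abc \
  --protocol -1 --source-group sg-0abc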

4. Install Java on each EC2 instance.

sudo apt update && sudo apt install -y default-jdk
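You can verify the JDK on each instance before proceeding (Ubuntu 18.04's default-jdk installs OpenJDK 11, which Confluent Platform 6.x supports):

java -version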

 

5. Once Java is installed, SSH into each instance and download the Confluent Platform package.

curl -O https://packages.confluent.io/archive/6.1/confluent-6.1.1.tar.gz

6. Extract the tarball on each instance.

tar xzf confluent-6.1.1.tar.gz
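Optionally, point an environment variable at the extracted directory so the commands in the later steps are shorter. This assumes a bash shell and that the archive was extracted in your home directory:

export CONFLUENT_HOME=$HOME/confluent-6.1.1
export PATH=$CONFLUENT_HOME/bin:$PATH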

 

7. After extraction, the contents of the Confluent root directory look like this:

  • /bin/ Driver scripts for starting and stopping services
  • /etc/ Configuration files
  • /lib/ Systemd services
  • /logs/ Log files
  • /share/ Jars and licenses
  • /src/ Source files that require a platform-dependent build

 

8. Edit ./etc/kafka/zookeeper.properties on each instance as below, where zoo1, zoo2, and zoo3 are the private IP addresses (or resolvable hostnames) of the three instances.

tickTime=2000
dataDir=<dir>/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
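For context, per the ZooKeeper documentation: tickTime is the base time unit in milliseconds, and initLimit and syncLimit are measured in ticks, so followers get initLimit × tickTime = 5 × 2000 ms = 10 s to connect to the leader and 2 × 2000 ms = 4 s to stay in sync. Each server.N line lists a peer's follower port (2888) and leader-election port (3888), matching the security-group rules from step 3, and the autopurge settings retain three snapshots and purge the rest every 24 hours.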

 

9. On each instance, create a myid file in the <dir>/zookeeper/ data directory containing a single ASCII integer: 1, 2, and 3 respectively, matching the server.1, server.2, and server.3 lines above. (Note that touch only creates an empty file; you must also write the integer into it, as shown below.)
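For example, on the first instance (repeat with 2 and 3 on the other two; <dir> is the same placeholder used for dataDir above):

mkdir -p <dir>/zookeeper
echo 1 > <dir>/zookeeper/myid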

 

10. Edit ./etc/kafka/server.properties on each instance as below; uncomment broker.id and give each broker a unique integer ID (1, 2, and 3 respectively).

zookeeper.connect=zoo1:2181,zoo2:2181,zoo3:2181

# The ID of the broker. This must be set to a unique integer for each broker.
broker.id=1
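Depending on how clients will reach the brokers, you may also need to set the listener properties in the same file. This goes beyond the original steps and uses placeholder hostnames; the advertised.listeners address must resolve from wherever your clients run:

# Bind on all interfaces, advertise a client-reachable address
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker1:9092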

11. Once the above configuration is complete, start the ZooKeeper ensemble by starting ZooKeeper on each of the three instances, one by one.

<path-to-confluent-package>/bin/zookeeper-server-start <path-to-confluent-package>/etc/kafka/zookeeper.properties &
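The trailing & only backgrounds the process for the current shell session. To keep it running after logout, you could wrap the command with nohup (the same pattern applies to the broker start in step 13; a production setup would typically use the bundled systemd units instead):

nohup <path-to-confluent-package>/bin/zookeeper-server-start \
  <path-to-confluent-package>/etc/kafka/zookeeper.properties > zookeeper.log 2>&1 &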

 

12. Check the ensemble status; it should show an empty broker ephemeral list [].

bash <path-to-confluent-package>/bin/zookeeper-shell zoo1:2181 <<< 'ls /brokers/ids' | tail -n2 | head -n1

[]

13. Once the ZooKeeper ensemble is up and running, start the brokers one by one on each instance.

<path-to-confluent-package>/bin/kafka-server-start <path-to-confluent-package>/etc/kafka/server.properties &

 

14. Check the broker status; all three nodes should now appear in the broker ephemeral list.

bash <path-to-confluent-package>/bin/zookeeper-shell zoo1:2181 <<< 'ls /brokers/ids' | tail -n2 | head -n1

[1,2,3]
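Before producing, you can create the test topic explicitly so it is replicated across all three brokers (optional; with default broker settings the console producer would auto-create the topic, but with the default replication factor):

<path-to-confluent-package>/bin/kafka-topics --create --topic sample-topic \
  --bootstrap-server broker1:9092 --replication-factor 3 --partitions 3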

 

15. Check that the Kafka cluster is functioning by producing some messages via the Kafka console producer, where broker1 is the address of one of the brokers.

bash <path-to-confluent-package>/bin/kafka-console-producer --topic sample-topic --bootstrap-server broker1:9092

At the > prompt, type some characters and press Enter; each Enter delimits one message.

 

16. Now consume the produced messages with the Kafka console consumer.

bash <path-to-confluent-package>/bin/kafka-console-consumer --topic sample-topic --bootstrap-server broker1:9092 \
  --from-beginning

All messages produced earlier should appear here. Press Ctrl+C to exit.
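As an optional verification, describe the topic to see how its partitions, leaders, and replicas are spread across the three brokers:

<path-to-confluent-package>/bin/kafka-topics --describe --topic sample-topic \
  --bootstrap-server broker1:9092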

 

The above setup is a starting point for creating a cluster and scaffolding further implementations.

We can integrate the Kafka Connect framework, custom consumers, and other legacy systems with the cluster to establish compatibility and assess feasibility; a sketch of a Connect worker follows below. This would also introduce a need for customisation of the cluster configuration, depending on the use case we are trying to execute.
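As a quick illustration, Confluent Platform bundles a standalone Connect worker along with sample connector configs. A minimal sketch, assuming the sample file-source config ships under ./etc/kafka/ in your package (verify the paths, and set bootstrap.servers in connect-standalone.properties to your brokers first):

# Run a standalone Connect worker with the bundled file-source sample
<path-to-confluent-package>/bin/connect-standalone \
  <path-to-confluent-package>/etc/kafka/connect-standalone.properties \
  <path-to-confluent-package>/etc/kafka/connect-file-source.properties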

One can secure the cluster to comply with industry standards; further security and design considerations can be found in the official Confluent documentation (see the References below).

 

References

  1. Quick Start for Apache Kafka using Confluent Platform (Local)
  2. Manual Install using ZIP and TAR Archives
  3. Install Apache Kafka on AWS
