Declarative Connectors with Confluent for Kubernetes (2023)

Apache Kafka® has a rich ecosystem of hundreds of connectors to get data in and out of Kafka. You probably need multiple connectors in your architecture to support data in motion across your infrastructure. So how can you manage this effort in a consistent, infra-as-code, automated manner?

This blog post leverages Confluent for Kubernetes (CFK), a cloud-native control plane that provides a complete declarative experience for deploying and self-managing Confluent Platform on private Kubernetes-based infrastructure. CFK uses an infrastructure-as-code approach to manage Confluent Platform components as well as application resources like topics, schemas, and, with the CFK 2.1 update, connectors.

We’ll demonstrate how to declaratively manage connectors with CFK and dig into some of the Kubernetes API design patterns used.

What are connectors?

Connectors are used by Kafka Connect to stream data between Kafka and other external systems. There are two aspects to a connector:

  • Connector plugin: A connector plugin is the binary or JAR that implements all of the connector’s classes and abstractions, and is installed on the Kafka Connect workers. Kafka Connect is a powerful and pluggable data integration framework. You can develop your own custom connector plugins or get them from Confluent Hub.
  • Connector instance: A connector instance is a logical job responsible for managing the copying of data between Kafka and another data system. After the connector plugin is installed, you create and manage the corresponding connector instance via the Kafka Connect REST API to perform the actual job of streaming data.

To differentiate between the two, this blog uses “connector plugin” for the former and “connector” for the latter.

In order to use connectors, the connector plugin needs to be made available to and deployed on Kafka Connect workers. In the past, this has been left to the end user to do:


  • First, you needed to add the plugin to the Confluent Kafka Connect image, often by extending the Connect Docker image with the plugin JARs.
  • Then you needed to send the connector configuration and start the connector via a script that invoked the Connect REST API, along the lines of the sketch below.
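
A minimal sketch of that manual REST step, using the standard Kafka Connect REST API (the host, port, and connector name here are placeholders):

# POST the connector configuration to the Connect REST API
curl -X POST http://connect:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "s3-source",
    "config": {
      "connector.class": "io.confluent.connect.s3.source.S3SourceConnector",
      "tasks.max": "3"
    }
  }'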

This is not the declarative way, and we’re excited to introduce our Kubernetes-native way to manage the lifecycle of connectors and build an automatic and reliable solution to install connector plugins in Kafka Connect.

Our aim is to help you:

  • Achieve a one-step, fully automated deployment and management experience for connectors
  • Reduce manual and error-prone operations
  • Unblock integration with the GitOps model

This blog focuses on answering these two questions:

  • How does CFK manage connectors declaratively in Kubernetes?
  • How does CFK automate the installation of connector plugins?

Manage connectors declaratively in Kubernetes

CFK provides a declarative API for managing connectors by extending the Kubernetes API with a Connector CustomResourceDefinition (CRD), which allows you to define the high-level desired state of a connector as a Custom Resource (CR). You can provision, configure, update, and delete a connector, manage its tasks, and observe its current state simply by interacting with the connector CR YAML file. Rather than making REST API calls manually, you let CFK take care of all the low-level details.

Define desired state for a connector

The following is an example CR for an S3 source connector.

kind: Connector
metadata:
  name: connector
spec:
  name: connector
  class: "io.confluent.connect.s3.source.S3SourceConnector"
  taskMax: 3
  configs:
    topics.dir: "quickstart"
    value.converter: "org.apache.kafka.connect.json.JsonConverter"
    mode: "GENERIC"
    format.class: "io.confluent.connect.s3.format.json.JsonFormat"
    topic.regex.list: "quick-start-topic:.*"
    s3.bucket.name: "demo-bucket"
    s3.region: "us-west-2"
    value.converter.schemas.enable: "false"
    aws.access.key.id: "${file:/mnt/secrets/aws-credential/aws.txt:aws_access_key_id}"
    aws.secret.access.key: "${file:/mnt/secrets/aws-credential/aws.txt:aws_secret_access_key}"
  restartPolicy:
    type: OnFailure

The above connector CR YAML file defines the desired class name and configurations of the connector. Using the Kubernetes command-line tool kubectl, you can create or update the connector with kubectl apply -f connector.yaml, or delete it with kubectl delete -f connector.yaml. The following illustrates the workflow:

[Figure 1: Workflow of declaratively managing a connector with CFK]


After the connector CR is submitted to the Kubernetes API server, for example using kubectl, the CFK connector controller watches it and makes REST API calls to Kafka Connect internally to ensure the connector reaches the desired state. Whenever the CR is updated, CFK is notified and automates the corresponding operations.

View granular status

Connectors run on Kafka Connect workers. To keep the latest status reflected on the connector CR, CFK reconciles every few minutes: it checks with Kafka Connect and updates the CR status with the latest observed state of the connector and its tasks. To view the status, use kubectl get connector -n <your namespace>, or kubectl get connector -n <your namespace> -o yaml for full details.
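
As a rough illustration only (the exact field names belong to the CFK status schema and may differ by version), a healthy connector’s CR status looks something like:

status:
  state: RUNNING
  tasksReady: 3/3
  workerID: connect-0.connect.<your namespace>.svc.cluster.local:8083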

The status shows the state of the connector and its tasks. If there are any failures, the status includes detailed information for debugging and troubleshooting: failed tasks, error messages, the worker ID where the connector or task is running, and so on.

Automate recovery

With the declarative connector Kubernetes API, you can manage the lifecycle of connectors efficiently and effortlessly. If the connector enters the FAILED state for any reason, CFK has an auto-healing strategy.

CFK restarts the connector automatically if it fails. You can also configure the policy for restarting failed tasks: with a restart policy of OnFailure, CFK restarts failed tasks automatically until it reaches the maximum retry count specified in the connector CR. This helps services heal, especially in scenarios where a connector or task needs a manual restart to recover. Although that restart can be done with a REST API call, it is a poor experience when Kafka Connect isn’t reachable externally, or when the REST API requires authentication or TLS certificates to connect; that friction slows down recovery.
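
In the connector CR, this is configured via the restartPolicy block shown in the earlier example; a maximum retry count can sit alongside the type (the maxRetry field name here follows the CFK docs, but verify it against your CFK version):

restartPolicy:
  type: OnFailure
  maxRetry: 10  # assumed field name: caps automatic restarts of failed tasks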

Furthermore, during troubleshooting it may be necessary to trigger a one-off restart, pause, or resume operation on a connector or task. You can achieve that by adding an annotation to the connector CR, for example, kubectl annotate connector <your connector name> platform.confluent.io/restart-connector="true". This improves troubleshooting efficiency.
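
Pause and resume are exposed the same way; the annotation names below follow the restart annotation’s naming pattern (treat them as an assumption and check the CFK reference for your version):

# trigger a one-off restart of the connector (from the text above)
kubectl annotate connector <your connector name> platform.confluent.io/restart-connector="true"

# pause and resume, following the same pattern (names assumed)
kubectl annotate connector <your connector name> platform.confluent.io/pause-connector="true"
kubectl annotate connector <your connector name> platform.confluent.io/resume-connector="true"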

Complete security

CFK allows you to provide connector configurations as key-value pairs in the connector CR. However, there are two concerns:


  • How do you prevent sensitive data (e.g., usernames, passwords) from appearing in cleartext in the YAML file?
  • How do you provide the TLS certificates required to communicate with the other data system?

The ConfigProvider interface, supported by Kafka Connect, allows you to use variables in connector configurations; see, for example, the aws.access.key.id and aws.secret.access.key configurations in the connector CR example above. Kafka Connect resolves the variables dynamically when the connector starts. But how does that aws.txt file get injected into the Kafka Connect pods?

CFK manages Kafka Connect through a Connect CustomResourceDefinition (CRD), which supports mounting secrets into the Kafka Connect pods. A secret is a Kubernetes object containing sensitive data that can be exposed as files in a pod. When a secret name is specified in the connect CR spec, the secret’s data is injected into the Kafka Connect pods and can be used by connectors:

spec:
  mountedSecrets:
  - secretRef: <your-connector-credential-secret-name>
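
For the connector CR example above, the aws.txt file could be packaged as a secret like this (the name aws-credential matches the /mnt/secrets/aws-credential/aws.txt path used in the ConfigProvider variables, assuming CFK mounts each secret under /mnt/secrets/<secret-name>/):

# aws.txt is a properties file consumed by the file ConfigProvider, e.g.:
#   aws_access_key_id=AKIA...
#   aws_secret_access_key=...
kubectl create secret generic aws-credential --from-file=aws.txt -n <your namespace>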

Similarly, some connectors require specific certificates, not present on the Kafka Connect workers, to communicate with external data systems. As shown in the connect CR spec below, the required certificates can be mounted into the pods automatically. You can view the certificate file paths in the connect CR status and use them in connector configurations, saving you from mounting them yourself.

spec:
  connectorTLSCerts:
  - secretRef: <your-connector-tls-secret-name>

Automate connector plugins installation with CFK

Even with connector instances managed declaratively, adding connector plugins to Kafka Connect remains a challenge for customers. It requires the disjointed effort of prebuilding an image that extends the Kafka Connect base image with the connector plugins. This results in some of the following inconveniences:

  • It blocks customers who don’t have a private registry
  • If the extended image size becomes large, it takes time to download, and may result in failures during deployment
  • Customers have to build the image again for adding or upgrading plugins
  • It also breaks the continuous pipeline deployment experience, like GitOps

That’s why a reliable and fully CFK-operated solution to install connector plugins into Kafka Connect is in high demand.

How does it work?

CFK can use an init container to download and install connector plugins. An init container is a specialized container that runs before the app containers in a pod. It can contain setup scripts or tools that are not present in the app image, and it always runs to completion. This means the init container can finish installing the plugins before the Kafka Connect server starts.

[Figure 2: The init container downloads and installs connector plugins before the Kafka Connect server starts]


CFK manages Kafka Connect through the connect CRD in Kubernetes, so users can specify the plugin installation information in the connect CR YAML file. After the connect CR is submitted to the Kubernetes API server, CFK watches it and creates resources such as a StatefulSet and ConfigMaps. When a Connect pod starts, the init container runs first: it reaches out to Confluent Hub or another remote archive path, downloads the requested connector plugins, stores them in a node volume, and installs them using the Confluent Hub client.
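
Conceptually, the init container automates what you would otherwise run by hand with the Confluent Hub client, along the lines of:

# install a plugin from Confluent Hub non-interactively
confluent-hub install --no-prompt confluentinc/kafka-connect-s3-source:2.0.1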

Here is an example of a declarative Connect spec to install connector plugins from Confluent Hub.

kind: Connect
metadata:
  name: connect
spec:
  replicas: 2
  image:
    application: confluentinc/cp-server-connect:7.0.1
    init: confluentinc/confluent-init-container:2.2.0-1
  build:
    type: onDemand
    onDemand:
      plugins:
        locationType: confluentHub
        confluentHub:
        - name: kafka-connect-s3-source
          owner: confluentinc
          version: 2.0.1
      storageLimit: 4G

The build type onDemand in the above example means the requested connector plugins will be installed by CFK. The locationType confluentHub means the listed plugins are fetched from Confluent Hub. CFK also provides another option: fetching plugins from a remote archive path. You can develop your own connector plugin, put it in a custom artifact location, and configure the connect CR with the URL, as shown below. To ensure downloaded artifacts are secure and reliable, a sha512 checksum is mandatory for each plugin.

build:
  type: onDemand
  onDemand:
    plugins:
      locationType: url
      url:
      - name: kafka-connect-s3-source
        archivePath: https://example.com/confluentinc-kafka-connect-s3-source-2.0.1.zip
        checksum: f2e6588d16f0dfff44f306bf4854cdf85fa38cd7f362dfd2c2e50e9e07f5bedaf0c60bd98b2562d09cd868c161ab81511be44732bc2ca83556baf7b61d0d50b0
    storageLimit: 4G
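
You can compute the required checksum for your own archive with a standard tool, for example:

sha512sum confluentinc-kafka-connect-s3-source-2.0.1.zip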

Both of the above examples include a configuration called storageLimit, which defines the maximum amount of volume space used for storing plugins. Kubernetes supports several types of volumes, such as persistent volumes and ephemeral volumes. In this solution, we used the node-ephemeral emptyDir volume for plugin storage.

Why did we choose emptyDir? An emptyDir volume is first created when a pod is assigned to a node, and it shares the pod’s lifetime. A container crash does not remove the pod from its node, so data in an emptyDir survives container crashes. Unlike a persistent volume, this kind of volume is not persistent: Kubernetes destroys it when the pod is removed from the node, so it cannot leak files. It also doesn’t depend on external volume availability, which is one of its biggest benefits and helps keep Kafka Connect stateless. Finally, an emptyDir volume is stored on whatever medium backs the node, so it incurs no extra cost and requires no extra volume setup, while the declarative Connect spec still gives you the flexibility to define a storage limit for plugins.
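
For reference, this is roughly what that choice corresponds to in a plain Kubernetes pod spec (an illustrative sketch, not CFK’s internal manifest; sizeLimit is the standard emptyDir field that would back storageLimit):

volumes:
- name: plugins
  emptyDir:
    sizeLimit: 4G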

With the connect CR, you can effortlessly provision Kafka Connect with connector plugins in a one-step operation, for example by running kubectl apply -f connect.yaml. Moreover, it eases the process of adding more plugins or upgrading plugin versions. After Kafka Connect is up, the installed connector plugins are displayed in the status of the Kafka Connect CR.
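
For example, a command like the following shows the installed plugins in the CR status (using the Connect CR named connect from the example above):

kubectl get connect connect -n <your namespace> -o yaml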

See it in action

This demo installs the S3 source connector plugin in Kafka Connect and creates an S3 source connector instance that copies data from S3 to Kafka.
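
Under the assumptions in the examples above, the end-to-end flow reduces to:

kubectl apply -f connect.yaml               # install the S3 source plugin via the init container
kubectl apply -f connector.yaml             # create the S3 source connector instance
kubectl get connector -n <your namespace>   # watch the connector reach RUNNING state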


[Figure 3: Demo of installing the S3 source connector plugin and creating the connector instance with CFK]

Summary

Declarative connectors with CFK provide a complete Kubernetes-native pattern for managing connectors without any disjointed effort. With the desired state defined in a single spec, CFK automates the rest for you. This unblocks the GitOps model for deploying Kafka Connect and connectors. Check out the resources below to learn more about CFK!

  • Documentation: Confluent for Kubernetes
  • Blog post: Building a Real-Time Data Pipeline with Oracle CDC and MarkLogic Using CFK and Confluent Cloud
  • Blog post: Managing Hybrid Cloud Data with Cloud-Native Kubernetes APIs
