Replicate data from MySQL, Postgres and MongoDB to ClickHouse
Kanthi Subramanian ab42714f88 Added a map of topics to DBWriter instances to cover scenario where messages of multiple topics will be sent to the same Kafka SinkTask.
Fixed bug with extra addBatch
Added logic to check for indexes before inserting additional metadata
Added logic to not remove records if the addBatch was unsuccessful.
2022-07-19 17:41:52 -04:00
.github/workflows Fixed image name in github action for building docker images. 2022-05-11 21:09:38 -04:00
deploy Fix unix permissions 2022-07-19 14:54:31 -05:00
doc Added a map of topics to DBWriter instances to cover scenario where messages of multiple topics will be sent to the same Kafka SinkTask. 2022-07-19 17:41:52 -04:00
docker Fix unix permissions 2022-07-19 14:54:31 -05:00
src Added a map of topics to DBWriter instances to cover scenario where messages of multiple topics will be sent to the same Kafka SinkTask. 2022-07-19 17:41:52 -04:00
tests Added logic to flush records based on buffer size and number of records. 2022-07-15 16:23:19 -05:00
.gitignore Added unit test directory to pom.xml 2022-07-03 19:22:14 -04:00
LICENSE Initial commit 2022-03-21 11:32:45 +03:00
pom.xml Move flush timeout as a configuration variable. Change docker-compose to use environment variable for sink docker version. 2022-07-19 14:07:02 -04:00
README.md Updated list of features. 2022-06-29 15:48:01 -04:00
strimzi.yml Updated documentation with Data types mapping. 2022-04-29 13:32:53 -04:00

Altinity Sink Connector for ClickHouse

The sink connector writes data from Kafka into ClickHouse. The connector is tested with the following converters

Features

  • Inserts, updates and deletes using ReplacingMergeTree/CollapsingMergeTree (see Updates/Deletes)
  • Deduplication of records from the Kafka topic (based on primary key)
  • Exactly-once semantics
  • Bulk inserts to ClickHouse
  • Store Kafka metadata (see Kafka Metadata)
  • Kafka topic to ClickHouse table mapping, for cases where a MySQL table is mapped to a different ClickHouse table name
  • Store raw data in JSON (for auditing purposes)
  • Monitoring (using Grafana/Prometheus): dashboard to monitor lag
  • Kafka offset management in ClickHouse
  • Increased parallelism (customizable thread pool for JDBC connections)
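
Two of the features above, primary-key deduplication and topic-to-table mapping, can be illustrated with a minimal sketch. This is not the connector's actual code: the class `SinkBatchSketch`, the `Row` type, both method names, and the example topic and table names are hypothetical. It assumes that a batch holds rows from a single topic, that the latest row per key should win, and that an unmapped topic falls back to its own name as the table name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not taken from the connector's source.
class SinkBatchSketch {

    // Minimal stand-in for a change record read from one Kafka topic.
    static final class Row {
        final String primaryKey;
        final String payload;

        Row(String primaryKey, String payload) {
            this.primaryKey = primaryKey;
            this.payload = payload;
        }
    }

    // Keeps only the latest row per primary key. The surviving row takes the
    // position of the key's first occurrence in the batch.
    static List<Row> dedupeByPrimaryKey(List<Row> batch) {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : batch) {
            latest.put(row.primaryKey, row); // later rows overwrite earlier ones
        }
        return new ArrayList<>(latest.values());
    }

    // Resolves the ClickHouse table for a topic. An explicit mapping wins;
    // falling back to the topic name itself is an assumption of this sketch.
    static String resolveTable(String topic, Map<String, String> topicToTable) {
        return topicToTable.getOrDefault(topic, topic);
    }

    public static void main(String[] args) {
        List<Row> batch = List.of(
                new Row("1", "{\"name\":\"a\"}"),
                new Row("2", "{\"name\":\"b\"}"),
                new Row("1", "{\"name\":\"c\"}"));
        List<Row> unique = dedupeByPrimaryKey(batch);

        // Example topic and table names are made up for illustration.
        String table = resolveTable("server1.test.employees",
                Map.of("server1.test.employees", "employees_ch"));
        System.out.println(table + ": " + unique.size() + " rows after dedupe");
    }
}
```

Deduplicating inside the batch before the bulk insert keeps the number of rows sent to ClickHouse small. Duplicates that arrive in different batches would still rely on the table engine (for example ReplacingMergeTree) to collapse them.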

Source Databases

  • MySQL (Debezium)
  • PostgreSQL (Debezium) (Testing in progress)

Documentation