[![Tests](https://github.com/Altinity/clickhouse-sink-connector/actions/workflows/sink-connector-lightweight-integration-tests.yml/badge.svg)](https://github.com/Altinity/clickhouse-sink-connector/actions/workflows/sink-connector-lightweight-integration-tests.yml)
<p align="center">
<a href="https://join.slack.com/t/altinitydbworkspace/shared_invite/zt-w6mpotc1-fTz9oYp0VM719DNye9UvrQ">
<img src="https://img.shields.io/static/v1?logo=slack&logoColor=959DA5&label=Slack&labelColor=333a41&message=join%20conversation&color=3AC358" alt="AltinityDB Slack" />
</a>
</p>
# Altinity Replicator for ClickHouse (Lightweight version)
A new tool that replicates data from MySQL, PostgreSQL, MariaDB, and MongoDB without additional dependencies.
It ships as a single, lightweight executable.
##### Supports DDL in MySQL.
### Release
Images are published on GitLab:
`registry.gitlab.com/altinity-public/container-images/clickhouse_debezium_embedded:latest`

[Setup instructions](sink-connector-lightweight/README.md)

![](doc/img/kafka_replication_tool.jpg)
# Altinity Sink Connector for ClickHouse
The sink connector transfers data from Kafka to ClickHouse using the Kafka Connect framework.
The connector is tested with the following converters:
- JsonConverter
- AvroConverter (using [Apicurio Schema Registry](https://www.apicur.io/registry/) or Confluent Schema Registry)
![](doc/img/sink_connector_mysql_architecture.jpg)
# Features
- Inserts, updates, and deletes using ReplacingMergeTree - [Updates/Deletes](doc/mutable_data.md)
- Auto create tables in ClickHouse
- Exactly-once semantics
- Bulk insert to ClickHouse
- Store Kafka metadata - [Kafka Metadata](doc/Kafka_metadata.md)
- Kafka topic to ClickHouse table mapping, for cases where a MySQL table must map to a differently named ClickHouse table
- Store raw data in JSON (for auditing purposes)
- Monitoring (using Grafana/Prometheus), with a dashboard to monitor lag
- Kafka offset management in ClickHouse
- Increased parallelism (customizable thread pool for JDBC connections)
# Source Databases
- MySQL (Debezium)
  **Note:** Enabling GTID is highly encouraged for updates/deletes.
  For non-GTID sources, see how to enable GTID on a replica: https://www.percona.com/blog/useful-gtid-feature-for-migrating-to-mysql-gtid-replication-assign_gtids_to_anonymous_transactions/
- PostgreSQL (Debezium)
| Component     | Version (Tested) |
|---------------|------------------|
| Redpanda      | 22.1.3, 22.3.9   |
| Kafka-connect | 1.9.5.Final      |
| Debezium      | 2.1.0.Alpha1     |
| MySQL         | 8.0              |
| ClickHouse    | 22.9, 22.10      |
| PostgreSQL    | 15               |
### Quick Start (Docker-compose)
Docker image for Sink connector (Updated December 12, 2022)
`altinity/clickhouse-sink-connector:latest`
https://hub.docker.com/r/altinity/clickhouse-sink-connector
### Recommended Memory Limits
**Production Usage**

In the `docker-compose.yml` file, it is recommended to set Xmx to at least 5G (`-Xmx5G`) for both **Debezium** and **Sink** when running in production, or if you encounter an `Out of memory/Heap exception` error.
```
- KAFKA_HEAP_OPTS=-Xms2G -Xmx5G
```
### Kubernetes
Docker image for Sink connector (with Strimzi):
https://hub.docker.com/repository/docker/subkanthi/clickhouse-kafka-sink-connector-strimzi

Docker image for Debezium MySQL connector (with Strimzi):
https://hub.docker.com/repository/docker/subkanthi/debezium-mysql-source-connector

A memory limit of at least 5Gi is recommended when running on Kubernetes with Strimzi.
```
resources:
  limits:
    memory: 6Gi
  requests:
    memory: 6Gi
```
#### MySQL:
```bash
cd deploy/docker
./start-docker-compose.sh
```
#### PostgreSQL:
```
export SINK_VERSION=latest
cd deploy/docker
docker-compose -f docker-compose.yaml -f docker-compose-postgresql.override.yaml up
```
For detailed setup instructions, see [Setup](doc/setup.md).
## Development:
Requirements
- Java JDK 11 (https://openjdk.java.net/projects/jdk/11/)
- Maven (mvn) (https://maven.apache.org/download.cgi)
- Docker and Docker-compose
```
mvn install -DskipTests=true
```
## Data Types
#### Note: Using float data types is highly discouraged because of how ClickHouse handles their precision. (Decimal is a better choice.)
| MySQL              | Kafka<br>Connect                                     | ClickHouse                      |
|--------------------|------------------------------------------------------|---------------------------------|
| Bigint | INT64\_SCHEMA | Int64 |
| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
| Blob | | String + hex |
| Char | String | String / LowCardinality(String) |
| Date               | Schema: INT64<br>Name:<br>debezium.Date              | Date(6)                         |
| DateTime(6)        | Schema: INT64<br>Name: debezium.Timestamp            | DateTime64(6)                   |
| Decimal(30,12)     | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12)                  |
| Double | | Float64 |
| Int | INT32 | Int32 |
| Int Unsigned | INT64 | UInt32 |
| Longblob | | String + hex |
| Mediumblob | | String + hex |
| Mediumint | INT32 | Int32 |
| Mediumint Unsigned | INT32 | UInt32 |
| Smallint | INT16 | Int16 |
| Smallint Unsigned | INT32 | UInt16 |
| Text | String | String |
| Time | | String |
| Time(6) | | String |
| Timestamp | | DateTime64 |
| Tinyint | INT16 | Int8 |
| Tinyint Unsigned | INT16 | UInt8 |
| varbinary(\*) | | String + hex |
| varchar(\*) | | String |
| JSON | | String |
| BYTES | BYTES, io.debezium.bits | String |
| YEAR               | INT32                                                | Int32                           |
| GEOMETRY | Binary of WKB | String |
### Sink Connector Configuration
| Property | Default | Description |
|----------------------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| tasks.max                        | No        | Number of SinkConnector tasks (essentially threads); ideally the same as the number of Kafka partitions.                                                              |
| topics.regex | No | Regex of matching topics. Example: "SERVER5432.test.(.*)" matches SERVER5432.test.employees and SERVER5432.test.products |
| topics | No | The list of topics. topics or topics.regex has to be provided. |
| clickhouse.server.url | | ClickHouse Server URL |
| clickhouse.server.user | | ClickHouse Server username |
| clickhouse.server.pass | | ClickHouse Server password |
| clickhouse.server.database | | ClickHouse Database name |
| clickhouse.server.port | 8123 | ClickHouse Server port |
| clickhouse.topic2table.map       | No        | Map of Kafka topics to table names, in the form `<topic_name1>:<table_name1>,<topic_name2>:<table_name2>`. This setting overrides the default mapping of topics to table names. |
| store.kafka.metadata | false | If set to true, kafka metadata columns will be added to Clickhouse |
| store.raw.data                   | false     | If set to true, the entire row is converted to JSON and stored in the column defined by the `store.raw.data.column` field                                             |
| store.raw.data.column | No | Clickhouse table column to store the raw data in JSON form(String Clickhouse DataType) |
| metrics.enable | true | Enable Prometheus scraping |
| metrics.port | 8084 | Metrics port |
| buffer.flush.time.ms | 30 | Buffer(Batch of records) flush time in milliseconds |
| thread.pool.size                 | 10        | Number of threads used to connect to ClickHouse                                                                                                                       |
| auto.create.tables | false | Sink connector will create tables in ClickHouse (If it does not exist) |
| snowflake.id | true | Uses SnowFlake ID(Timestamp + GTID) as the version column for ReplacingMergeTree |
| replacingmergetree.delete.column | "sign"    | Column used as the sign column for ReplacingMergeTree.                                                                                                                |
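
The `topics.regex` and `clickhouse.topic2table.map` formats described in the table can be tried out locally before deploying. The following is a minimal sketch, not the connector's own logic: it assumes that `topics.regex` must match the whole topic name, and the helper names (`match_topics`, `lookup_table`), the extra topic names, and the table name `employees_ch` are hypothetical values chosen for illustration.

```shell
#!/usr/bin/env bash
# Illustrative values; the regex is the example from the topics.regex row.
TOPICS_REGEX='SERVER5432.test.(.*)'
TOPIC2TABLE_MAP='SERVER5432.test.employees:employees_ch'

# Hypothetical helper: keep only stdin lines that match the regex in full
# (-x), approximating the whole-name match assumed for topics.regex.
match_topics() {
  grep -E -x -- "$1"
}

# Hypothetical helper: print the table mapped to a topic and return 0,
# or return 1 when the map has no entry for that topic.
lookup_table() {
  local topic="$1" map="$2" entry
  local IFS=','
  for entry in $map; do
    # Each entry has the form <topic_name>:<table_name>.
    if [ "${entry%%:*}" = "$topic" ]; then
      printf '%s\n' "${entry#*:}"
      return 0
    fi
  done
  return 1
}

# Filter candidate topics by the regex, then resolve each one via the map.
printf '%s\n' \
  'SERVER5432.test.employees' \
  'SERVER5432.test.products' \
  'SERVER5432.public.orders' |
  match_topics "$TOPICS_REGEX" |
  while IFS= read -r topic; do
    if table="$(lookup_table "$topic" "$TOPIC2TABLE_MAP")"; then
      echo "$topic -> $table (from clickhouse.topic2table.map)"
    else
      echo "$topic -> default topic-to-table mapping"
    fi
  done
```

Topics without an entry in the map are reported as using the default mapping, which this sketch does not attempt to reproduce.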
## ClickHouse Loader (Load Data from MySQL to ClickHouse for Initial Load)
[ClickHouse Loader](python/README.md) is a program that loads data dumped from MySQL into a ClickHouse database compatible with the sink connector (ReplacingMergeTree with virtual columns `_version` and `_sign`).
### Grafana Dashboard
![](doc/img/Grafana_dashboard.png)
![](doc/img/Grafana_dashboard_2.png)
## Documentation
- [Architecture](doc/architecture.md)
- [Local Setup - Docker Compose](doc/setup.md)
- [Debezium Setup](doc/debezium_setup.md)
- [Kubernetes Setup](doc/k8s_pipeline_setup.md)
- [Sink Configuration](doc/sink_configuration.md)
- [Testing](doc/TESTING.md)
- [Performance Benchmarking](doc/Performance.md)
- [Confluent Schema Registry (REST API)](doc/schema_registry.md)
## Blog articles
- [ClickHouse as an analytic extension for MySQL](https://altinity.com/blog/using-clickhouse-as-an-analytic-extension-for-mysql?utm_campaign=Brand&utm_content=224583767&utm_medium=social&utm_source=linkedin&hss_channel=lcp-10955938)
- [Altinity Sink connector for ClickHouse](https://altinity.com/blog/fast-mysql-to-clickhouse-replication-announcing-the-altinity-sink-connector-for-clickhouse)