# Altinity Sink Connector for ClickHouse
The sink connector transfers data from Kafka to ClickHouse using the Kafka Connect framework.
The connector is tested with the following converters:
- JsonConverter
- AvroConverter (Using [Apicurio Schema Registry](https://www.apicur.io/registry/) or Confluent Schema Registry)
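
As a rough illustration of how a converter is selected, the sketch below builds the standard Kafka Connect converter properties for the JSON converter and for the Avro converter with Confluent Schema Registry. The registry URL, the `schemas.enable` values, and the helper name `with_converter` are assumptions made for this example; the connector's own settings are described in the sink configuration document and are not shown here.

```python
import json

# Hypothetical registry address, used only for illustration.
REGISTRY_URL = "http://schema-registry:8081"

# Standard Kafka Connect properties for the built-in JSON converter.
# Enabling embedded schemas is an assumption, not a documented requirement.
JSON_CONVERTER = {
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true",
}

# Standard properties for the Confluent Avro converter.
AVRO_CONFLUENT = {
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": REGISTRY_URL,
    "value.converter.schema.registry.url": REGISTRY_URL,
}


def with_converter(base: dict, converter: dict) -> dict:
    """Return a copy of a connector config with converter properties merged in."""
    merged = dict(base)
    merged.update(converter)
    return merged


# Other connector settings are intentionally omitted from this sketch.
config = with_converter({"tasks.max": "1"}, AVRO_CONFLUENT)
print(json.dumps(config, indent=2))
```
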
![](doc/img/sink_connector_mysql_architecture.jpg)
# Features
- Inserts, updates, and deletes using ReplacingMergeTree/CollapsingMergeTree ([Updates/Deletes](doc/mutable_data.md))
- Auto create tables in ClickHouse
- Exactly once semantics
- Bulk inserts to ClickHouse
- Store Kafka metadata ([Kafka Metadata](doc/Kafka_metadata.md))
- Kafka topic to ClickHouse table mapping, for cases where a MySQL table maps to a differently named ClickHouse table
- Store raw data as JSON (for auditing purposes)
- Monitoring (using Grafana/Prometheus): dashboard to monitor lag
- Kafka offset management in ClickHouse
- Increased parallelism (customizable thread pool for JDBC connections)
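
To make the update/delete feature more concrete, here is a minimal conceptual model of how an append-only change log with a version column and a sign column resolves to the current table state. It is a sketch, not the connector's code: the `Row` type, the `latest_state` helper, and the convention that a sign of `-1` marks a delete are assumptions for illustration; the actual behavior is described in the Updates/Deletes document.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    key: int      # primary key of the source row
    value: str    # payload column
    version: int  # stands in for a version column: larger means newer
    sign: int     # stands in for a sign column: 1 = live row, -1 = deleted (assumed)


def latest_state(rows: List[Row]) -> Dict[int, str]:
    """Collapse an append-only change log into the current state per key."""
    newest: Dict[int, Row] = {}
    for row in rows:
        current = newest.get(row.key)
        # Keep only the highest version seen for each key.
        if current is None or row.version >= current.version:
            newest[row.key] = row
    # Keys whose newest row is a delete marker are filtered out.
    return {k: r.value for k, r in newest.items() if r.sign == 1}


change_log = [
    Row(1, "alice", 1, 1),   # insert
    Row(2, "bob", 2, 1),     # insert
    Row(1, "alicia", 3, 1),  # update of key 1
    Row(2, "bob", 4, -1),    # delete of key 2
]
print(latest_state(change_log))  # {1: 'alicia'}
```

Every change is written as a new row, so inserts stay cheap and bulk-friendly; resolving to the latest version is deferred to merge or query time.
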
# Source Databases
- MySQL (Debezium)
- PostgreSQL (Debezium) (Testing in progress)
| Component     | Version (Tested)  |
|---------------|-------------------|
| Redpanda | 22.1.3 |
| Kafka-connect | 1.9.5.Final |
| Debezium | 1.9.5.Final |
| MySQL | 8.0 |
| ClickHouse | 22.9 |
### Quick Start (Docker-compose)
Docker image for the sink connector: `altinity/clickhouse-sink-connector:latest`
(https://hub.docker.com/r/altinity/clickhouse-sink-connector)

Recommended memory limits:
```
JAVA_OPTS="-Xms1G -Xmx5G"
```
### Kubernetes
Docker image for the sink connector (with Strimzi):
https://hub.docker.com/repository/docker/subkanthi/clickhouse-kafka-sink-connector-strimzi

Docker image for the Debezium MySQL connector (with Strimzi):
https://hub.docker.com/repository/docker/subkanthi/debezium-mysql-source-connector

Set memory limits of at least 5Gi when running on Kubernetes with Strimzi, for example:
```
resources:
  limits:
    memory: 6Gi
  requests:
    memory: 6Gi
```
#### MySQL:
```bash
cd deploy/docker
./start-docker-compose.sh
```
For detailed setup instructions, see [Setup](doc/setup.md).
## Development:
Requirements:
- Java JDK 11 (https://openjdk.java.net/projects/jdk/11/)
- Maven (mvn) (https://maven.apache.org/download.cgi)
- Docker and Docker-compose
```
mvn install -DskipTests=true
```
## Data Types
| MySQL | Kafka<br>Connect | ClickHouse |
|--------------------|------------------------------------------------------|---------------------------------|
| Bigint | INT64\_SCHEMA | Int64 |
| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
| Blob | | String + hex |
| Char | String | String / LowCardinality(String) |
| Date | Schema: INT64<br>Name:<br>debezium.Date | Date(6) |
| DateTime(6) | Schema: INT64<br>Name: debezium.Timestamp | DateTime64(6) |
| Decimal(30,12) | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12) |
| Double | | Float64 |
| Int | INT32 | Int32 |
| Int Unsigned | INT64 | UInt32 |
| Longblob | | String + hex |
| Mediumblob | | String + hex |
| Mediumint | INT32 | Int32 |
| Mediumint Unsigned | INT32 | UInt32 |
| Smallint | INT16 | Int16 |
| Smallint Unsigned | INT32 | UInt16 |
| Text | String | String |
| Time | | String |
| Time(6) | | String |
| Timestamp | | DateTime64 |
| Tinyint | INT16 | Int8 |
| Tinyint Unsigned | INT16 | UInt8 |
| varbinary(\*) | | String + hex |
| varchar(\*) | | String |
| JSON | | String |
| BYTES | BYTES, io.debezium.bits | String |
| YEAR               | INT32                                                | Int32                           |
| GEOMETRY | Binary of WKB | String |
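
Two rows of the table are easier to follow with a worked example. Kafka Connect's `Decimal` logical type carries the unscaled integer as big-endian two's-complement bytes, with the scale stored in the schema; and an unsigned 64-bit value transported in a signed INT64 field can appear negative. The sketch below decodes both. The function names are hypothetical, and treating the INT64 value as a wrapped unsigned number is an assumption about how it arrives, not a description of the connector's internals.

```python
from decimal import Decimal


def decode_connect_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode Kafka Connect Decimal bytes (big-endian two's complement) with a given scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a tuple is exact and does not depend on the decimal context precision.
    return Decimal((sign, digits, -scale))


def as_uint64(signed_value: int) -> int:
    """Reinterpret a signed 64-bit integer as the unsigned value with the same bits."""
    return signed_value % (1 << 64)


print(decode_connect_decimal(b"\x12\xd6\x87", 3))  # 1234.567
print(as_uint64(-1))                               # 18446744073709551615
```
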
## ClickHouse Loader (Load Data from MySQL to ClickHouse for Initial Load)
[ClickHouse Loader](python/README.md) is a program that loads data dumped from MySQL into a ClickHouse database compatible with the sink connector (ReplacingMergeTree with virtual columns `_version` and `_sign`).
### Grafana Dashboard
![](doc/img/Grafana_dashboard.png)
![](doc/img/Grafana_dashboard_2.png)
## Documentation
- [Architecture](doc/architecture.md)
- [Local Setup - Docker Compose](doc/setup.md)
- [Debezium Setup](doc/debezium_setup.md)
- [Kubernetes Setup](doc/k8s_pipeline_setup.md)
- [Sink Configuration](doc/sink_configuration.md)
- [Testing](doc/TESTING.md)
- [Performance Benchmarking](doc/Performance.md)
- [Confluent Schema Registry(REST API)](doc/schema_registry.md)
## Blog articles
- [ClickHouse as an analytic extension for MySQL](https://altinity.com/blog/using-clickhouse-as-an-analytic-extension-for-mysql?utm_campaign=Brand&utm_content=224583767&utm_medium=social&utm_source=linkedin&hss_channel=lcp-10955938)
- [Altinity Sink connector for ClickHouse](https://altinity.com/blog/fast-mysql-to-clickhouse-replication-announcing-the-altinity-sink-connector-for-clickhouse)