How to make reasonable use of hardware resources, support data volume and usage scenarios according your data volume, ClickHouse is a OLAP database have reasonable architecture, and provide users with better experience of big data platform. By the help of ClickHouse you neither care for expansion of SSD storage space nor CPU and memory will become your bottleneck.
- Unfixed query conditions and unfixed summary dimensions.
- The amount of data is increasing, and the amount of data to be updated every day is also increasing.
- If you need accumulated reports for various business lines
- If data volume will become larger and larger and maintenance costs are getting higher and higher.
- The number of business scenarios continues to increase, with more coverage.
- High data compression ratio and low storage cost.
- Need to ensure high availability and output within seconds.
- The biggest feature of ClickHouse is fast, high data compression ratio, low storage cost, and by the help of clickhouse your hardware cost is at least 60% lower.
- It supports common SQL syntax, the writing speed is very fast, and it is suitable for a large number of data updates.
- Relying on sparse index, columnar storage, full utilization of CPU/memory creates excellent computing power, and does not need to consider the left side principle.
- The full amount of data synchronization is very simple
- The developer can see most of the data he needs by default, and the developer will no longer cut the query conditions
- More than 100,000 queries per day
Its syntax is similar to MySQL, but it has a feature that its join cannot be too complicated. When table A is joined to table B, table C cannot be directly joined. You need to join table A to table B as a temporary table with aliases. You will join the C table later, so its grammar will be more unique mainly in the join. If your query is very complex, your join will look very long, so the readability of the query is not as easy to understand as SQL. But its writing speed is very fast, and it is especially suitable for updating millions and billions of data every day like offline data. According to official information, the data is imported at a speed of 50–200 megabytes per second.
When you fetching data, you often only fetch certain fields. Column storage is more friendly to IO, reducing the number of IOs is also an aid to query speed. Furthermore, it uses the CPU to a large extent. We all know that MySQL obtains data in a single thread, but how many CPUs there are on the ClickHouse server, it will use half of the server’s CPU to pull, like the 40 cores or 32 you usually use. The core physical machine basically uses half of the core to pull data. Of course, this can modify how many CPUs each query uses in the configuration file.
Because ClickHouse is very friendly for MySQL data synchronization, it is similar to importing the data of a table in MySQL to the temp table.
When importing tens of millions of data into ClickHouse a process ID will be generated every time an insert is executed. If the execution is not completed, directly Rename will cause data loss, so there must be a job polling to see if the execution is finished here, and only after the process id of the insert is executed, the next series of renames are done.
With Help of ClickHouse cluster architecture
1. Data reading and load balanced through the application.
If computing power divided on two different location, ClickHouse supports mutual backup and load balance at the same time.
2. At least two machines in the virtual cluster and in different locations
If Each virtual cluster have at least one machine in the two location. In case one computer location has a network problem, the nearest located computer can support this query.
3. The data is independent, write more, and do not interfere with each other.
The ClickHouse reads the data on the two machines through the connection strings of different clusters. Simply put, you can divide it by our own business lines. For example, your data can be placed on different locations on different machines.
4. Flexible creation of different virtual clusters for appropriate occasions.
One advantage of this method is that you can make full use of the machine, especially when the data that you needs to pay attention to in daily work is different from the holidays. At this time, it is necessary to build some new clusters, which is more flexible.
5. Adjust the server at any time, add/reduce servers.
You can make full use of server resources, instead of applying for a machine because of a new scene, and then going online, the whole process will be relatively long.