Background
With the rapid growth of Fengchao's business, the data volume of its core systems has already passed the billion-row level, and the annual increment keeps rising quickly. As data volume grows, both the system architecture and the data architecture become sharply more complex. A traditional single-node database can no longer meet Fengchao's needs: once a single table exceeds 100 million rows, Oracle can barely cope, while MySQL already struggles at the 10-million-row level and has to be split into multiple tables and databases. For this reason, a high-performance distributed database has become an urgent need.
Reflection
When business volume grows at an Internet company, scaling out horizontally is the most common, simplest and fastest remedy. For example, load-balancing equipment splits the traffic so that a massive stream becomes a small share that each machine can bear, and the cluster as a whole supports the business. So when the database can no longer bear the load, it gets split as well.
But stateful data is different from stateless traffic. Once data is split, it is partitioned, and to keep the whole system highly available, data consistency is sacrificed. A large number of reconciliation tools then run between systems to ensure eventual consistency. On the business side, colleagues are often told by the engineers who sharded the database that this requirement cannot be done and that requirement cannot be done either. Business colleagues with SQL experience may wonder why, since it looks like a single SQL statement; in fact, it is an aftereffect of splitting databases and tables.
Therefore, we need a database to help us solve the above problems. Its features should be:
- Strong data consistency: full ACID support.
- Elastic capacity: no matter how much data we insert, we do not need to worry about when to expand capacity or whether a bottleneck will appear.
- High data availability: when a few machines, disks, or other components of the database fail, the business does not notice, and even if a disaster hits a data center in one city, we can keep providing service without losing data.
- Rich SQL support: SQL written for a single database runs on this database with little or no modification.
- High performance: it sustains high QPS while keeping latency relatively low.
Model selection
Based on the expectations above, we surveyed the NewSQL distributed databases currently on the market. The list is as follows:
After weighing open-source license, maturity, controllability, performance, service support and other factors, we chose TiDB. Its main advantages are as follows:
- Highly compatible with MySQL: in most cases, you can migrate from MySQL to TiDB without modifying code. MySQL clusters that have already been split into multiple databases and tables can also be migrated in real time with the TiDB tools.
- Horizontal elastic expansion: simply adding new nodes scales TiDB horizontally, expanding throughput or storage on demand to cope with high-concurrency and massive-data scenarios.
- Distributed transactions: TiDB fully supports standard ACID transactions.
- Financial-grade high availability: compared with the traditional master-slave (M-S) replication scheme, the Raft-based majority election protocol provides a financial-grade guarantee of strong data consistency, and as long as a majority of replicas is not lost, failover happens automatically without manual intervention.
For these reasons, we chose TiDB as the distributed database for Fengchao's core systems, replacing Oracle and MySQL.
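The majority rule behind the Raft-based availability claim can be illustrated with a little arithmetic. The sketch below is not TiDB code; the function names are hypothetical and it only shows how many replicas a majority-based group can lose while still being able to elect a leader and commit writes.

```python
def raft_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority still survives."""
    return replicas - raft_quorum(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        # Columns: replica count, quorum size, tolerated failures
        print(n, raft_quorum(n), tolerated_failures(n))
```

With three replicas one failure is tolerated, and with five replicas two. An even replica count adds no extra tolerance over the odd count just below it, which is one reason odd replica counts are the usual choice.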
Assessment
1. Performance test
We benchmarked TiDB with sysbench, using eight tables of 10 million rows each as the base data. The insert, select, OLTP and delete scripts were run separately, giving the data below. Query QPS reaches an impressive 140,000 per second, and insertion is stable at 140,000 per second.
Core server configuration
Test results
Passed.
2. Function test
Passed.
Access
Because this is a core system, we adopted several measures to validate the migration and to make sure the business would not be affected.
1. Project selection
When looking for the first project to migrate, we selected one with the following four characteristics
Finally, we chose the push service. The push service is the core service Fengchao uses to send pick-up notifications; its volume is very large but its logic is simple, and alternative external push channels exist, so even if something goes wrong, users are not affected.
2. Code modification
Because TiDB is highly compatible with MySQL syntax, the code changes needed for this project were very small. The SQL itself needed essentially no changes; the work was mainly in peripheral code, including:
- Asynchronous interfaces: changed to write data to the database asynchronously.
- Synchronous interfaces: changed to trip a circuit breaker when errors occur.
- Stop the embedded data migration code.
These three changes ensure that the system as a whole does not depend strongly on the database. Under high concurrency, writing to the database asynchronously protects it from being overwhelmed, and when the database has problems, the core business can still proceed normally.
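The two protective patterns described above, asynchronous writes and circuit breaking, can be sketched roughly as follows. This is a minimal illustration and not Fengchao's actual code: the class names, thresholds and the `save` callback are all assumptions made for the example.

```python
import queue
import threading
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        # While open, skip the database until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


class AsyncWriter:
    """Buffers records in a bounded queue and persists them on a worker thread."""

    def __init__(self, save, maxsize=1000):
        self._save = save  # hypothetical callback that writes one record to the database
        self._queue = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record) -> bool:
        # Never block the caller: report False when the buffer is full.
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._save(record)
            except Exception:
                pass  # a real service would log or retry here
            finally:
                self._queue.task_done()

    def flush(self):
        # Wait until every queued record has been handed to `save`.
        self._queue.join()


if __name__ == "__main__":
    stored = []
    writer = AsyncWriter(stored.append)
    writer.submit({"msg": "parcel ready"})
    writer.flush()
    print(stored)
```

The bounded queue caps how much write pressure can reach the database at once, while the breaker lets synchronous callers fall back quickly (for example to an alternative push channel) instead of waiting on a failing database.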
Effect
1. Query ability
After the move to TiDB, the dozen or so sub-tables that had been split along the time dimension became one large table. The most obvious change is that, with a large amount of data, query capability improved significantly.
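To see why merging time-split sub-tables simplifies queries, consider a range query that spans several of them. The sketch below is illustrative only: the monthly granularity, the table prefix `push_message` and the column `sent_at` are assumptions, since the original naming scheme is not described.

```python
from datetime import date


def monthly_tables(start: date, end: date, prefix: str = "push_message") -> list[str]:
    """Names of the monthly sub-tables a query over [start, end] must touch."""
    if start > end:
        raise ValueError("start must not be after end")
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        tables.append(f"{prefix}_{year:04d}{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tables


def sharded_query(start: date, end: date) -> str:
    """One SELECT per sub-table, glued together with UNION ALL."""
    selects = [
        f"SELECT * FROM {table} WHERE sent_at BETWEEN %s AND %s"
        for table in monthly_tables(start, end)
    ]
    return "\nUNION ALL\n".join(selects)


def single_table_query() -> str:
    """The same range query once everything lives in one large table."""
    return "SELECT * FROM push_message WHERE sent_at BETWEEN %s AND %s"


if __name__ == "__main__":
    print(sharded_query(date(2018, 11, 15), date(2019, 1, 3)))
    print(single_table_query())
```

With sub-tables, the application has to work out which tables a query touches and merge the results itself; sorting, paging and aggregation across tables get harder still. With one large table, that routing logic disappears from the application.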
2. Monitoring capability
TiDB has a very complete monitoring platform, which shows capacity and node status at a glance.
It also shows the load on each node and the latency of SQL execution.
Of course, it also shows which machine each node runs on, along with CPU, memory and other load figures.
Network status can be monitored clearly as well.
Together, these let the team analyze both problematic SQL and the database itself.
Summary
Overall, the migration to TiDB went very smoothly. Thanks to the extensive preparation beforehand, switching traffic over to TiDB took only 10 minutes on the day of the cutover. I also want to thank TiDB for its MySQL syntax compatibility and PingCAP for the various useful tools it provides. So far the system has been running stably for more than a month and meets Fengchao's business requirements well.
Since the TiDB migration was completed, the Fengchao push service persists most of its messages and makes them queryable. To date, the peak daily volume of persisted push messages has reached 50 million. If the push service still relied on the MySQL solution, it would need various database- and table-splitting schemes, and many fine-grained business features would be impossible or very difficult to build.
The TiDB migration is just a small step in Fengchao's exploration of distributed data technology. In the future, Fengchao will bring more distributed technology into more business systems to build ever better products and services.