Previously, I briefly introduced the architectures of several famous large-scale websites, including MySpace's five milestones, Flickr, YouTube, plentyoffish and Wikipedia. These are all very typical cases, and there is a great deal to learn from them about website architecture. After reading them, you may well find that your original ideas were rather narrow.
Today, let's talk about how a website's system architecture is usually built up step by step. Although we all hope a website can have a good architecture from the very beginning, Marx tells us that things move forward through development: a website's architecture, too, improves as the business expands and user needs grow. What follows is the typical process by which a website architecture gradually evolves. After reading it, please think about which stage you are at right now.
The first step of architecture evolution: physical separation of webserver and database
In the beginning, prompted by some idea, a website is put up on the Internet. At this stage it may even run on rented hosting, but since this article is only concerned with how the architecture evolves, let's assume there is already one hosted machine with a certain amount of bandwidth. Because the website has something attractive about it, it draws a certain number of visitors, and gradually you find that the load on the system keeps growing and the response time keeps getting slower. By this point it is fairly obvious that the database and the application are affecting each other: when the application has problems the database easily runs into problems too, and when the database has problems the application suffers as well. So you enter the first stage of evolution: physically separate the application and the database onto two machines. Technically this requires nothing new, but you find that it really works: the system returns to its former response speed, supports more traffic, and the database and the application no longer drag each other down.
See the figure of the system after this step:
The second step of architecture evolution: add a page cache
The good times don't last long. As more and more people visit, you find response times starting to slow down again. Digging into it, you discover that too many operations are hitting the database, leading to fierce competition for database connections and therefore slow responses. But you cannot simply open more database connections, or the pressure on the database machine would become too high. So you turn to a caching mechanism to reduce contention for database connections and the read load on the database. At this point you might choose something like Squid to cache the relatively static pages of the system (for example, pages that only change once every day or two); you could also go with static page generation instead. Either way, the program does not need to be modified, the pressure on the web server drops, and contention for database connection resources is reduced. So you start using Squid to cache the relatively static pages.
See the figure of the system after this step:
This step involves these knowledge systems:
Front-end page caching technology, such as Squid: to use it well, you need to understand how Squid is implemented and how cache invalidation (expiry) works.
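To make the idea concrete, here is a minimal sketch of the same reverse-proxy caching idea (this is only an illustration, not Squid itself): pages are cached by URL with a simple time-based expiry, and the `fetchFromWebServer` method is a hypothetical stand-in for the call to the real backend.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a reverse-proxy style page cache with time-based expiry.
// This only illustrates the idea behind Squid-like page caching; it is not Squid.
public class PageCache {
    private static class Entry {
        final String html;
        final long expiresAt;
        Entry(String html, long expiresAt) { this.html = html; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public PageCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public String get(String url) {
        Entry e = cache.get(url);
        if (e != null && e.expiresAt > System.currentTimeMillis()) {
            return e.html;                       // cache hit: web server and database are not touched
        }
        String html = fetchFromWebServer(url);   // cache miss or expired: go to the origin
        cache.put(url, new Entry(html, System.currentTimeMillis() + ttlMillis));
        return html;
    }

    // Hypothetical placeholder for the call to the real web server.
    private String fetchFromWebServer(String url) {
        return "<html>rendered page for " + url + "</html>";
    }
}
```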
The third step of architecture evolution: add a page fragment cache
After adding Squid as a cache, the whole system really does get faster and the pressure on the web server starts to drop. However, as traffic keeps growing, the system starts to slow down again. Having tasted the benefits of caching with Squid, you begin to wonder whether the relatively static parts of the dynamic pages could be cached as well. So you consider a page fragment caching strategy such as ESI, and start using ESI to cache the relatively static fragments of the dynamic pages.
See the figure of the system after this step:
This step involves these knowledge systems:
Page fragment caching technology, such as ESI: to use it well, you need to understand how ESI is implemented.
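As a rough illustration of the fragment-caching idea (a toy stand-in, not the ESI specification), the sketch below assembles a dynamic page from fragments, serving the relatively static fragments from a cache and rendering only the truly dynamic ones on every request; names like `renderFragment` and the fragment names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of page-fragment caching in the spirit of ESI.
// Static fragments (header, hot-item list, ...) are cached; dynamic ones are rendered per request.
public class FragmentAssembler {
    private final Map<String, String> fragmentCache = new HashMap<>();

    public String renderPage(String userName) {
        StringBuilder page = new StringBuilder();
        page.append(cachedFragment("header"));                // relatively static: cached
        page.append(cachedFragment("hot_items"));             // relatively static: cached
        page.append(renderFragment("greeting:" + userName));  // truly dynamic: rendered every time
        return page.toString();
    }

    private String cachedFragment(String name) {
        // computeIfAbsent gives simple "render once, reuse afterwards" behaviour;
        // a real fragment cache would also need expiry/invalidation.
        return fragmentCache.computeIfAbsent(name, this::renderFragment);
    }

    // Hypothetical placeholder for actually rendering a fragment (template + database queries).
    private String renderFragment(String name) {
        return "<div>" + name + "</div>";
    }
}
```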
The fourth step of architecture evolution: data caching
After ESI and similar techniques have improved the system's caching once more, the pressure on the system does drop further, but again, as traffic grows, the system starts to slow down. Investigating, you may find that some places in the system repeatedly fetch the same data, for example user information. So you start considering whether this data can be cached as well, and you cache it in local memory. The change fully meets expectations: the system's response speed is restored, and the pressure on the database drops considerably once again.
See the figure of the system after this step:
This step involves these knowledge systems:
Caching technology, including map data structures, cache (eviction) algorithms, and the implementation of whatever caching framework you choose.
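A minimal sketch of the local in-memory data cache described above, using the cache-aside pattern with a small LRU map; `loadUserFromDb` is a hypothetical stand-in for the real database call, and the capacity is an arbitrary example value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Cache-aside sketch: look in a local LRU map first, fall back to the database on a miss.
public class UserCache {
    private static final int MAX_ENTRIES = 10_000;   // example capacity

    // LinkedHashMap in access order + removeEldestEntry gives a simple LRU cache.
    private final Map<Long, String> cache =
        new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    public synchronized String getUser(long userId) {
        String user = cache.get(userId);
        if (user == null) {                  // miss: hit the database once...
            user = loadUserFromDb(userId);
            cache.put(userId, user);         // ...then keep the result in local memory
        }
        return user;
    }

    // Hypothetical placeholder for the real database query.
    private String loadUserFromDb(long userId) {
        return "user-" + userId;
    }
}
```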
The fifth step of architecture evolution: add a web server
This doesn't last long either. As traffic grows yet again, you find that the web server machine comes under heavy pressure at peak times. Now you start thinking about adding another web server, which also helps availability by avoiding the whole site becoming unusable when the single web server goes down. Having decided to add a second web server, you run into some typical problems:
1. How are requests distributed across the two machines? The usual answer is a load-balancing scheme based on Apache or LVS (a sketch of request distribution appears at the end of this step).
2. How is state information such as user sessions kept in sync? The mechanisms considered here include writing sessions to the database or to shared storage, putting them in cookies, or replicating session information between servers.
3. How is the cached data, such as the previously cached user data, kept in sync? The mechanisms usually considered are cache synchronization or a distributed cache.
4. How do features like file upload keep working? The usual answer is a shared file system or shared storage.
After solving these problems, the web servers are finally increased to two, and the system returns to its former speed.
See the figure of the system after this step:
This step involves these knowledge systems:
Load balancing technology (including but not limited to hardware load balancing, software load balancing, load-balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology), master/standby technology (including but not limited to ARP spoofing and Linux Heartbeat), state-information and cache synchronization technology (including but not limited to cookies, UDP, broadcasting of state information, and the implementation details of the chosen cache synchronization technology), shared file technology (including but not limited to NFS) and storage technology (including but not limited to storage devices).
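To illustrate the request-distribution question (point 1 above), here is a minimal round-robin sketch over two web servers. In practice this job is done by Apache or LVS (or hardware) rather than in application code, and the host names used here are invented.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin sketch of spreading requests over several web servers.
// Real deployments would use Apache/LVS (or hardware) for this; the hosts below are invented.
public class RoundRobinBalancer {
    private final List<String> webServers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> webServers) {
        this.webServers = webServers;
    }

    public String pickServer() {
        // floorMod keeps the index valid even if the counter wraps around.
        int index = Math.floorMod(next.getAndIncrement(), webServers.size());
        return webServers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
            new RoundRobinBalancer(List.of("web1.example.internal", "web2.example.internal"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + lb.pickServer());
        }
        // Note: once requests can land on either server, session state has to live in a
        // shared place (database, cookie, distributed cache) rather than in local memory.
    }
}
```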
The sixth step of architecture evolution: split the database
After enjoying the happiness of rapid traffic growth for a while, you find the system starting to slow down yet again. What is it this time? Investigating, you find that contention for the database connections used by write and update operations is extremely fierce, and this is what is slowing the system down. What to do? The options at this point are a database cluster or splitting the database by business. Clustering is not well supported by some databases, so splitting the database tends to be the common strategy. Splitting the database also means modifying the existing program, but after making the changes and completing the split, the goal is achieved and the system is even faster than before.
See the figure of the system after this step:
This step involves these knowledge systems:
This step mainly requires a reasonable split along business lines in order to divide the data into separate databases; there are no particular new demands on specific technical details.
At the same time, as the data volume grows and the database is split, higher demands are placed on database design, tuning and maintenance, so considerable skill in these areas is still required.
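A minimal sketch, under the assumption that the split is done along business lines (say user, order and forum data each in their own database), of how an application might route to the right database. The module names and JDBC URLs are invented purely for illustration.

```java
import java.util.Map;

// Sketch of routing data access to per-business databases after the split.
// Module names and JDBC URLs are invented for illustration only.
public class DatabaseRouter {
    private final Map<String, String> jdbcUrlByModule = Map.of(
        "user",  "jdbc:mysql://db-user.internal:3306/user_db",
        "order", "jdbc:mysql://db-order.internal:3306/order_db",
        "forum", "jdbc:mysql://db-forum.internal:3306/forum_db"
    );

    /** Returns the JDBC URL of the database that owns the given business module. */
    public String urlFor(String module) {
        String url = jdbcUrlByModule.get(module);
        if (url == null) {
            throw new IllegalArgumentException("No database configured for module: " + module);
        }
        return url;
    }

    public static void main(String[] args) {
        DatabaseRouter router = new DatabaseRouter();
        System.out.println("order queries go to: " + router.urlFor("order"));
    }
}
```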
The seventh step of architecture evolution: split tables, DAL and distributed cache
As the system keeps running, the volume of data starts to grow substantially, and you find that queries are still slow even after splitting the database. So, following the same idea, you start splitting tables as well. This inevitably requires some changes to the program, and you may find that the application itself now has to know the rules for routing to the right database and table, which is still somewhat complicated. So you wonder whether a general framework could be added to handle data access across the split databases and tables; this corresponds to the DAL in eBay's architecture. This evolution takes a relatively long time, and it is also possible that work on the general framework only starts after the table splitting is finished. In the same stage you may discover that the earlier cache synchronization scheme has problems: the volume of data is now too large for the cache to live locally on each machine, so you need to move to a distributed cache. After a lot of investigation and pain, a large portion of the cached data is finally migrated to the distributed cache.
See the figure of the system after this step:
This step involves these knowledge systems:
Table splitting is, again, mostly a matter of dividing by business; on the technical side it involves things like dynamic hashing and consistent hashing algorithms;
A DAL involves a lot of complex technology, such as managing database connections (timeouts, exceptions), controlling database operations (timeouts, exceptions), and encapsulating the database- and table-splitting rules.
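Since consistent hashing is mentioned above, here is a minimal sketch of a consistent-hash ring of the kind a DAL or distributed-cache client might use to map a key (say a user id) to a node or shard. The node names are invented, and a real implementation would add virtual nodes and a stronger hash function.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: a key maps to the first node clockwise from its hash.
// A production version would add virtual nodes and a stronger hash (e.g. MD5/murmur).
public class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node);
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("ring is empty");
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        // Wrap around to the first node if we fall past the end of the ring.
        Integer nodeHash = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(nodeHash);
    }

    private int hash(String s) {
        return s.hashCode() & 0x7fffffff;   // keep it non-negative; toy hash only
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("cache-node-1");
        ring.addNode("cache-node-2");
        ring.addNode("cache-node-3");
        System.out.println("user:42 -> " + ring.nodeFor("user:42"));
    }
}
```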
Step 8 of architecture evolution: add more web servers
After the database and table splitting work is done, the pressure on the database has dropped to a fairly low level, and you go back to happily watching the visit numbers soar day after day. Then one day you find that access to the system is starting to slow down again. You check the database first and find its load perfectly normal. Then you check the web servers and find that Apache is blocking a lot of requests, while the application server is still handling each individual request quickly. It seems the sheer number of requests is causing them to queue up and slowing the response. This is easy to deal with; generally speaking there is some money available by now, so you add more web servers. In the process of adding web servers, a few challenges may come up:
1. Software load balancing with Apache or LVS can no longer handle the scheduling of such a huge volume of web traffic (request connections, network bandwidth, etc.). If funds allow, the solution is to buy hardware load-balancing equipment such as F5, NetScaler or Athelon; if not, the solution is to logically classify the applications and distribute each class to its own software load-balanced cluster (a sketch of this idea appears at the end of this step);
2. Some of the earlier schemes, such as state-information synchronization and file sharing, may hit bottlenecks and need improvement; depending on the situation, a distributed file system tailored to the website's business requirements may be written at this point;
After this work is done, you enter a seemingly perfect era of unlimited scalability: whenever traffic grows, the solution is simply to keep adding web servers.
See the figure of the system after this step:
This step involves these knowledge systems:
At this stage, as the number of machines keeps growing, the data volume keeps growing and the demands on system availability get ever higher, a deeper understanding of the technologies in use is required, along with more customized solutions built to the website's own needs.
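The sketch below illustrates the "classify applications and send each class to its own software load-balanced cluster" idea from point 1 above. The URL prefixes and cluster addresses are invented for the example; in practice this classification would live in the front-end load-balancing layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of splitting traffic by application category and handing each category
// to its own software load-balanced cluster. Prefixes and cluster addresses are invented.
public class ClusterSelector {
    // Ordered map so that more specific prefixes are checked first.
    private final Map<String, String> clusterByPrefix = new LinkedHashMap<>();

    public ClusterSelector() {
        clusterByPrefix.put("/search", "lvs-search-cluster.internal");
        clusterByPrefix.put("/user",   "lvs-user-cluster.internal");
        clusterByPrefix.put("/",       "lvs-default-cluster.internal");
    }

    public String clusterFor(String requestPath) {
        for (Map.Entry<String, String> e : clusterByPrefix.entrySet()) {
            if (requestPath.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return clusterByPrefix.get("/");
    }

    public static void main(String[] args) {
        ClusterSelector selector = new ClusterSelector();
        System.out.println("/search?q=x -> " + selector.clusterFor("/search?q=x"));
    }
}
```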
The ninth step of architecture evolution: read/write splitting of the database and cheap storage schemes
Then one day the perfect era comes to an end and the database nightmare reappears. Because so many web servers have been added, database connection resources are once again insufficient, even though the database has already been split into multiple databases and tables. Analyzing the database load, you may find that the ratio of reads to writes is very high, and at that point the usual idea is read/write splitting. Of course, this scheme is not easy to implement. In addition, you may find that some of the data stored in the database is a waste to keep there, or takes up too many database resources. So the architecture evolution at this stage may be to implement read/write splitting for the database and to write some cheaper storage schemes, for example something BigTable-like.
See the figure of the system after this step:
This step involves these knowledge systems:
Read/write splitting requires a deep grasp of database replication, standby strategies and the like, and may also require the ability to implement such mechanisms yourself;
A low-cost storage scheme requires a deep understanding of how the OS handles file storage, and of how the chosen language implements file I/O.
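A minimal sketch of read/write splitting at the data-access layer: statements that modify data go to the master, plain reads are spread over the replicas. The host names are invented, and a real implementation would also have to deal with replication lag, transactions and failover.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of read/write splitting: statements that modify data go to the master,
// plain reads are spread across replicas. Hosts are invented; replication lag,
// transactions and failover are ignored here.
public class ReadWriteRouter {
    private final String master = "jdbc:mysql://db-master.internal:3306/site";
    private final List<String> replicas = List.of(
        "jdbc:mysql://db-replica1.internal:3306/site",
        "jdbc:mysql://db-replica2.internal:3306/site"
    );

    public String routeFor(String sql) {
        String s = sql.trim().toLowerCase();
        if (s.startsWith("select")) {
            // Pick a replica at random for read traffic.
            return replicas.get(ThreadLocalRandom.current().nextInt(replicas.size()));
        }
        return master;  // inserts, updates, deletes and DDL go to the master
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter();
        System.out.println(router.routeFor("SELECT * FROM user WHERE id = 1"));
        System.out.println(router.routeFor("UPDATE user SET name = 'x' WHERE id = 1"));
    }
}
```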
Step 10 of architecture evolution: enter the era of large-scale distributed applications and the dream of cheap server farms
After the long and painful process above, you finally arrive at yet another perfect era: keep adding web servers and you can support more and more traffic. For a large website, popularity is unquestionably important, and as popularity keeps growing, all kinds of feature requests start to explode. At this point you suddenly notice that the web application deployed on the web servers has become enormous. When several teams all start changing it, it is really inconvenient; reusability is poor, and each team ends up redoing more or less the same work. Deployment and maintenance are also quite troublesome, because copying and starting a huge application package on N machines takes a lot of time, and when something goes wrong it is hard to track down. Worse still, a bug in one part of the application is likely to make the whole site unavailable. There are other problems too, such as tuning being practically impossible, because the application deployed on every machine has to do everything and cannot be tuned for any specific workload. Based on this analysis, you make up your mind to split the system up by responsibility, and a large-scale distributed application is born. This step usually takes quite a long time, because it brings a number of challenges:
1. After splitting into a distributed system, a high-performance, stable communication framework is needed, supporting a variety of communication styles and remote call mechanisms;
2. Splitting up a huge application takes a long time and requires organizing the work around the business and controlling the dependencies between systems;
3. How do you operate and maintain such a huge distributed application well (dependency management, health management, error tracking, tuning, monitoring and alerting)?
After this step, the architecture of the system has more or less entered a relatively stable stage, and large numbers of cheap machines can be used to support the enormous traffic and data volume. With this architecture and the experience gained over all these stages of evolution, various other methods can then be used to support ever-growing traffic.
See the figure of the system after this step:
This step involves these knowledge systems:
This step involves a great deal of knowledge: it requires a deep understanding and mastery of communication, remote calls, messaging mechanisms and so on, including a clear grasp of their theory, the hardware and operating-system levels, and how the chosen language implements them.
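To give a feel for what "splitting into services with remote calls" means in code, here is a minimal sketch of a service interface whose client-side implementation calls a remote user service over HTTP. The interface, URL and endpoint are all invented for the example; a real system would use a proper communication framework with serialization, timeouts, retries, service discovery, monitoring and so on.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a split-out service: callers depend on an interface, and the client
// implementation makes a remote call. A real communication framework would add
// serialization, timeouts, retries, service discovery, monitoring, etc.
interface UserService {
    String getUserName(long userId) throws IOException, InterruptedException;
}

public class RemoteUserServiceClient implements UserService {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;   // e.g. "http://user-service.internal:8080" (invented)

    public RemoteUserServiceClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public String getUserName(long userId) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(baseUrl + "/users/" + userId + "/name"))
            .GET()
            .build();
        HttpResponse<String> response =
            http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        UserService users = new RemoteUserServiceClient("http://user-service.internal:8080");
        System.out.println(users.getUserName(42));
    }
}
```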
Finally, an architecture diagram of a large website is attached: