Last weekend, I was invited to present the architecture of Windmill at the Hacker News Shanghai meetup and the Ruby Shanghai event. I would like to thank all the organizers and venue providers.
The Windmill project started in November 2011, formerly known as pragmatic.ly. We settled on the overall architecture on day one, and looking back today it has barely changed, so I think it has proven to be a mature design that suits the times.
Over the past two years, as engineers, we have clearly felt how fast front-end technology is moving: HTML5 support, mobile first, responsive design, and an endless stream of client-side frameworks. All of this rests on one thing: the rapid development of browsers. Chrome, Firefox, Safari, Opera, and even IE have all improved quickly in recent years. It's no exaggeration to say that these browsers are no longer just browsers but open platforms with their own extension mechanisms. This has greatly changed how websites are built, and websites have started to become applications.
Windmill is a case in point. Its design is very close to a desktop application, with features such as:
- Heavy client: all business logic lives in the client, so responses are very fast
- Single-page application: nothing you do in a project requires a page refresh, so interaction is very smooth
- Three-column layout: the left, middle, and right columns each have their own role, so information is presented clearly
- Real-time updates: any change in a project is synchronized to your page in real time
Behind this design is a technology stack built to match it.
Overview
- Spine.js
- Rails
- Pusher
Windmill uses Spine.js on the client side and Ruby on Rails on the back end, with Pusher for real-time message synchronization. (For inexplicable reasons, two of these three sites can't be opened from here. -_-)
Spine.js is a lightweight MVC JavaScript library by Alex MacCaw, author of "JavaScript Web Applications", who built it as a refinement of Backbone.js. Spine is written in CoffeeScript and its whole codebase is only about 1,000 lines, far less than frameworks such as Angular.js and Ember.js, which makes it very easy to learn and use.
We are currently still on Rails 3.2, used essentially as an API server: it only serves data and holds no business logic. Last week some friends asked why, if we only need an API server, we didn't choose a lighter solution such as Sinatra or Grape. First, for development, Rails is comfortable to use out of the box, and besides the API we also have integrations with third-party applications and some admin features, so building the whole site on Rails is more convenient. Second, we haven't run into any major performance problems yet, so there is no need to replace it. If it becomes necessary later, we will split the RESTful API out into a separate application.
Pusher is a WebSocket-based real-time message push service that is very easy to integrate into applications. Even in browsers without WebSocket support (yes, that's right, IE), it has built-in fallback modes such as Flash sockets or SockJS. In terms of overall experience, Pusher is a very good solution, light and fast; it has saved us a lot of development time and lets us focus on the core value of the product. However, if your application has very strict real-time requirements, such as a trading system, Pusher's stability may not be good enough for you, because of, well, certain network conditions you all know about.
When the browser loads or refreshes a page, it sends a request to the server, which returns a bare HTML template with no data. After the client renders this template, it requests the project's actual data (as JSON) from the server through the RESTful API, then processes and renders that data to produce the page the user actually sees. After that, the client opens a long-lived WebSocket connection to the Pusher server to receive pushed messages. Whenever something changes on the server, the server sends a message to Pusher, Pusher relays it to the client browsers, and the page updates immediately. That is the basic flow.
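The update path described above (server, then Pusher, then every open browser) is a publish/subscribe fan-out. Below is a minimal in-memory sketch in Python of that boot-and-listen sequence. `PushHub`, `Browser`, `update_task`, and the `project-42` channel naming are hypothetical illustrations, not Pusher's actual API.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class PushHub:
    """Hypothetical in-memory stand-in for a hosted push service such as Pusher."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        # In the browser this is the long-lived WebSocket subscription.
        self._subscribers[channel].append(handler)

    def trigger(self, channel: str, event: str, data: dict) -> None:
        # Relay one server-side event to every client subscribed to the channel.
        for handler in self._subscribers[channel]:
            handler(event, data)


class Browser:
    """One client page going through the boot sequence described above."""

    def __init__(self, api: dict, hub: PushHub, project_id: str) -> None:
        self.html = "<div id='app'></div>"        # 1. empty template, no data yet
        self.tasks = dict(api[project_id])        # 2. fetch the project's JSON via the API
        hub.subscribe(f"project-{project_id}", self.on_push)  # 3. listen for pushes

    def on_push(self, event: str, data: dict) -> None:
        if event == "task:updated":
            self.tasks[data["id"]] = data["title"]  # patch state in place, no reload


def update_task(api: dict, hub: PushHub, project_id: str, task_id: int, title: str) -> None:
    """Server side: persist the change, then notify the push service."""
    api[project_id][task_id] = title
    hub.trigger(f"project-{project_id}", "task:updated", {"id": task_id, "title": title})


api = {"42": {1: "Write spec"}}   # stands in for the data behind the REST API
hub = PushHub()
alice, bob = Browser(api, hub, "42"), Browser(api, hub, "42")
update_task(api, hub, "42", 1, "Write spec v2")
```

Both open pages end up showing the new title without reloading, because the server only talks to the hub and the hub fans the event out.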
Client
As mentioned above, Windmill's front end uses Spine.js and jQuery. The mobile version is slightly different, using Spine Mobile and Zepto. By my count, the minified JS and CSS, including all third-party libraries, currently come to nearly 270 KB. Thanks to Qiniu Cloud Storage: with their CDN service, Windmill's static files reach the client quickly, which speeds up page loading.
Spine.js
Spine has been covered plenty elsewhere, so here I'll just mention one feature I really like: the asynchronous UI. Once we decide to move logic from the server to the client, we have to improve the whole user experience and respond to user actions very quickly. So when a user performs an operation that updates data, we should not show a loading spinner and make them wait for the server; the page should change immediately. This is the asynchronous UI: by decoupling the client's UI interactions from data synchronization with the server, interaction stays smooth.
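The asynchronous UI idea can be sketched in a few lines: change local state, re-render, and hand the server write to a background worker instead of waiting on it. This is a language-neutral Python sketch, not Spine's code; `SlowServer`, `TaskList`, and the 0.3-second latency are assumptions made for illustration.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SlowServer:
    """Stand-in for the REST API; every write takes `latency` seconds (assumed)."""

    def __init__(self, latency: float = 0.3) -> None:
        self.latency = latency
        self.data: dict[int, str] = {}

    def put(self, task_id: int, title: str) -> None:
        time.sleep(self.latency)
        self.data[task_id] = title


class TaskList:
    """Client-side model that updates the UI first and syncs with the server afterwards."""

    def __init__(self, server: SlowServer) -> None:
        self.server = server
        self.tasks: dict[int, str] = {}
        self.rendered: list[str] = []                 # what the user currently sees
        # One worker keeps writes to the server in the order they were made.
        self._sync = ThreadPoolExecutor(max_workers=1)

    def render(self) -> None:
        self.rendered = [self.tasks[k] for k in sorted(self.tasks)]

    def save(self, task_id: int, title: str) -> Future:
        self.tasks[task_id] = title                   # 1. change local state
        self.render()                                 # 2. re-render at once, no spinner
        return self._sync.submit(self.server.put, task_id, title)  # 3. sync later


server = SlowServer()
task_list = TaskList(server)
start = time.monotonic()
pending = task_list.save(1, "Draft release notes")
returned_after = time.monotonic() - start   # well under server.latency
shown_before_sync = list(task_list.rendered)
pending.result()                            # demo only: wait for the background write
```

The user sees the change before the server has acknowledged it; what happens when that background write fails is the subject of the offline-support section.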
CoffeeScript + Eco
I really have to admire DHH here. Back then he insisted, despite the pushback, on including CoffeeScript by default in Rails 3.1, which quickly made CoffeeScript popular. Coffee is addictive: almost all of our JS is written in it, and the compiled JavaScript is still very readable. Even the much-criticized difficulty of debugging has never been a problem for anyone familiar with the code structure and with CoffeeScript, let alone now that source maps are supported. Eco stands for "Embedded CoffeeScript templates"; its syntax is very similar to ERB, so as a Ruby developer you can't help liking it. :)
Model-View-Controller
In MVC, the controller layer receives and processes requests. On the client, requests are events, so controllers are responsible for handling DOM events and router events. Accordingly, Windmill's front end has two kinds of controllers: one deals with the page DOM, the other with routing. DOM-based controllers are designed around the structure of the page, and each one corresponds to a separate DOM module. For example, the sidebar in Windmill has its own controller, the task list and team member sections within the sidebar each have their own, and so on. These DOM controllers listen for events on their DOM and for updates to the data behind it. Router-based controllers are designed around URLs and listen for changes to the page URL. For example, each task has its own URL, so clicking a task changes the URL; the router controller catches that change and performs the corresponding operation. This whole process has nothing to do with the DOM.
The model layer handles everything related to data. Most of the time the data corresponds to a URL, so the model layer mostly only needs to interact with the router controllers. Once a router controller has prepared data from the model, it triggers an event, and the DOM controller renders the corresponding part of the page.
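The split between router controllers, models, and DOM controllers can be sketched with a small event bus. This is a Python illustration of the division of responsibilities, not Spine's API; `EventBus`, `Task`, `TaskRouter`, `TaskDetail`, the `/tasks/:id` URL, and the `task:show` event name are all hypothetical.

```python
import re
from collections import defaultdict
from html import escape


class EventBus:
    """Minimal event emitter that connects the two kinds of controllers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def trigger(self, event, *args):
        for handler in self._handlers[event]:
            handler(*args)


class Task:
    """Model layer: owns the data (an in-memory dict stands in for the API)."""

    records = {"7": {"id": "7", "title": "Fix login bug"}}

    @classmethod
    def find(cls, task_id):
        return cls.records[task_id]


class TaskRouter:
    """Router-based controller: reacts to URL changes and never touches the DOM."""

    def __init__(self, bus):
        self.bus = bus
        self.routes = [(re.compile(r"^/tasks/(\w+)$"), self.show)]

    def navigate(self, url):
        for pattern, action in self.routes:
            match = pattern.match(url)
            if match:
                action(*match.groups())
                return True
        return False                              # not ours: let the browser handle it

    def show(self, task_id):
        task = Task.find(task_id)                 # prepare data from the model
        self.bus.trigger("task:show", task)       # hand off to the DOM side


class TaskDetail:
    """DOM-based controller: owns one region of the page and renders into it."""

    def __init__(self, bus):
        self.html = ""
        bus.on("task:show", self.render)

    def render(self, task):
        self.html = f"<h2>{escape(task['title'])}</h2>"


bus = EventBus()
router, detail = TaskRouter(bus), TaskDetail(bus)
router.navigate("/tasks/7")
```

The router never builds HTML and the DOM controller never parses URLs; the event is the only thing they share.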
Beyond the three MVC layers, Windmill's design also uses many HTML5 features. Besides the WebSocket support introduced earlier, these include the History API, Web Notifications, Drag and Drop, and localStorage.
HTML5 History pushState
The History API is what the router controllers are built on. We are all used to pressing the browser's back button to return to the previous page, and many rich client applications can't support this because the URL never changes. The HTML5 History API gives JavaScript a way to solve this: when we move to the next page, its URL is pushed onto the history stack, and when we press back, it is popped. The router controller defines a set of URLs it responds to; when one matches, it intercepts the page navigation, runs the corresponding code instead, and renders the corresponding data. In Windmill every page state corresponds to a unique URL, so besides back and forward working, there's an extra benefit: after a refresh you always land back on the URL you were at. Of course, this requires the client and the server to use the same set of routes.
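Here is a simplified Python model of the push/pop behaviour and of why the server must share the client's routes. It ignores the browser's forward stack, and the route table and URLs are hypothetical examples, not Windmill's real URL scheme.

```python
import re

# One route table shared by the client router and the server (hypothetical URLs).
ROUTES = [
    ("project", re.compile(r"^/projects/(\d+)$")),
    ("task", re.compile(r"^/projects/(\d+)/tasks/(\d+)$")),
]


def match_route(url):
    for name, pattern in ROUTES:
        m = pattern.match(url)
        if m:
            return name, m.groups()
    return None


class History:
    """Simplified model of history.pushState and the back button (no forward stack)."""

    def __init__(self, start_url):
        self.stack = [start_url]

    @property
    def current(self):
        return self.stack[-1]

    def push_state(self, url):
        route = match_route(url)
        if route is None:
            raise ValueError(f"no client route for {url}")  # would be a normal page load
        self.stack.append(url)   # the URL changes, but no page load happens
        return route             # the router controller renders this route

    def back(self):
        if len(self.stack) > 1:
            self.stack.pop()
        return match_route(self.current)


def server_response(url):
    """On refresh the server sees the client URL; it must know it and return the shell."""
    if match_route(url):
        return 200, "<div id='app'></div>"
    return 404, "Not Found"


history = History("/projects/1")
opened = history.push_state("/projects/1/tasks/9")
returned = history.back()
```

If the server did not recognize `/projects/1/tasks/9`, refreshing on a task page would return a 404 instead of the empty template the client needs to rebuild that view.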
HTML5 desktop notification
Web Notifications are desktop notifications. When an event involving a user happens, for example someone on the team assigns you a task or @mentions you in a discussion, you receive a notification. Currently Chrome and Safari support Web Notifications directly, and Firefox supports them in its latest version (older versions need a plugin), while IE supports them only in version 10 and above, and only for sites added to the pinned sites list. For details, see the article I wrote on the official Windmill blog introducing HTML5 Web Notifications.
HTML5 drag and drop
Before drag and drop was standardized, there were many JavaScript implementations, such as the drag-and-drop support in jQuery UI; now the HTML5 specification standardizes it. Windmill uses the HTML5 standard version, mainly for dragging a task onto another task list or onto a member, and for reordering tasks within the same list.
Limited offline support
As mentioned above, one of Spine's features is the asynchronous UI. But if the connection to our server has a short-lived problem, for example the network drops, the change is applied on the client but never synchronized to the server, so as soon as the user refreshes, the data is lost. To handle this, when a sync fails we put the unsynchronized data into localStorage and retry periodically until it succeeds. So after the page has loaded, you can keep using Windmill for a short while even offline. However, because the implementation is limited and there is no version control, in edge cases, for example when different team members update the same task, whichever change syncs later overwrites the one that synced first. Since this rarely happens, we haven't spent much time improving it yet.
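A minimal Python sketch of such a retry queue, assuming a plain dict of strings in place of `window.localStorage` and a hypothetical `pendingSync` key. It resends the oldest change first and stops at the first failure so that changes reach the server in order; like the setup described above, it does no versioning, so the last write wins.

```python
import json


class SyncQueue:
    """Keeps failed writes in a localStorage-like store and retries them in order."""

    KEY = "pendingSync"   # hypothetical storage key

    def __init__(self, storage, send):
        self.storage = storage   # dict of str -> str, like window.localStorage
        self.send = send         # send(change) -> True if the server accepted it

    def _load(self):
        return json.loads(self.storage.get(self.KEY, "[]"))

    def _save(self, items):
        self.storage[self.KEY] = json.dumps(items)

    def push(self, change):
        # Queue behind older unsent changes; otherwise try right away and
        # persist on failure so a page refresh does not lose the change.
        pending = self._load()
        if pending or not self.send(change):
            self._save(pending + [change])

    def retry(self):
        """Run on a timer: resend oldest first, stop at the first failure."""
        pending = self._load()
        while pending and self.send(pending[0]):
            pending.pop(0)
        self._save(pending)
        return len(pending)   # number of changes still waiting


network = {"up": False}
server = {}


def send(change):
    if not network["up"]:
        return False
    server[change["id"]] = change["title"]   # last write wins, no versioning
    return True


storage = {}
sync = SyncQueue(storage, send)
sync.push({"id": 3, "title": "Edited offline"})
waiting_offline = sync.retry()
network["up"] = True
waiting_online = sync.retry()
```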
Server side
Windmill's entire server side is built on Ruby on Rails, as introduced earlier. My earlier article on best practices for refactoring a Rails project covered some ways to structure code better. In Windmill, we also add a service layer and a presenter layer on top of the standard three MVC layers.
Service Layer
To keep controllers as simple as possible, we introduce a new layer between controllers and models: the service layer, for complex request-handling logic that doesn't belong to any single model. For example, Windmill currently integrates hooks from GitHub, GitLab, and Bitbucket. When a user pushes commits to the remote, the hosting service sends a request containing the commit information to Windmill's server. When Windmill receives such a request, it first inspects its characteristics to determine which service it came from. It then analyzes whether the pushed commits are linked to specific tasks. Finally, based on each commit message, it decides whether to update a task's status and whether to create a discussion. The whole process is complex and fairly self-contained, touches multiple models, and has many different strategies, which makes it a good fit for a service. Other examples include an analytics service, a password service, an email handler service, and so on.
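The three steps of that hook service (identify the sender, find linked tasks, decide on status changes and discussions) can be sketched as follows. This is a Python illustration, not Windmill's Ruby service: the header names used for detection are assumptions to check against each provider's documentation, the `#12` / `fix #12` commit-message conventions are hypothetical, and provider-specific payload parsing is omitted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical commit-message conventions: "#12" links task 12, "fix #12" also closes it.
TASK_REF = re.compile(r"#(\d+)")
CLOSE_REF = re.compile(r"\b(?:fix(?:es|ed)?|close[sd]?)\s+#(\d+)", re.IGNORECASE)


def detect_source(headers):
    """Step 1: identify the sender. Header names here are assumptions."""
    if "X-GitHub-Event" in headers:
        return "github"
    if "X-Gitlab-Event" in headers:
        return "gitlab"
    if "X-Event-Key" in headers:
        return "bitbucket"
    raise ValueError("unknown webhook source")


@dataclass
class PushResult:
    source: str
    discussions: list = field(default_factory=list)   # (task_id, commit message)
    closed: set = field(default_factory=set)          # task ids whose status changes


class CommitHookService:
    """Spans several models, so it lives in a service instead of a controller."""

    def __init__(self, task_ids):
        self.task_ids = set(task_ids)   # tasks that exist in this project

    def call(self, headers, commit_messages):
        # Messages arrive already extracted from the provider's payload.
        result = PushResult(source=detect_source(headers))
        for message in commit_messages:
            # Step 2: is this commit bound to any existing task?
            linked = {int(t) for t in TASK_REF.findall(message)} & self.task_ids
            for task_id in sorted(linked):
                result.discussions.append((task_id, message))   # Step 3a: add a discussion
            # Step 3b: does the message ask to change a task's status?
            result.closed |= {int(t) for t in CLOSE_REF.findall(message)} & self.task_ids
        return result


service = CommitHookService([12, 15])
result = service.call(
    {"X-GitHub-Event": "push"},
    ["Fix #12: handle empty titles", "Refactor #99", "Touch up docs for #15"],
)
```

The controller action then shrinks to a single `service.call(...)`, and each step can be tested on its own.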
Presenter Layer
Anyone familiar with Rails development knows that the view layer should only display data and that logic in views should be avoided. In practice, though, some views inevitably become hard to maintain. At RubyConf China 2013 last year, xdite, a speaker from Taiwan, talked about how to write maintainable views; see her talk for details. In Windmill, however, most views are on the client side and only a few are on the server, where they just do some data preparation, so we don't need many techniques there. But because that data preparation involves data from multiple models and from the Redis database, we created separate presenter objects to manage this logic instead of putting it in the views.
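A presenter is just a plain object that gathers data from several sources and exposes ready-to-print values. Below is a minimal Python sketch; `Project`, `Member`, `ProjectPresenter`, and the dict standing in for Redis online-status data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str


@dataclass
class Member:
    name: str


class ProjectPresenter:
    """Collects everything one server-rendered page needs, so the template has no logic."""

    def __init__(self, project, members, online_status):
        self.project = project
        self.members = members
        self.online_status = online_status   # stands in for a Redis lookup of who is online

    @property
    def heading(self):
        count = len(self.members)
        return f"{self.project.name} ({count} member{'s' if count != 1 else ''})"

    @property
    def online_names(self):
        return [m.name for m in self.members if self.online_status.get(m.name)]


presenter = ProjectPresenter(Project("Windmill"), [Member("ann"), Member("bo")], {"ann": True})
```

The template then only reads `presenter.heading` and `presenter.online_names`, with no branching or lookups of its own.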
Observer
Generally speaking, an observer watches data: when a record is created, updated, or deleted, it can perform some corresponding action, such as sending a welcome email after a user signs up. Windmill's observers are slightly different. Besides watching the data, they also hook into the controller to get the request information behind each data change, such as which user made it. This approach borrows mainly from Rails' caching sweepers. We will write a separate article about it later.
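The key trick is that the controller publishes request context (the acting user) somewhere the model-level observer can read it. In Rails a sweeper gets this by also acting as a controller filter; the Python sketch below uses a `contextvars.ContextVar` to play that role. `controller_action`, `TaskObserver`, and `update_task` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Who is acting in the current request; set by the controller, read by the observer.
current_user = contextvars.ContextVar("current_user", default=None)
activity_log = []


@contextmanager
def controller_action(user):
    """Wraps a controller action, much like a sweeper's before/after filters."""
    token = current_user.set(user)
    try:
        yield
    finally:
        current_user.reset(token)   # never leak the user into the next request


class TaskObserver:
    """Watches model changes and records who made them."""

    @staticmethod
    def after_update(task, changes):
        activity_log.append({"user": current_user.get(), "task": task["id"], "changes": changes})


def update_task(task, **changes):
    task.update(changes)
    TaskObserver.after_update(task, changes)   # model callback fires after the write


task = {"id": 5, "title": "Old title"}
with controller_action("alice"):
    update_task(task, title="New title")
update_task(task, title="Changed by a background job")   # no request, so no user
```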
Sidekiq
Sidekiq is a simple and powerful background job processing system and is currently the first choice for background processing in the Ruby world. There are similar options such as Resque and delayed_job, but Sidekiq quickly became the first choice thanks to two characteristics: actor-based multithreaded processing, and Redis-backed job queues, which together let it achieve the same throughput with less memory. In Windmill, we push every operation that can be deferred into the background, such as sending notifications, computing statistics, and creating initial data. This way each request completes in the shortest possible time, which improves the throughput of the whole system. Sidekiq processes each job in the background after receiving it, and if a job fails it retries it automatically, which makes it more reliable.
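To make the "enqueue now, run later, retry on failure" pattern concrete, here is a tiny in-process Python sketch. It is not Sidekiq: real Sidekiq stores jobs in Redis and retries with exponential backoff, while this `JobQueue` (a hypothetical name) keeps jobs in memory and retries immediately.

```python
import queue
import threading


class JobQueue:
    """In-process sketch of a Sidekiq-style queue: enqueue fast, run on worker threads, retry."""

    def __init__(self, workers=3, max_attempts=3):
        self.jobs = queue.Queue()
        self.max_attempts = max_attempts
        self.dead = []                        # jobs that failed on every attempt
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, func, *args):
        self.jobs.put((func, args, 1))        # returns at once; the web request can finish

    def _work(self):
        while True:
            func, args, attempt = self.jobs.get()
            try:
                func(*args)
            except Exception:
                if attempt < self.max_attempts:
                    self.jobs.put((func, args, attempt + 1))   # retry (no backoff here)
                else:
                    with self._lock:
                        self.dead.append((func.__name__, args))
            finally:
                self.jobs.task_done()

    def wait(self):
        self.jobs.join()                      # block until every job, including retries, is done


delivered = []
attempts = {"count": 0}


def send_notification(user):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("mail server unavailable")   # fails twice, then works
    delivered.append(user)


jobs = JobQueue(workers=2, max_attempts=5)
jobs.enqueue(send_notification, "alice")
jobs.wait()
```

A retry is re-queued before the failed attempt is marked done, so `wait()` cannot return while a retry is still pending.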
Percona
Windmill's main database is Percona Server, a MySQL fork that performs better thanks to XtraDB, the storage engine developed by Percona. Another advantage is that Percona Server is the closest build to the official MySQL Enterprise edition and is fully compatible with MySQL, so we could switch easily without changing any code.
Redis
In Windmill, Redis serves two main purposes: 1. as a cache store for view caching and record caching; 2. to speed up data access by keeping frequently read and written data in memory, reducing queries to the Percona database, for example the UID mapping table and statistics. We chose Redis over Memcached for two reasons. First, Redis data is persistent and isn't lost on restart, and we have some data that can't be rebuilt immediately, such as users' online status. Second, Redis can store complex data structures such as lists and sets, which are well suited to statistics. Redis 3.0 is expected to be officially released this year, and with Redis Cluster support we can expect better performance.
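The second use, keeping hot lookups like the UID mapping out of the database, is the classic cache-aside pattern. A minimal Python sketch, assuming the mapping goes from a public UID string to an internal numeric id; both dicts are stand-ins, one for a Redis hash (read with `HGET`, written with `HSET`) and one for a database table.

```python
class UidMapper:
    """Cache-aside read: try the in-memory store first, fall back to the database once."""

    def __init__(self, cache, database):
        self.cache = cache          # stands in for Redis, e.g. one hash read via HGET/HSET
        self.database = database    # stands in for a Percona/MySQL table
        self.db_reads = 0

    def lookup(self, uid):
        if uid in self.cache:
            return self.cache[uid]
        self.db_reads += 1
        value = self.database[uid]
        self.cache[uid] = value     # populate so later reads skip the database
        return value


mapper = UidMapper(cache={}, database={"a1b2c3": 101})
first = mapper.lookup("a1b2c3")
second = mapper.lookup("a1b2c3")
```

Because Redis persists to disk, a cache like this survives a restart, which is the first reason given above for choosing it over Memcached.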
Scalability
Overall, Windmill's technical architecture is not complicated, but it is very practical. At present we run on a single Linode machine; since the business logic is mainly on the front end and the back end is mostly an API, performance hasn't been a pressing issue. However, we have to admit that the setup lacks robustness: if anything fails on the back end, such as the database or Redis, the service stops working normally. Below are some thoughts on scaling the back-end architecture. They come from my previous work and have not yet been put into practice here.
Rails
Rails has always been criticized for its performance, and lately there are many reports of application XX getting YY times the performance after migrating from Rails to Node.js. In the latest Teahour episode, guest Pu Ling also explained that Node.js is event-driven and non-blocking at the framework level, which ensures high performance and keeps programmers from making mistakes there; Ruby can achieve the same effect, but it demands more of the programmers themselves. Here are several possible ways to scale:
- A Rails application itself scales reasonably well. We currently use Unicorn as the application server and Nginx as the front-end server. Unicorn is a high-performance, Rack-based HTTP server; Nginx acts as a reverse proxy and talks to Unicorn over a UNIX socket or TCP. So for a short-term traffic spike, one option is to scale up by upgrading the machine. You can also scale out to increase the system's throughput and processing capacity: add machines with a load balancer in front, and add more Unicorn worker processes behind it.
- Split the API out of the Rails application into a standalone application, since the front end and back end communicate almost entirely through the API. The API could then use a higher-performance solution, such as Goliath + Grape in the Ruby world. Goliath, similar in concept to Node.js, is an asynchronous, non-blocking, high-performance HTTP server, and Grape is a very lightweight API framework.
- Of course, you could also consider JRuby, Node.js, or Go. :P
Percona
There isn't much low-hanging fruit in the database right now; we only optimize indexes and avoid join queries as much as possible. Besides, the nature of Windmill as an application means it won't be dealing with big data, especially since some of our data isn't stored in Percona at all. I'm a bit embarrassed to admit that Windmill currently only does data backups and has no clustering, master-slave replication, or read-write splitting, though we will try these when we hit a bottleneck. Beyond relational databases, a document database such as MongoDB might also be a good fit.
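Read-write splitting, one of the options mentioned above that Windmill has not adopted yet, boils down to routing each statement to the right server. Here is a minimal Python sketch, assuming one primary and a list of read replicas identified by hypothetical names; a real setup would usually do this in the database driver or a proxy rather than by inspecting SQL text.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads over replicas round-robin."""

    READ_VERBS = ("select", "show")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        is_read = bool(words) and words[0].lower() in self.READ_VERBS
        # SELECT ... FOR UPDATE takes locks, so it must go to the primary too.
        if is_read and "for update" not in sql.lower() and self._replicas:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route(q)
    for q in [
        "SELECT * FROM tasks",
        "UPDATE tasks SET done = 1",
        "select 1",
        "SELECT id FROM tasks FOR UPDATE",
    ]
]
```

One caveat with this design: replicas lag behind the primary, so a read issued right after a write may not see it; requests that need read-your-own-writes should be pinned to the primary.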
Redis
Redis stores analytics data and saves database queries and computation. Since Redis 2.x has no cluster implementation, scaling out currently only gets limited support through consistent hashing in the layer above Redis. The 3.0 beta has been released and its built-in cluster feature is still being tested, which is worth looking forward to. In addition, Redis master-slave replication raises different problems in different scenarios, such as choosing the persistence strategy and the synchronization strategy between master and slave. Once the data reaches millions of entries, Redis needs tuning. Restarts are also painful, because rebuilding the dataset takes a long time. So one option is to shard the data, keep only fresh or frequently used data in memory, and store older data on disk.
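Client-side consistent hashing, the stopgap mentioned above for Redis 2.x, places every node at many points on a hash ring and sends each key to the first node point clockwise from the key's hash. Here is a minimal Python sketch; the node names and the choice of 100 virtual points per node are assumptions, and MD5 is used only for its even spread, not for security.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to Redis nodes; adding a node only moves the keys that now belong to it."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes     # virtual points per node, evens out the distribution
        self._points = []        # sorted hash positions on the ring
        self._owner = {}         # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key):
        if not self._points:
            raise LookupError("no nodes on the ring")
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = ConsistentHashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
keys = [f"stats:project:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("redis-d:6379")
moved = [k for k in keys if ring.node_for(k) != before[k]]
```

With a fourth node, only roughly a quarter of the keys move, and all of them move to the new node; with naive `hash(key) % node_count`, most keys would change nodes. The "limited support" part is that the application, not Redis, has to handle rebalancing and node failures.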
Sidekiq
Sidekiq runs as a single multi-threaded process and uses Redis as its message queue. So its parallelism is easy to increase: just run more processes, as long as the database and Redis can handle the load and, of course, there is enough memory.
Pusher
This is the advantage of using a hosted service: spend a reasonable amount of money and leave the performance problems to them.
Epilogue
At present, I believe this architecture saves us a lot of time and makes horizontal scaling easy. For a technology-driven team, this is both our advantage and our attitude. Thanks to all the open-source software and online services Windmill uses, our "small" team can spend its time on the core value of the product and save time to do "big" things. As a team collaboration tool, Windmill also wants to help you work better. If you value your time, you should use Windmill to manage your projects. Windmill makes collaboration easier and more efficient.
Interested? Try it now!
Feedback is welcome! If common questions come up, I can write follow-up posts (part 2, part 3, and so on).