7 days to build a front-end performance monitoring system

Posted by santillano at 2020-03-26

Introduction

A few days ago, at the Baidu FEX session of w3ctech's "Into Famous Enterprises" event, we said that after the talk you could build your own front-end performance monitoring system in seven days, and we can't break that promise. The previous article, on the beauty of front-end data, should have given you some background. This article describes performance data and how to monitor it in detail.

Start action

In this article, performance mainly means web page loading performance. Not familiar with performance yet? Don't worry: follow me "day by day" into the world of front-end performance.

Day 1: Why monitor performance?

“If you cannot measure it, you cannot improve it” - William Thomson

This is a basic question: why should we care about front-end performance and monitor it? For a company, performance is directly tied to revenue to some extent. There is plenty of research data on this from abroad:

Data source: http://www.slideshare.net/bitcurrent/impact-of-web-latency-on-conversion-rates http://stevesouders.com/docs/jsdayit-20110511.pptx

Why does performance affect company revenue? The root cause is that performance affects the user experience. Slow loading and sluggish interaction both hurt it. On mobile in particular, users have little tolerance for slow page responses and dropped connections. Imagine opening a web page on your phone to read a piece of news and waiting ages for it to load: you will probably leave and go to another page. Google also uses page loading speed as a ranking factor for SEO. There are many studies on how page loading speed affects user experience and SEO.

Although performance is very important, it is inevitably neglected during development, and it degrades as the product iterates. On mobile especially, the network has always been a major bottleneck, yet pages keep growing larger and more complex. There is no simple golden rule for performance optimization. We need a performance monitoring system that continuously monitors and evaluates page performance, raises alerts, finds bottlenecks, and guides optimization.

Day 2: What tools are available?

If you want to do good work, you must first sharpen your tools

There are many mature and excellent tools for page performance evaluation and monitoring, and reasonable use of existing tools can achieve twice the result with half the effort. Here are a few common tools:

Page Speed

Page Speed is a tool developed by Google for analyzing and optimizing web pages. It can be used as a browser plug-in. It checks a site against a series of optimization rules and gives detailed suggestions for each rule that fails. YSlow is a similar tool. We recommend the GTmetrix website, which shows the results of several analysis tools at once, as shown in the following figure:

WebPagetest

WebPagetest is an excellent open-source tool for testing the front-end performance of web pages. You can use the online version or host it yourself. There are also performance testing platforms in China built on WebPagetest; we recommend Alibaba test (the example below was tested with it).

With WebPagetest you can study the waterfall chart, performance score, element breakdown, view analysis and other data from a site's loading process in detail. The intuitive view analysis shows screenshots of each stage of page loading:

Note: Please click here for the whole test result

The figure above shows two important moments in browsing a site: white screen time and first screen time, that is, how long it takes before the user sees any content, and how long it takes before the first screen finishes rendering (including elements such as images). These two moments directly determine how long users wait to see the information they came for. Google's optimization advice likewise recommends reducing the CSS and JS used for content outside the first screen, so that the first screen appears as soon as possible.

PhantomJS

PhantomJS makes it easy to automate monitoring. PhantomJS is a headless WebKit with a server-side JavaScript API, on top of which automated web testing is easy to build. It requires some programming, but it is also more flexible. The official documentation includes a complete example of exporting a page load as a HAR file; see that document for details, and there are many introductions to the tool in China as well. In addition, berserkJS, a similar tool developed at Sina by @貘吃馍香, is also very good and provides first screen statistics. See here for the related article.

Day 3: Start monitoring real users online

Play to their strengths and avoid their weaknesses

Some readers will surely ask: with so many excellent tools available, why monitor the real access performance of online users?

We found that simulated tests deviate from real conditions to some extent, and sometimes fail to reflect fluctuations in performance. Besides basic indicators such as white screen and first screen time, product lines also care about product-specific indicators, such as when ads become visible, when search becomes usable, and when check-in becomes usable. These features depend directly on page JS loading, which is hard to simulate with tools.

To continuously monitor how users access the page and whether its features are usable under different network conditions, we chose to embed JS in the page and monitor the access performance of real online users. We also use the existing analysis tools as a supplement, forming a complete and diversified monitoring system that provides reliable data for evaluating and optimizing the product line.

A simple comparison of the different monitoring methods is shown in the following table:

Day 4: How to collect performance data?

Monitoring users' pain points

Which indicators should be monitored online? How can they better reflect what users perceive?

Users care about why the page will not open, why a button cannot be clicked, and why images show up so slowly. Engineers may focus instead on stages of the browser loading process such as DNS lookup, TCP connection and server response. Starting from the users' pain points, we extract four key indicators from the loading process: white screen time, first screen time, user-operable time, and total download time (see the previous article for definitions). How are these indicators measured?

Determine the statistical starting point

We need to start timing when the user enters the URL or clicks the link, because that measures how long the user actually waits. If a large share of your users are on modern browsers, you can use the Navigation Timing interface directly to get the starting point and the time spent in each stage of loading. Alternatively, you can record a timestamp in a cookie. Note that the cookie approach only covers navigations within the same site.
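A minimal sketch of choosing the starting point, written as a plain function so it can be tried outside a browser. It assumes a `timing` object shaped like `window.performance.timing`; the function name `getStartTime` and the cookie value `cookieClickTs` (a timestamp the previous in-site page would have stored on click) are hypothetical, not part of the original system.

```javascript
// Pick the statistical starting point for one page view.
// `timing` is expected to look like window.performance.timing (Navigation Timing).
// `cookieClickTs` is a hypothetical timestamp string read from a cookie that the
// previous in-site page wrote when the user clicked a link.
function getStartTime(timing, cookieClickTs) {
  // Preferred: the browser's own record of when navigation began.
  if (timing && timing.navigationStart > 0) {
    return { start: timing.navigationStart, source: 'navigation-timing' };
  }
  // Fallback: the cookie timestamp, which only exists for in-site jumps.
  const ts = Number(cookieClickTs);
  if (Number.isFinite(ts) && ts > 0) {
    return { start: ts, source: 'cookie' };
  }
  // No usable starting point: skip this page view rather than guess.
  return null;
}
```

In a page, the first argument would be `window.performance && window.performance.timing`. Returning the source alongside the value lets the analysis side keep the two kinds of samples apart.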

Count white screen time

White screen time is the moment the user first sees content, also called first paint time. Newer versions of Chrome expose a firstPaintTime value for this, but most browsers do not support it, so another way to measure it is needed. Careful study of the WebPagetest views shows that the white screen ends at about the time the external resources linked in the head finish loading, because the browser does not render the page until the head resources have been loaded and parsed. So we can take the moment the head resources finish loading as an approximation of white screen time. It is not exact, but it covers the main factors that affect the white screen: first byte time and head resource loading time.

How do we measure the loading of head resources? We found that inline JS in the head usually waits until the preceding JS/CSS has loaded before it runs. Could we add an inline script at the bottom of the head to record when the head resources finish loading? A simple example tests this:

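<!DOCTYPE HTML>
<html>
<head>
  <meta charset="UTF-8"/>
  <script>
    // Start of the test timer; the real statistical starting point is the DNS lookup
    var start_time = +new Date;
  </script>
  <!-- This script takes 3 s to return -->
  <script src="script.php"></script>
  <script>
    var end_time = +new Date;              // end of the timer
    var headtime = end_time - start_time;  // head resource loading time
    console.log(headtime);
  </script>
</head>
<body>
  <p>The page stays blank until the head resources finish loading</p>
  <p>script.php is set up to return after 3 s; the inline JS at the bottom of the head runs only after the preceding JS has returned</p>
  <p>Replacing script.php with a JS file that runs a long loop has the same effect</p>
</body>
</html>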

Testing shows that the measured head load time matches the download time of the head resources, and that when a script takes a long time to execute, the timer stops only after that script has finished. This shows the method is feasible (for the reasons, see how browsers render pages and JavaScript's single-threaded execution).

Count first screen time

Measuring first screen time is complicated because it involves images, asynchronous rendering and other elements. The loading views show that image loading is the main factor affecting the first screen, so the time at which the first screen finishes rendering can be obtained from the loading times of the images in the first screen. The process is as follows:

Call the API at the first screen position to start counting -> bind the load event of every image in the first screen -> after the page has loaded, check which images are inside the first screen and find the slowest one -> first screen time
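The last two steps of this flow can be sketched as a small function. This is an illustration under stated assumptions, not the original implementation: the record shape `{ top, loadedAt }`, the function name `firstScreenTime` and its parameters are all hypothetical. In a page, `top` would come from each image's position and `loadedAt` from its load event.

```javascript
// Estimate first screen time from image load records collected in the page.
// Each record is a plain object: { top: <px from the top of the page>,
//                                  loadedAt: <ms timestamp of its load event> }.
function firstScreenTime(images, viewportHeight, startTime) {
  let slowest = null;
  for (const img of images) {
    // Only images whose top edge is above the fold belong to the first screen.
    const inFirstScreen = img.top < viewportHeight;
    if (inFirstScreen && (slowest === null || img.loadedAt > slowest)) {
      slowest = img.loadedAt;
    }
  }
  // No image in the first screen: the image-based estimate does not apply.
  return slowest === null ? null : slowest - startTime;
}
```

When the function returns `null`, the caller needs some other signal for the first screen; which one to use is a product decision this sketch does not make.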

This is the simple logic for synchronous loading. In addition, the following points need attention:

Count user-operable time and total download time

By default, user-operable time can be measured as the DOM ready time, because event handlers are usually bound at that point. For JS loaded asynchronously as modules, you can actively mark in the code the moment the important JS finishes loading; product indicators are measured the same way.

By default, total download time can be measured as the onload time, which covers all resources loaded synchronously. If the page does a lot of asynchronous rendering, you can use the time at which asynchronous rendering completes as the total download time instead.
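A sketch of these two defaults and their overrides, assuming a `timing` object shaped like `window.performance.timing`. The function name `pageMilestones` and the `marks` object with its `operable` and `asyncRenderDone` fields are hypothetical stand-ins for timestamps that page code would record itself.

```javascript
// Derive user-operable time and total download time for one page view.
// `timing` mirrors window.performance.timing; read it after onload has fired,
// otherwise loadEventStart is still 0.
// `marks` holds optional timestamps recorded by page code (hypothetical names).
function pageMilestones(timing, marks) {
  const start = timing.navigationStart;
  const result = {
    operable: timing.domContentLoadedEventStart - start, // default: DOM ready
    total: timing.loadEventStart - start,                // default: onload
  };
  // An explicit mark for the important JS overrides the DOM ready default.
  if (marks && typeof marks.operable === 'number') {
    result.operable = marks.operable - start;
  }
  // Pages with heavy asynchronous rendering report its completion instead.
  if (marks && typeof marks.asyncRenderDone === 'number') {
    result.total = marks.asyncRenderDone - start;
  }
  return result;
}
```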

Network indicators

Network type judgment

On mobile, the network is the biggest factor affecting page loading speed, so we need to optimize differently for different networks, for example by serving a lite version to 2G users. But the web has no interface for getting the user's network type. To obtain it, we can measure speed and infer which network each IP range corresponds to. Facebook's speed measurement is a classic example of this approach. After analyzing the measurements, users' loading rates fall into clearly separated ranges, as shown in the following figure:

Each range corresponds to a different network type. After cross-checking with client-side tests, the accuracy can exceed 95%. Once the IP library carries the corresponding rate data, the user's network type can be determined from the IP address when analyzing user data.
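A sketch of mapping a measured rate to a network type. The band boundaries in `SPEED_BANDS` are invented for illustration; real boundaries have to be read off your own measured distribution. The helper names `measureKBps` and `classifyNetwork` are likewise hypothetical.

```javascript
// Hypothetical speed bands in KB/s, ordered from slowest to fastest.
// Replace the boundaries with those observed in your own speed data.
const SPEED_BANDS = [
  { type: '2g', maxKBps: 20 },
  { type: '3g', maxKBps: 150 },
  { type: 'wifi', maxKBps: Infinity },
];

// Convert one speed test (bytes downloaded in durationMs) to KB/s.
function measureKBps(bytes, durationMs) {
  return (bytes / 1024) / (durationMs / 1000);
}

// Return the first band whose upper bound the rate falls under.
function classifyNetwork(kbps) {
  for (const band of SPEED_BANDS) {
    if (kbps < band.maxKBps) return band.type;
  }
  return SPEED_BANDS[SPEED_BANDS.length - 1].type;
}
```

For example, `classifyNetwork(measureKBps(51200, 5000))` evaluates a 50 KB download that took 5 s, which is 10 KB/s and falls in the slowest band under these assumed boundaries.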

Network timing statistics

Network timing data can be obtained through the Navigation Timing interface mentioned earlier; similarly, Resource Timing gives the loading time of every static resource on the page. These interfaces make it easy to obtain the time spent on DNS, TCP, first byte, HTML transfer and so on. The Navigation Timing interface is illustrated as follows:
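In code, the subtraction behind those numbers might look like the sketch below. The field names on `t` are the standard Navigation Timing attributes; the function name `networkBreakdown` and the output keys are hypothetical, and "first byte" is taken here as the wait between sending the request and receiving the first byte, which is one of several reasonable definitions.

```javascript
// Break the network phase of a page load into stages.
// `t` is expected to look like window.performance.timing.
function networkBreakdown(t) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,  // DNS lookup
    tcp: t.connectEnd - t.connectStart,            // TCP connection
    firstByte: t.responseStart - t.requestStart,   // wait for the first byte
    htmlTransfer: t.responseEnd - t.responseStart, // HTML download
  };
}
```

A stage served from cache or over a reused connection typically comes out as 0, so zeros in the report are expected rather than a sign of bad data.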

The above concentrates on data collection, which is also the most critical part of the system: only when the data truly reflects what users perceive can it help improve the user experience. Once the data is collected, it can be reported in a single request after the page has loaded, as in this example:

http://xxx.baidu.com/tj.gif?dns=100&ct=210&st=300&tt=703&c_dnslookup=0&c_connecting=0&c_waiting=296&c_receiving=403&c_fetch_dns=0&c_nav_dns=75&c_nav_fetch=75&drt=1423&drt_end=1436&lt=3410&c_nfpt=619&nav_type=0&redirect_count=0&_screen=1366*768|1366*728&product_id=10&page_id=200&_t=1399822334414
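A sketch of assembling such a report URL from a metrics object. The function name `buildBeaconUrl` is hypothetical, and the keys passed in are whatever your own collection code produces. Unlike the raw example above, this version percent-encodes values, so a character such as `|` is sent as `%7C`.

```javascript
// Build a reporting URL from collected metrics.
// Keys with undefined or null values are skipped so that metrics a browser
// could not provide are simply left out of the report.
function buildBeaconUrl(endpoint, metrics) {
  const pairs = Object.keys(metrics)
    .filter((key) => metrics[key] !== undefined && metrics[key] !== null)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(String(metrics[key])));
  return endpoint + '?' + pairs.join('&');
}
```

In a browser, a common way to send the report is to assign the result to the `src` of a new `Image` once the page has loaded, which issues a GET request without blocking the page.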

Day 5: How to analyze performance data?

Let the data speak

As described in the previous article, data can be analyzed along multiple dimensions. Big data processing needs tools such as Hadoop and Hive, while for ordinary sites any back-end language will do.

Mean and distribution

Mean and distribution are the two most common treatments in data processing, because they show the trend and the spread of an indicator intuitively, which helps with evaluation, bottleneck discovery and alerting. Outliers should be removed during processing, for example dirty data that clearly exceeds a threshold.
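A sketch of this processing step for one indicator. The function name `summarize`, the cutoff `maxValid` and the bucket boundaries `edges` are all assumptions to be set per indicator; nothing here is prescribed by the original system.

```javascript
// Summarize one indicator: drop dirty data, then compute mean and distribution.
// `samples`  : array of measured values in ms
// `maxValid` : values above this (or negative / non-numeric) count as dirty data
// `edges`    : ascending bucket boundaries, e.g. [1000, 3000] gives
//              <1000, 1000 to <3000, and >=3000
function summarize(samples, maxValid, edges) {
  const clean = samples.filter((v) => Number.isFinite(v) && v >= 0 && v <= maxValid);
  const mean = clean.length
    ? clean.reduce((sum, v) => sum + v, 0) / clean.length
    : null;
  const buckets = new Array(edges.length + 1).fill(0);
  for (const v of clean) {
    let i = edges.findIndex((edge) => v < edge);
    if (i === -1) i = edges.length; // at or beyond the last boundary
    buckets[i] += 1;
  }
  // Report each bucket as a share of the clean samples.
  const share = buckets.map((c) => (clean.length ? c / clean.length : 0));
  return { count: clean.length, mean, share };
}
```

Reporting shares rather than raw counts keeps days with different traffic volumes comparable.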

There is a lot of research on how to evaluate load times. For example, three basic time limits have been proposed:

Drawing on industry research and on the characteristics of each indicator, we defined evaluation intervals for the distribution of each indicator, as shown in the figure below:

Having evaluation intervals makes it easier to understand the current state of performance and to respond when the trend fluctuates.

Multidimensional analysis

To uncover possible performance bottlenecks, we need to analyze the data along multiple dimensions. On mobile, for example, the most important dimension is the network, so in addition to the overall figures the data must be broken down by network type. Common dimensions include operating system, browser, region and carrier. We can also define dimensions based on the characteristics of our own product, such as page length distribution or lite version versus rich version.

Note that more dimensions are not necessarily better; which ones to use depends on the characteristics of the product and the devices. Dimensions exist to make performance bottlenecks easier to find.
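A sketch of breaking one indicator down by one dimension. The record fields used in the example (`net`, `lt`) and the function name `meanByDimension` are hypothetical; real records would carry whatever dimensions and indicators your reporting step sends.

```javascript
// Average one metric per value of one dimension.
// `records` is an array of plain objects, one per reported page view.
function meanByDimension(records, dimension, metric) {
  const groups = new Map();
  for (const r of records) {
    const key = r[dimension];
    const g = groups.get(key) || { sum: 0, count: 0 };
    g.sum += r[metric];
    g.count += 1;
    groups.set(key, g);
  }
  const out = {};
  for (const [key, g] of groups) {
    out[key] = g.sum / g.count;
  }
  return out;
}
```

Calling `meanByDimension(records, 'net', 'lt')` on records tagged with a network type gives one mean per network, which is the kind of breakdown that makes a slow network segment stand out from a healthy overall average.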

Aside: I saw a comment from someone who wanted to do monitoring but whose company had no log server. You do not need a separate log server; it is enough to keep the access logs of the statistics requests. If the site has no server of its own, create a new application in Baidu Developer Center, write a simple web service that parses the incoming statistics and stores them in Baidu cloud's free database, and then process each day's data with MySQL. That should be no problem for sampled performance data from an ordinary site. Please call me Lei Feng.

Day 6: How to use monitoring data to solve problems?

Find the bottleneck and prescribe the right medicine

For charts, Highcharts is the well-known choice, and the one developed by Baidu is also very good. Whatever the tool, the most important thing is that the report is focused and clear.

Before building a report, ask yourself how to let people see the current state and possible problems at a glance, what should be emphasized, what can be removed, and whether the result matches how people are used to reading it.

With real data that reflects what users perceive, broken down by business feature and supported by detailed network and other auxiliary data, we can tackle front-end performance with much more confidence. The monitoring system gives continuous feedback on the state of online access. Choose an optimization plan based on the current evaluation and the bottlenecks found, then adjust it according to the feedback. Done this way, performance optimization is no longer a problem.

How do you choose an optimization plan? That is another big topic. Fortunately, there is plenty of experience to draw on. Some performance learning materials are collected in the appendix, to be read as needed.

Day 7: Summary

After these "few days" of effort, we can build a small but elegant front-end performance monitoring system. But this is only the beginning: front-end data has a lot more value to mine, and performance optimization is a discipline in its own right. To create a smooth experience and make users happier, build your own front-end data platform!

This article was written after the Baidu FEX session of w3ctech's "Into Famous Enterprises" event, where the slides and video are shared.

~~~~~~~~~~ A gorgeous dividing line ~~~~~~~~~~

Bonus: a collection of front-end performance learning materials

Performance criteria ★★★★★

Analysis tools ★★★

Introduction

Advanced

Browser and HTML standards ★★

Introduction

Advanced

Development practice ★★★★

General

Animation and rendering

Mobile development

Performance monitoring ★★★★

Related conferences ★★★

Recommended blogs