IMCAFS

Home

some key points you "have to" understand about code review

Posted by deaguero at 2020-02-24
all

As we all know, code review in a team can improve code quality, share project knowledge, clarify responsibilities, and ultimately build better software and better team. If you spend a few seconds searching for information about code reviews, you'll see a lot of articles about the value of code reviews.

There are also many ways to do code review: mention pull request in GitHub, or use tools like JetBrains' upsource. But even with a clear process and the right tools, there's a big problem that needs to be solved - what problems we need to find.

There may not be a clear article on "what we need to look for", because there are many different points to consider. As with any other requirement, teams have different priorities for each aspect.

The goal of this article is to list some key points that reviewers can look for, and the priority of each aspect varies from team to team.

Before we move on, let's consider some of the issues you'll discuss in code review. Problems with the format, style, and naming of code, as well as the lack of testing, are common. If you want to have sustainable, maintainable code, these are useful checkpoints. However, it's a waste of time to discuss this in code reviews, because many of these checks can (and should) be automated.

What are the key points that can only be reviewed manually instead of relying on tools?

The answer is that there are an amazing number of points that can only be reviewed manually. In the rest of this article, we'll cover a wide range of features and delve into two specific areas: performance and security.

Design

How to keep the new code consistent with the global architecture?

Does the code follow solid principles and design specifications used by the team, such as domain driven development?

What design pattern does the new code use? Is this suitable?

Is there a combination of standards or design styles in the basic code, and does the new code follow the current specification? Does the code migrate correctly or refer to the old code that is out of specification?

Is the code in the correct location? For example, is the new code related to the order located in the order service?

Does the new code reuse the existing code? Can new code be reused by existing code? Is there a duplicate code for the new code? If so, should a more reusable pattern be restructured, or is it currently acceptable?

Is the new code over designed? Do you want to introduce reuse design that is not needed yet? How does the team balance the perspectives of reusability and YAGNI (you ain't gonna need it)?

Readability and maintainability

Whether the names of fields, variables, parameters, methods and classes really reflect what they represent.

Can I read the code to understand what it does?

Do I understand what the test case measures?

Does the test cover the use case well? Do they cover normal and abnormal use cases? Is there any neglect?

Is the error message understandable?

Is unclear code covered by documentation, comments, or easy to understand test cases? You can decide which method to use according to your own preferences.

function

Does the code really achieve the expected goal? If there are automated tests to ensure the correctness of the code, can the tested code really verify that the code meets the agreed requirements?

Does the code seem to contain obscure bugs, such as checking with wrong variables, or mistakenly writing and as or?

Have you considered?

Do you need to meet the regulatory needs?

Do authors need to create public documents or modify existing help documents?

Is user oriented information checked for correctness?

Is there an obvious error that will cause the application to stop running in the production environment? Does the code incorrectly point to the test database, and is there hard coded stub code that should be removed from the real service?

What are your performance requirements, and have you considered security? These are important areas that need to be covered and also crucial topics.

Let's take a closer look at performance, an area that really benefits from code review.

The non functional requirements of the system for performance should be placed in the same important position as all fields of architecture and design. Whether you are developing a low latency trading system that only allows nanosecond delay, or a mobile application that manages "to-do items," you should understand what users think is "too slow.".

Before considering whether we need to conduct code review on code performance, we should ask ourselves a few questions about what the specific requirements are. Although it is true that some applications do not need to consider where every millisecond is spent, and for most applications, the value of a little CPU degradation obtained by optimizing after a few hours of tossing is limited, some places can be checked by reviewers to ensure that code generation will not have some common and avoidable performance defects.

Is there a hard performance requirement for this code?

Does the code being reviewed involve a previously released service level agreement (SLA)? Or does the requirement itself have special performance requirements?

If the code causes "login page loading is too slow", then the original developer needs to find out how long the appropriate loading time is, otherwise how can the reviewer or the author ensure that the improved speed is fast enough?

If there is a hard requirement, is there a test to prove that it is met? Any performance focused system should provide automated testing of performance, which ensures that the SLA issued is as expected (e.g. all order requests are processed within 10 milliseconds). Without these, you can only rely on your users to tell you that you have not reached the corresponding SLA. It's not only a bad user experience, but it also brings avoidable fines and expenses.

Does this fix or new feature have a negative impact on any existing performance test results

If you run performance tests on a regular basis or if you have test suites that can run them on demand, you need to check that the new code has degraded system performance in performance critical areas. This can be an automated process, but since unit tests are run more often in a continuous integration environment than performance tests, it's worth noting that this can be checked in code reviews.

It is expensive to call external services or applications

Any way to use an external system over the network usually has worse performance than one that is not well optimized. Consider the following:

Call database: at worst, the problem is hidden in system abstraction, such as relational object mapping (ORM). But in code review, you should be able to find common problems that cause performance problems, such as calling databases one by one in a cycle. One situation is to load the ID list, and then query each data corresponding to the id one by one in the database.  

Unnecessary network call: like database, remote service is sometimes overused. In the past, only one remote call can be implemented, or batch or cache can be used to prevent expensive network calls, but multiple remote calls are used to implement. Again, like databases, sometimes abstract classes hide methods that call remote APIs.  

Mobile or wearable applications call back-end programs too frequently: This is basically the same as "unnecessary network call", but there will be other problems on mobile devices, which will not only cause unnecessary call back-end performance deterioration, but also faster power consumption and even lead to user's money expenditure.

Use resources effectively and efficiently

Does the code use locks to control access to shared resources? Does this cause performance degradation or deadlock?

Lock is a large performance overhead, and it is difficult to clarify in multi-threaded environment. Consider using the following pattern: single thread write or modify values, other threads read-only, or use an unlocked algorithm.

Is there a memory leak? Some common reasons in Java are: variable static fields, use of ThreadLocal variables, and use of class loaders.  

Is there infinite memory growth? This is different from memory leak, which means that useless objects cannot be recycled by the garbage collector. But for any language, even if there is no garbage collection language, it can create infinite data structure. As a reviewer, if you see new variables being added to the list or map, you need to ask when the list or map will fail or clear the useless data.

Does the code close the connection or data flow? It's easy to forget to close the connection or file or network data flow. When you review other people's code, if you use a connection to a file, network, or database, make sure they are closed properly.

Is the resource pool configured correctly? The best configuration for an environment depends on many factors, so as a reviewer, it's hard to know immediately whether the database connection pool size is correct or not. But there are some things you can see at a glance, such as whether the resource pool is too small (for example, the size is set to 1) or too large (for example, millions of threads). If you are not sure, start with the default value. Those who do not use default values need to provide some tests or calculations to justify such a configuration.

Warning signs that reviewers can easily find

Some code can see potential performance problems at a glance. This depends on the language and class library used.

Reflection: Java's reflection is slower than normal calls. If you're reviewing code that contains reflections, ask if you have to use it.

Timeout: when you review the code, you may not know the appropriate timeout for an operation, but you need to think about "what will happen to the rest of the system if you timeout?". As a reviewer you should consider the worst case scenario: when a 5 minute delay occurs, will the application block?

What's the worst case scenario if the timeout is set to one second? If the code author can't determine the timeout length, and you as a reviewer don't know the quality of a selected time, then it's time to find someone who understands the impact of this to participate in the code review.

Parallel: does the code use multithreading to run a simple operation? Does this take more time and complexity without improving performance? If modern Java is used, the potential problems are less easy to find than those in the display creation thread: does the code use Java 8's new parallel flow computing but not benefit from it? For example, using parallel flow computation on a small number of elements, or just running very simple operations, may be slower than on sequential flows.

Correctness

These do not necessarily affect the performance of the system, but they are closely related to the operation of multi-threaded environment, so they are related to this topic:

Whether the code uses the correct data structure suitable for multithreading.

Does the code have race conditions? In multithreading environment, code is very easy to cause non obvious race condition. As a reviewer, you can view get and set that are not atomic operations.

Is the code using the lock correctly? Related to race conditions, as a reviewer, you should check whether the audited code allows multiple threads to modify variables that cause the program to crash. Code may need to synchronize, lock, and atomic variables to control code blocks.

Is performance testing of code valuable? It's easy to write small performance test code poorly, or use test data that doesn't represent production environment data, and only get the wrong results.

Caching: Although caching is a way to prevent excessive consumption of requests, there are challenges in itself. If the reviewed code uses caching, you should pay attention to some common problems, such as incorrect cache invalidation.

Code level optimization

For most organizations that do not want to build low latency applications, code level optimization is often premature, so first of all, we need to know whether code level optimization is necessary.

Does the code use synchronization or lock operations where they are not needed? Locks are often unnecessary if the code is always running in a single thread.

Can code use atomic variables instead of locks or synchronization?

Does the code use unnecessarily thread safe data structures? For example, can you use ArrayList instead of vector?

Does the code use low-performance data structures in common operations? For example, use a linked list where you often need to find a specific element.

Can code use lazy loading and get performance improvement from it?

Can conditional statements or other logic short-circuit other statements by putting the most efficient evaluation statements first?

Is there a lot of string formatting in the code? Is there a way to make it more efficient?

Does the log statement use string formatting? Do you use conditional statements to verify the log level, or delay evaluation?

Simple code is efficient code

There are some simple things in Java code for reviewers to look for, which will enable the JVM to optimize your code for you:

Short methods and classes.

Simple logic, that is, eliminating nested conditions or loop statements.

security

The effort you spend on building a secure, secure system, as you spend on other features, depends on the project itself, where it runs, the users it uses, the data it needs to access, and so on. Let's focus now on some of the things you might be looking at when reviewing code.

Use automation whenever possible

A staggering number of security checks can be automated without human intervention. Security testing does not need to start all systems for complete penetration testing. Some problems can be found at the code level.

Common problems such as SQL injection or cross site scripting can be detected in the continuous integration environment through the corresponding tools. You can also check the known vulnerabilities in your dependency automatically through OWASP dependency detection tool.

Sometimes it needs to be "case by case"

You can simply answer "yes" or "no" to some verifications. Sometimes you need a tool to point out potential problems, and then it's up to the human to decide whether or not to solve them. This is the real flash point of upsource. Upsource displays the results of code checks that reviewers can use to determine whether the code needs to be changed or to accept the current situation.

Understand the dependencies you use

The third-party class library is a way to erode system security and cause loopholes. When reviewing code, at least you need to check whether new dependencies (such as third-party libraries) have been introduced. If you don't have an automated vulnerability check, you should check for known problems in the newly introduced class library.

You should also try to minimize the version of each class library. Of course, if other dependencies have an additional indirect dependency, this may not be achieved. But the simplest way to minimize the exposure of one's own code to other's code (through class libraries or services) is to:

Use source code as much as possible and understand its credibility.

Use the highest quality library you can get.  

Track where you use what, and when new vulnerabilities arise, you can see how affected you are.

Check if new paths and services require authentication

Whether you develop a web application, provide a web service, or some other API that requires authentication, when you add a new URI or service, you should ensure that it cannot be accessed without authentication (assuming authentication is a requirement of your system). You just need to simply check that the developer of the code has written the appropriate test case to show that it has been certified.

You should not only consider authentication for human users who use usernames and passwords. Authentication is also required for other systems or automated processes to access your application or service. This may affect the definition of "user" in your system.

Whether data needs to be encrypted

When you save some data to disk or cable, you need to know whether the data should be encrypted. Obviously passwords should never be simple text, but there are many other situations where data needs to be encrypted. If the code being reviewed is wired to transmit data or stored somewhere or otherwise left your system, and you don't know if it should be encrypted, try asking someone in your organization who can answer that question.

Is the password well controlled?

The password here includes password (such as user password, database password or password of other systems), secret key, token, etc. These should never be stored in code or configuration files that will be submitted to the source control system. There are other ways to manage these passwords, such as through the secret server. When reviewing code, make sure that the passwords don't sneak into your version control system.

Should code run be logged or monitored? Is it used correctly?

Logging and monitoring requirements vary from project to project, some need to be compliant, some have stricter behavior and event logging specifications than others. If you have rules about what to log, when and how to log, then as a code reviewer you should check whether the submitted code meets the requirements. If you don't have set rules, consider:

Does the code change the data (e.g. add, delete, modify)? Should records be kept of who changed what?

Does the code cover critical performance parts? Should start and end times be recorded in the performance monitoring system?

Is the log level of each log appropriate? A good rule of thumb is that "error" triggers a prompt to be sent somewhere, and if you don't want these messages to wake up at 3 a.m., then downgrade them to "info" or "debug". When multiple outputs may be generated in a loop or one piece of data, they generally do not need to be recorded in the production log file, and they should be placed at the "debug" level.

Remember to call an expert

Security is a big topic, big enough for your company to hire technical security experts. If we have security experts, we can get their help, such as inviting them to participate in code review, or inviting them to join us in code review.

If this cannot be achieved, we can fully learn our system environment to understand what kind of security requirements we have (there are different standards for internal enterprise applications and customer-oriented Web Applications), so we can better understand what we should see in code review.

summary

Code review is a good way to not only ensure code quality and consistency, but also share project knowledge within or among teams. Even if you have automated basic validation, there are many different aspects of code and design to consider.

Code review tools, such as upsource, can help you locate potential problems by highlighting suspicious code in each code submission check and analyzing which problems have been fixed and which problems have been introduced. The tool can also simplify the process because it provides a platform to discuss design and code implementation, or invite reviewers, authors, and other stakeholders to discuss until a consensus is reached.

Finally, teams need to take the time to decide which factors of code quality are important to them, and experts need to manually decide which rules should be applied to each code review. Everyone involved in the review should have and use interpersonal skills, such as positive feedback, negotiation and compromise to reach the final consensus, that is, how the code should be "good enough" to communicate Over censorship.

This translation has been authorized. Link to the original:

https://www.infoq.com/articles/effective-code-reviews

Translator: Zhou Yuanhao

Today's recommendation number: talk about the structure

It mainly focuses on the system architecture of traditional enterprises and Internet enterprises, and studies and discusses the difficulties and pain points in the architecture practice. Organize online group sharing activities regularly.

Today's recommendation

Click the picture below to read

How to build a sustainable development model of programmer capability growth?

Like our will point like, love our will share!