A talk about container networking

Posted by santillano at 2020-03-17

I believe that anyone who really works with containers, or operates and maintains a container environment, will run into one topic when taking containers to production: the container network. So I will share this topic with you today, and you are welcome to share different opinions.

Ever since containers were born, storage and networking have been the two topics everyone talks about. We discuss networking today because a container's demands on the network are very different from those of traditional physical and virtual environments. The virtualization world did a lot of work, such as virtual routers and virtual switches, but these turn out not to adapt well to container environments. The main reason: in the virtualization era there might be a few hundred virtual machines, but in the container era we talk about microservices. An application is a huge ecosystem of many services, and each service has many container instances. Together they form a system whose granularity is much finer than that of traditional virtual machines, and which is far more dispersed and dynamic, so it places higher requirements on the network.

Container technology has come a long way, yet container networking has always lagged behind. I wonder whether you have the same feeling? On the storage side, Docker released the volume plugin a long time ago. At the network level, however, there was no standard until recently, when a standard called CNM emerged. This lag left room for a large number of startups and open source organizations. They developed a variety of network implementations, and instead of one dominant solution there are many complex ones, leaving customers unsure which to choose.

Some customers think SDN is very good, so the concept of SDN was introduced into the container field, and as a result the problem became even more complex. We have seen a large number of customers scratching their heads over container network selection. Not only customers: solution providers also think every day about what kind of container network solution they can offer to meet customers' needs. So today we will focus on the network and sort it out together.

The development of container networking has gone through three stages. The first stage is the "Stone Age". The earliest container network model is a network inside the host. To expose a service you need port mapping, which is very primitive. For example, suppose a host runs many Apache containers, and each Apache needs to expose port 80. What should I do? I have to map the first container to port 80 of the host, the second to port 81 of the host, and so on. In the end it becomes confusing and unmanageable. This Stone Age network model can hardly be adopted by enterprises.
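The port mapping described above can be sketched in a few lines. This is only a toy model, not how Docker itself allocates ports; the container names and the `map_ports` helper are hypothetical and exist just to show why the scheme becomes hard to manage as containers multiply.

```python
# Toy model of Stone Age port mapping: every container listens on port 80,
# but each host port can be handed out only once.

def map_ports(containers, container_port=80, first_host_port=80):
    """Give each container the next free host port, starting at first_host_port."""
    mapping = {}
    next_port = first_host_port
    for name in containers:
        mapping[name] = (next_port, container_port)
        next_port += 1
    return mapping

apaches = ["apache-1", "apache-2", "apache-3"]  # hypothetical container names
table = map_ports(apaches)
for name, (host_port, container_port) in table.items():
    print(f"{name}: host port {host_port} -> container port {container_port}")
```

Every client now has to know which host port belongs to which Apache, and that bookkeeping is the part that does not scale.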

Later it evolved to the next stage, which we call the era of many contenders, when solutions sprang up everywhere. Examples include Rancher's network implementation based on IPSec, Flannel's network implementation based on layer 3 routing, and some open source projects from our own country, all of which will be discussed in detail later.

Today, container networking has settled into a standoff between two giants. One is the CNM architecture led and developed by Docker; the other is the CNI architecture led and developed by Google, Kubernetes and CoreOS. Each of the two camps holds its own hilltop, and the rest of the contenders choose a side. I will talk about these two in detail later.

Before I start on the following topics, I will explain several technical terms:

IPAM: IP address management. IP address management is not unique to containers; in traditional networks, DHCP is also a form of IPAM. In the container era there are two main IPAM methods: allocating IP address segments based on CIDR, or allocating an exact IP to each container. Either way, once a cluster of container hosts is formed, every container on it must be assigned a globally unique IP address, and that is where IPAM comes in.

Overlay: an independent network built on top of an existing layer 2 or layer 3 network. It usually has its own independent IP address space and its own switching or routing implementation.

IPSec: a point-to-point encrypted communication protocol, generally used for the data channel of an overlay network.

VXLAN: a solution jointly proposed by VMware, Cisco, Red Hat and others, mainly to solve the problem that VLAN supports too few virtual networks (4096). Because every tenant on a public cloud has its own VPC, 4096 is obviously not enough. VXLAN can support about 16 million virtual networks, which is basically enough for the public cloud.

Bridge: a network device that connects two peer networks. In today's context it refers to the Linux bridge, namely the famous docker0 bridge.

BGP: the routing protocol between autonomous systems on the backbone. The Internet is composed of many smaller autonomous networks, and layer 3 routing between these autonomous networks is implemented by BGP.

SDN, OpenFlow: terms from software-defined networking, along with others we often hear such as flow table, control plane and forwarding plane.
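The 4096 and 16 million figures in the VXLAN entry above follow from field widths: a VLAN ID is 12 bits and a VXLAN Network Identifier (VNI) is 24 bits. As a rough sketch, the snippet below does that arithmetic and packs a VNI into the 8-byte VXLAN header layout (flags, reserved bits, VNI, reserved bits). The helper names `vxlan_header` and `parse_vni` are made up for illustration.

```python
import struct

VLAN_ID_BITS = 12    # width of the VLAN ID field
VXLAN_VNI_BITS = 24  # width of the VXLAN Network Identifier field

print(2 ** VLAN_ID_BITS)    # 4096
print(2 ** VXLAN_VNI_BITS)  # 16777216

def vxlan_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header: 8 flag bits, 24 reserved, 24-bit VNI, 8 reserved."""
    if not 0 <= vni < 2 ** VXLAN_VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # the I bit: marks the VNI field as valid
    return struct.pack("!II", flags << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Recover the VNI from a header built by vxlan_header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8

header = vxlan_header(5000)
print(header.hex(), parse_vni(header))
```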

This is the Stone Age network model; let me go through it briefly. It is the container network before Docker 1.9. IPAM is managed only per single host. All containers on a host are connected to a Linux bridge inside that host, called docker0. By default, each container on the host is assigned an IP in the 172.17 network segment. Because of docker0, the containers on one host can reach each other. But because IP allocation is scoped to a single host, the same IP address will appear on other hosts, and obviously two such addresses cannot communicate directly. To solve this problem, the Stone Age used port mapping, which is really NAT. For example, I have an application consisting of web and MySQL on different hosts, and the web needs to access MySQL. We map port 3306 of MySQL to port 3306 of its host, and the web actually accesses port 3306 on the host IP. That was the Stone Age practice.
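To make the duplicate-address problem concrete, here is a minimal sketch, assuming each host runs its own independent allocator over the same 172.17 range. `SingleHostIPAM`, `dnat` and the host address 192.168.1.20 are hypothetical names for this illustration, not Docker internals.

```python
import ipaddress

class SingleHostIPAM:
    """Per-host allocator: every host draws from the same default subnet."""

    def __init__(self, subnet="172.17.0.0/16"):
        self._free = iter(ipaddress.ip_network(subnet).hosts())
        self.bridge_ip = next(self._free)  # first address goes to docker0

    def allocate(self):
        return next(self._free)

host_a, host_b = SingleHostIPAM(), SingleHostIPAM()
web_ip = host_a.allocate()    # web container on host A
mysql_ip = host_b.allocate()  # MySQL container on host B
print(web_ip == mysql_ip)     # True: both hosts hand out the same address

# Port mapping (NAT) on host B; the host address is hypothetical.
HOST_B_IP = "192.168.1.20"
nat_table = {(HOST_B_IP, 3306): (str(mysql_ip), 3306)}

def dnat(dst_ip, dst_port):
    """Rewrite a (host IP, host port) destination to the container behind it."""
    return nat_table.get((dst_ip, dst_port))

print(dnat(HOST_B_IP, 3306))
```

The web container never addresses MySQL by its container IP; it only ever sees the host IP and the mapped port.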

To summarize its typical technical features: IPAM is per single host; containers within a host communicate through the docker0 Linux bridge; a service that needs to be exposed externally must use NAT, which leads to serious port contention; and it does have one advantage, which is that it consumes few IP addresses from the wider network. This is the Stone Age.

Rancher network

Now, in the era of many contenders, the first one I want to talk about is Rancher. Against the backdrop of the Stone Age, Rancher's network solution was very eye-catching. It has to solve two major problems: first, assigning globally unique IP addresses, and second, enabling containers to communicate across hosts. For the first, it has a centralized database; through database coordination, each container in the resource pool is assigned an independent IP address. For the second, an agent container is placed on each host, and all containers on that host connect to it. The agent is really a forwarder, responsible for encapsulating data packets and routing them to the designated host. For example, when one container accesses another, it first hands the packet to the agent on its own machine. From its internal metadata, the agent knows that the destination container is on another host, so it wraps the packet as an IPSec packet and sends it to the peer host over IPSec. When the peer host receives the IPSec packet, it unpacks it and delivers it to the corresponding container on the local host. This implementation is very clean and simple, but it has one big problem: IPSec communication is heavy and inefficient. According to Rancher, the problem is not that severe. Intel CPUs provide AES-NI instructions for hardware-accelerated AES, and the IPSec implementation in the Linux kernel can use them to speed up IPSec. On that basis, IPSec is said to be comparable to VXLAN in performance.

Features of the Rancher network: it uses a global IPAM to ensure that container IP addresses are globally unique; cross-host communication uses IPSec; host port contention is not too serious, because communication between applications does not occupy host ports, but a service that must finally be exposed externally still has to be mapped to the host. This is Rancher: very simple and clean, just like Rancher itself.
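The two mechanisms described for Rancher, a central allocator plus a per-host forwarding agent, can be sketched as follows. This is a simplified model under stated assumptions, not Rancher's code: `GlobalIPAM`, `Agent`, the 10.42.0.0/16 pool and the host names are all hypothetical, and the "IPSec" step is only a string, not real encryption.

```python
import ipaddress

class GlobalIPAM:
    """Stand-in for the centralized database: one address pool for the whole cluster."""

    def __init__(self, cidr):
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.location = {}  # container IP -> host name (the agent's metadata)

    def allocate(self, host):
        ip = next(self._free)
        self.location[ip] = host
        return ip

class Agent:
    """Per-host forwarder: delivers locally or tunnels to the peer host."""

    def __init__(self, host, ipam):
        self.host = host
        self.ipam = ipam

    def forward(self, dst_ip):
        target = self.ipam.location[dst_ip]
        if target == self.host:
            return f"deliver locally to {dst_ip}"
        return f"wrap in IPSec and send to {target}"

ipam = GlobalIPAM("10.42.0.0/16")   # hypothetical cluster-wide pool
ip_a = ipam.allocate("host-1")
ip_b = ipam.allocate("host-2")
agent_1 = Agent("host-1", ipam)
print(ip_a, ip_b)                   # two different addresses, unlike per-host IPAM
print(agent_1.forward(ip_b))
```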


Another network implementation is Flannel, which is led by CoreOS and mainly used with Kubernetes. Flannel also has to solve two problems: IP address allocation and cross-host communication. For address allocation it uses a CIDR-based method, which is not very smart in my opinion: each host is assigned an address segment, for example a segment with a 24-bit mask, which means the host can support 254 containers. Every host gets its own subnet segment. Once the segment is handed to the Docker daemon, the daemon can assign IPs to containers. The second problem is how to forward packets across hosts. Flannel uses layer 3 routing: as in the traditional setup, all containers are connected to docker0, and a flannel0 virtual device is inserted between docker0 and the host network card. This virtual device gives Flannel great flexibility, because different encapsulation and tunnel protocols can be implemented on it. With VXLAN, for example, the flannel0 device encapsulates data packets as VXLAN UDP packets. In other words, flannel0 acts as a protocol adapter, which is a distinctive feature of Flannel and one of its advantages.

Let me summarize again. Flannel assigns every host an address segment, that is, a CIDR. Between hosts there can be several encapsulation modes, including UDP, VXLAN and host-gw. Container IPs can reach each other directly. However, if a container wants to expose a service externally, it still has to be mapped to the host side. In addition, Flannel's CIDR-based design is clumsy and wastes a lot of IP addresses.
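The per-host CIDR allocation, and the waste it causes, can be seen with the standard `ipaddress` module. This is a sketch of the idea only: the 10.1.0.0/16 cluster range, the host names and the figure of 10 running containers are assumptions chosen for the example.

```python
import ipaddress

cluster = ipaddress.ip_network("10.1.0.0/16")   # hypothetical cluster range
host_subnets = cluster.subnets(new_prefix=24)   # one /24 per host

leases = {host: next(host_subnets) for host in ["host-1", "host-2", "host-3"]}
for host, subnet in leases.items():
    print(host, subnet)

usable = leases["host-1"].num_addresses - 2     # minus network and broadcast
print(usable)                                   # 254 containers per host at most

def owner_of(ip):
    """Find which host's segment contains a container IP (the routing decision)."""
    addr = ipaddress.ip_address(ip)
    for host, subnet in leases.items():
        if addr in subnet:
            return host
    return None

print(owner_of("10.1.1.7"))

running = 10                                    # containers actually on host-1
print(usable - running)                         # addresses reserved but idle
```

A host that runs only a handful of containers still holds its whole segment, which is the waste the summary above refers to.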


The next one is Calico, a relatively young project with great ambition: it can be used in virtual machine, physical machine and container environments. Calico uses BGP, a protocol that many people may not have heard of. It is based entirely on layer 3 routing and has no layer 2 concept, so in Calico you see a large number of routes built with Linux routing, and changes to the routing tables are managed by Calico's own components. The advantage of this approach is that a container's IP can be reached directly from outside and can be allocated directly from the service IP range. If the network equipment supports BGP, it can be used to build a large-scale container network. At the same time, this implementation uses neither tunnels nor NAT, so it avoids their overhead and performs well. So from a technical point of view I think Calico is outstanding. But BGP brings disadvantages along with its advantages: BGP is rarely accepted inside enterprises, and enterprise network administrators are unwilling to enable BGP on routers between networks, so its scale advantage cannot be realized. This is the Calico project.
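A routed container network boils down to ordinary route lookups with no encapsulation. The sketch below models only the forwarding decision on one host with longest-prefix matching; it does not model BGP route distribution, and the interface name, container IPs and next-hop addresses are hypothetical.

```python
import ipaddress

routes = {}  # routing table on one host: destination prefix -> next hop

def add_route(prefix, next_hop):
    routes[ipaddress.ip_network(prefix)] = next_hop

def lookup(dst):
    """Longest-prefix match, as a plain layer 3 router would do."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

add_route("10.2.0.5/32", "veth-local-a")   # container on this host
add_route("10.2.1.9/32", "192.168.1.11")   # container on a peer host
add_route("0.0.0.0/0", "192.168.1.1")      # default gateway

print(lookup("10.2.1.9"))   # forwarded straight to the peer host, no tunnel
print(lookup("8.8.8.8"))    # everything else follows the default route
```

Each container costs one route entry here, which is what makes the approach clean and also what puts pressure on routing tables at scale.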

The fourth one is truth cloud. Its founder is Dr. Mao Wenbo. Dr. Mao and I used to be colleagues at EMC, where he mainly focused on virtualization security. Truth cloud's container network is technically very advanced. His thinking was: if we are designing a new network now, why not combine SDN with the Docker network? In its architecture, the top layer is the SDN control plane and the bottom is a set of OpenFlow switches. Truth cloud's SDN concept is indeed advanced; the core problem is that it is hard for enterprises to accept. SDN itself has not yet become widespread in enterprises, so promoting an SDN container network is harder still. In other words, there are limits when it comes to real-world deployment. We think this kind of network represents the future. Maybe container networks will look like this one day when they reach a more mature stage, but for now it is still a bit too highbrow.

In summary, container network technology comes from two technical schools. The first is tunnel technology, such as Rancher's container network and Flannel's VXLAN mode. Its characteristic is that it does not ask much of the underlying network. Generally the only requirement is layer 3 reachability: as long as your hosts sit in a layer 3 reachable network, a tunnel-based container network can be built on top. But what is the problem? Once the overlay network is built, the network monitoring the enterprise has already put in place, and the management role of its network department, lose much of their value, because traditional network equipment cannot see what data is running inside the tunnel and so cannot monitor or manage it. At the same time, the core implementation points of an overlay network all live inside the hosts. The network department does not manage hosts; it is in charge of the underlying network. Now it has to manage some of the virtual devices inside the hosts, while hosts are traditionally managed by the systems department. The result is overlapping management, with no clear separation of rights and responsibilities between the network department and the systems department, and this makes many customers reluctant to use tunnel technology.

The second school is routing technology. Its advantages are that it is clean, needs no NAT, and is efficient. It integrates with the existing network, and each container can be given a service IP just like a virtual machine. You can use containers in the most natural and acceptable way, as if you had just provisioned a new virtual machine. However, a routed network has two problems. First, it has a great impact on existing network equipment. Network engineers know that a router's routing table has a size limit, around 230000 entries. If tens of thousands of new container IPs hit the routing table at once, the underlying physical devices cannot bear it. Second, each container is assigned a service IP, so your service IPs will be used up quickly. Large enterprises generally have rules for IP allocation: the container platform project may be given a few thousand addresses or a single segment, so there is no way to allocate one IP to every container. Those are routing and tunneling technology. We do not see a perfect technology; each has its advantages and disadvantages.
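Both limits mentioned above are simple arithmetic, sketched here under made-up numbers. The 230000 figure is the one quoted in the text; the existing route count, container counts and the /22 segment are assumptions for illustration, and `routes_fit` and `ips_fit` are hypothetical helpers.

```python
import ipaddress

ROUTE_TABLE_LIMIT = 230_000  # the router limit quoted in the text

def routes_fit(existing_routes, container_ips):
    """With one route per container IP, does the table still fit?"""
    return existing_routes + container_ips <= ROUTE_TABLE_LIMIT

def ips_fit(segment, container_ips):
    """Does the segment granted to the project cover every container?"""
    usable = ipaddress.ip_network(segment).num_addresses - 2
    return container_ips <= usable

print(routes_fit(200_000, 50_000))      # False: 250000 entries exceed the limit
print(ips_fit("10.20.0.0/22", 5_000))   # False: a /22 has only 1022 usable IPs
```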

Now let's see what customers say. An internet bank in South China strongly resists overlay networks. They said their network technology department is not capable of maintaining an overlay network: if something goes wrong in the traditional network, they know how to fix it, but if something goes wrong in the overlay network, they do not, and things get out of control. A national joint-stock bank in the north is quite averse to tunnel technology. They have already deployed SDN and do not want to dig tunnels through it, because once a tunnel is built the operations department goes blind, and things that could be managed before no longer can be. A financial institution in East China is unwilling to accept IPSec-based tunneling; they say IPSec performance is weaker. So we can see that most current customers prefer the traditional routed network.

The third stage of container network development is the standoff between two giants: Docker's CNM, and CNI, led by Google, CoreOS and Kubernetes. First, to be clear, CNM and CNI are not network implementations. They are network specifications and network frameworks; from an R&D perspective they are just a set of interfaces. Whether the bottom layer is Flannel or Calico, they do not care. What CNM and CNI care about is network management.

CNM is the network model that ships with Docker and can be managed directly with docker commands. CNI is not Docker native; it is a general network interface designed for container technology. Calling the CNI interface from an upper layer is no problem, but having the lower layer support it is unlikely, or would be very tricky to implement, so CNI is hard to activate at the Docker level. Both models are plug-in based: a specific network implementation is plugged in as a plug-in. Of the two, CNM is more paternalistic and less flexible, while CNI, being general purpose, is relatively more flexible. These are the basic characteristics of the two specifications.
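The point that these specifications are "just a set of interfaces" with pluggable implementations can be illustrated with a small sketch. To be clear, the `NetworkPlugin` interface, its `attach`/`detach` methods and the two plug-in classes below are invented for this example; they are not the actual CNM or CNI API.

```python
from abc import ABC, abstractmethod

class NetworkPlugin(ABC):
    """Hypothetical plug-in interface; NOT the real CNM or CNI API."""

    @abstractmethod
    def attach(self, container_id: str) -> str:
        """Connect the container and describe how it was connected."""

    @abstractmethod
    def detach(self, container_id: str) -> None:
        """Disconnect the container."""

class TunnelPlugin(NetworkPlugin):
    def __init__(self):
        self.attached = set()

    def attach(self, container_id):
        self.attached.add(container_id)
        return f"{container_id}: overlay tunnel"

    def detach(self, container_id):
        self.attached.discard(container_id)

class RoutedPlugin(NetworkPlugin):
    def __init__(self):
        self.attached = set()

    def attach(self, container_id):
        self.attached.add(container_id)
        return f"{container_id}: layer 3 route"

    def detach(self, container_id):
        self.attached.discard(container_id)

class Runtime:
    """The upper layer only sees the interface, never the implementation."""

    def __init__(self, plugin: NetworkPlugin):
        self.plugin = plugin

    def start(self, container_id):
        return self.plugin.attach(container_id)

    def stop(self, container_id):
        self.plugin.detach(container_id)

for plugin in (TunnelPlugin(), RoutedPlugin()):
    print(Runtime(plugin).start("web-1"))
```

Swapping the plug-in changes how the container is connected without touching the runtime code, which is the property both specifications aim for.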

Once these two standards were established, everyone had to face a question: which standard should I support? At present the line-up is as follows. Docker Swarm is naturally on the CNM side, since it belongs to the Docker company. Content cloud currently supports CNM. This choice was not made for technical reasons, but because our platform currently supports only Docker. We do not support rocket yet; if one day we support container technologies other than Docker, we may support CNI as well. Kubernetes, of course, supports CNI. Others, such as Calico, Weave and Mesos, support both sides.

Having covered the specific technologies, I'll take a few minutes to talk about how yourong cloud supports networking. Because yourong cloud grew out of Rancher, we inherited all of Rancher's advantages, including its IPSec-based network. But from our point of view, the networks we support need to fit our customers' needs as closely as possible. If a customer wants a simple IPSec overlay network, we have it. Some customers want a VXLAN network, so we also support the libnetwork overlay, which is VXLAN in essence. And if a customer wants to assign service IPs to containers as with virtual machines, we also have a network implementation based on macvlan.

A brief introduction to macvlan. The core idea of macvlan is to virtualize one physical network card into multiple virtual network cards, each with an independent MAC address. From the outside, it looks as if the network cable had been split into two strands connected to different hosts. With macvlan, I can give each container one of these MAC addresses. That is, each container will have an independent MAC address and a service network IP, and can work like an independent virtual machine.

Another one worth introducing is ipvlan L2 mode. Its observable behavior is basically the same as macvlan's, except that containers do not have independent MAC addresses: all containers on a single host share the same MAC address. That is the only difference between ipvlan and macvlan, and we think this approach is also promising. In the end both methods achieve the same effect: a container can be assigned a service network IP like a virtual machine, and can be reached directly from outside through that IP.
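The one difference between the two modes, who owns the MAC address, can be shown with a toy model. This does not configure any real interfaces; the MAC values, container names and the `macvlan` and `ipvlan_l2` helpers are hypothetical.

```python
PARENT_MAC = "02:00:00:00:00:01"  # hypothetical MAC of the physical network card

def macvlan(containers):
    """macvlan: every virtual network card gets its own MAC address."""
    # A first octet of 0x02 marks a locally administered address.
    # The two-digit suffix limits this toy model to 255 containers.
    return {name: f"02:00:00:00:01:{i:02x}"
            for i, name in enumerate(containers, start=1)}

def ipvlan_l2(containers):
    """ipvlan L2 mode: every container shares the parent card's MAC address."""
    return {name: PARENT_MAC for name in containers}

names = ["web", "api", "db"]
mac_view = macvlan(names)
ip_view = ipvlan_l2(names)

print(len(set(mac_view.values())))  # 3 distinct MACs seen by the switch
print(len(set(ip_view.values())))   # 1 MAC seen by the switch
```

In both cases each container still gets its own service network IP; only the layer 2 identity differs.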

This article is reprinted from the official account: inclusive cloud.