In recent years, the Internet industry has developed rapidly, with one trend after another spilling over into surrounding industries. Shared bicycles have solved the last-mile travel problem; the new retail concept and unmanned vending cabinets have connected online and offline data and improved the shopping experience; short video applications, led by Douyin (the Chinese version of TikTok), have become popular almost overnight and are now an important channel for social interaction and publicity.
As the infrastructure supporting these businesses, data centers have expanded along with the surge in business, and it is no longer uncommon for a single cluster to have tens of thousands of servers. With so many servers, the initial deployment of the switch network has become a nightmare for network engineers.
Focusing on the data center scenario, let's first review the traditional switch network deployment process.
▲Figure 1 Deployment process of traditional network devices
Step 1: Prepare the switch version file in advance;
Step 2: Select the corresponding configuration template according to the deployment architecture scenario;
Step 3: Start the manual labor ("moving bricks"): arrive at the site, unpack the equipment, stack the devices, and power them on;
Step 4: Plug a USB flash drive into the switch, connect the serial cable to the device, and upgrade the version on each device one by one (if necessary);
Step 5: Adjust the VLAN, IP, routing, and other basic information according to the configuration template and the actual device, and then load the configuration onto the device (depending on each company's requirements, the device SN is also collected for asset management);
Step 6: Coordinate with the low-voltage cabling team to move the devices into the equipment room and mount them in the racks;
Step 7: Complete the connectivity test based on the Checklist.
Without stacking, upgrading and configuring one device takes about 10 minutes. With stacking, it takes about 30 minutes because the device needs to be restarted.
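To get a feel for how these per-device times add up, here is a minimal arithmetic sketch. The device counts are hypothetical and chosen only for illustration; the 10-minute and 30-minute figures are the ones quoted above, and the model assumes one engineer working through the devices serially.

```python
def manual_deploy_minutes(standalone: int, stacked: int,
                          standalone_min: int = 10, stacked_min: int = 30) -> int:
    """Total hands-on minutes for one engineer handling devices one by one."""
    return standalone * standalone_min + stacked * stacked_min


# Hypothetical cluster: 400 standalone switches and 50 stacked groups.
total = manual_deploy_minutes(400, 50)
print(f"{total} minutes, about {total / 60:.1f} hours")
# prints "5500 minutes, about 91.7 hours"
```

This counts only upgrade and configuration time; unpacking, racking, and troubleshooting would come on top.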
In summary, for data center networks that support clusters of tens of thousands of servers, the traditional deployment mode has the following shortcomings:
Low efficiency: Manually upgrading device versions and loading configuration scripts is slow.
High error rate: Repetitive work wears down engineers' attention, and mistakes slip in easily. Extra time is then needed to troubleshoot and correct the errors.
To address this, manufacturers have explored a number of approaches, such as zero-configuration onboarding technology.
As the scale of data centers continues to grow, standardized construction methods have become mainstream. The number of servers and network devices in each cluster is fixed, and the configuration policies of the network devices are consistent (although basic information such as IP addresses differs slightly). In such a standardized scenario, we can encode these repetitive and mechanical tasks as processing logic and let the network devices complete the onboarding work themselves, freeing up manpower and reducing the risk of human error.
Manufacturers develop automated onboarding technologies that use programming languages (such as Python and Go) to implement this logic, allowing network devices to load their files and go online by themselves. One example is ZAM (Zero-configure Automatic Manage) technology.
The idea behind automatic switch onboarding: after a new or unconfigured device is powered on, it automatically obtains and loads files (version files, configuration files, patch files, etc.) from the file server. This achieves on-site configuration and deployment without operation and maintenance personnel repeatedly performing mechanical tasks such as loading scripts and upgrading, and it improves onboarding efficiency.
Taking ZAM technology as an example, the operation and maintenance side must first build a unified DHCP server and TFTP server (they can be the same machine) and generate a complete configuration file and version file for each device in advance, based on the standardized architecture design and project information. The names of these files must be associated with the device's SN so that each configuration file corresponds to exactly one physical device.
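The SN-to-file association can be sketched as follows. This is a minimal illustration, not the vendor's tooling: the template text, the field names (`hostname`, `mgmt_ip`, `mgmt_mask`), and the `render_configs` helper are all hypothetical, and the template lines are placeholders rather than real CLI syntax. Only the `[SN].CFG` naming comes from the process described in this article.

```python
from pathlib import Path
from string import Template

# Hypothetical template with placeholder lines (not real vendor CLI syntax).
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "mgmt-address $mgmt_ip $mgmt_mask\n"
)


def config_filename(sn: str) -> str:
    """Name the file after the serial number so it maps to one device."""
    return f"{sn}.CFG"


def render_configs(devices: list[dict], out_dir: Path) -> list[Path]:
    """Write one configuration file per device, rejecting duplicate SNs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen: set[str] = set()
    written: list[Path] = []
    for dev in devices:
        sn = dev["sn"]
        if sn in seen:
            raise ValueError(f"duplicate SN: {sn}")
        seen.add(sn)
        path = out_dir / config_filename(sn)
        path.write_text(CONFIG_TEMPLATE.substitute(dev))
        written.append(path)
    return written
```

Rejecting duplicate SNs at generation time is one simple way to keep the one-file-per-device guarantee before anything reaches the TFTP server.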
▲ Figure 2 Zero configuration online solution architecture
After the switch is powered on, it sends a DHCP request and obtains the TFTP server address and the script name through DHCP options. It then downloads the script file over TFTP, parses it, and executes it. The script automatically upgrades the version and loads the configuration file, and after a restart the device is in service. The main process is as follows:
▲ Figure 3 Zero configuration online process
The details can be broken down into several stages to understand:
DHCP&TFTP acquisition phase
▲ Figure 4 Typical DHCP configuration
● The switch device (which needs to support the Python runtime environment) downloads the Python script.
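How the device might read the TFTP server and script name out of a DHCP reply can be sketched as below. The article does not say which options ZAM uses; this sketch assumes the standard options 66 (TFTP server name) and 67 (bootfile name) from RFC 2132, and the function names are illustrative. The input is the raw options field that follows the DHCP magic cookie.

```python
TFTP_SERVER_NAME = 66   # RFC 2132 option; assumed here, not confirmed for ZAM
BOOTFILE_NAME = 67      # RFC 2132 option; assumed here, not confirmed for ZAM


def parse_dhcp_options(raw: bytes) -> dict[int, bytes]:
    """Parse code/length/value DHCP options into {code: value}."""
    options: dict[int, bytes] = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 0:        # Pad option: single byte, no length
            i += 1
            continue
        if code == 255:      # End option: stop parsing
            break
        if i + 1 >= len(raw):
            raise ValueError("truncated option header")
        length = raw[i + 1]
        value = raw[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options[code] = value
        i += 2 + length
    return options


def tftp_bootstrap_info(raw: bytes) -> tuple[str, str]:
    """Return (tftp_server, script_name) or raise if either is missing."""
    opts = parse_dhcp_options(raw)
    if TFTP_SERVER_NAME not in opts or BOOTFILE_NAME not in opts:
        raise ValueError("DHCP reply lacks TFTP server or script name")
    return opts[TFTP_SERVER_NAME].decode("ascii"), opts[BOOTFILE_NAME].decode("ascii")


# Example reply carrying a server address and a script name (both made up).
sample = bytes([66, 8]) + b"10.0.0.5" + bytes([67, 6]) + b"zam.py" + bytes([255])
print(tftp_bootstrap_info(sample))  # -> ('10.0.0.5', 'zam.py')
```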
Automatic execution phase
● Initialization:
First, the script initializes the device by clearing local files left over from any previous run, so that stale files cannot send a later step down the wrong logic branch.
● Upload SN:
After initialization is completed, the device's own SN is uploaded to the specified directory to notify the TFTP server that the device has started. That is, upload [sn].ZAM to /ZAM_STARTUP/. If the upload fails, write "0" to /etc/zam.txt (notifying that ZAM execution failed) and exit;
● Configuration parameter file download:
After the SN is uploaded, the corresponding configuration parameter file (.PARAM format) is downloaded based on the SN. The download path is written in the code. If the download fails, the script jumps to error handling;
● Configuration parameter file parsing:
After the download is successful, the file is parsed to determine whether the version needs to be upgraded. If the parsing fails, it jumps to error handling;
● Configuration file download:
Normally, the script has a default download path for the configuration file. The device downloads the configuration file [SN].CFG from this path based on its SN. If the download fails, the script jumps to error handling;
● Version file download:
If there is a bin file identifier in the configuration parameter file, go to the specified path to download the bin file. If the download fails, jump to error handling;
● Version file upgrade:
If the bin file exists, upgrade the device with it. If the upgrade fails, jump to error handling;
● Rename the configuration file:
The script renames the configuration file to config.txt and then restarts the device (configuration file names differ between manufacturers; Ruijie is used as the example here);
● The device restarts and loads:
After the device is restarted, the configuration file is checked again. If a configuration file exists, the device does not enter the ZAM mode, but loads the configuration file and starts normally.
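The automatic execution phase above can be sketched as one function that runs the steps in order and stops at the first failure. This is an illustration under stated assumptions, not the actual ZAM script: the `key=value` parameter format, the `bin` key, the `PARAM_DIR` and `CONFIG_DIR` paths, and the `tftp` and `device` objects with their method names are all hypothetical. The fakes at the end exist only so the sketch can run without a switch.

```python
PARAM_DIR = "/ZAM_PARAM"    # hypothetical; the article only says the path is in the code
CONFIG_DIR = "/ZAM_CFG"     # hypothetical default configuration path


class ZamError(Exception):
    """Any step failure; the caller then runs the failure-handling branch."""


def parse_params(text: str) -> dict[str, str]:
    """Parse the assumed key=value parameter format into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ZamError(f"unparseable parameter line: {line!r}")
        params[key.strip()] = value.strip()
    return params


def fetch(tftp, path: str) -> bytes:
    """Download one file, converting transport errors into ZamError."""
    try:
        return tftp.get(path)
    except OSError as exc:
        raise ZamError(f"download failed: {path}") from exc


def run_zam(sn: str, tftp, device) -> None:
    """Run the main steps in order; raises ZamError on the first failure."""
    device.clear_previous_run()                         # initialization
    try:
        tftp.put(f"/ZAM_STARTUP/{sn}.ZAM", b"")         # upload SN
    except OSError as exc:
        device.write_status("0")                        # /etc/zam.txt in the text
        raise ZamError("SN upload failed") from exc
    params = parse_params(fetch(tftp, f"{PARAM_DIR}/{sn}.PARAM").decode())
    config = fetch(tftp, f"{CONFIG_DIR}/{sn}.CFG")      # configuration file
    image_path = params.get("bin")                      # optional version file
    if image_path:
        device.upgrade(fetch(tftp, image_path))         # download, then upgrade
    device.save_startup_config("config.txt", config)    # rename step
    device.reboot()


class FakeTftp:
    """In-memory stand-in for a TFTP client."""
    def __init__(self, files):
        self.files = dict(files)

    def get(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)               # subclass of OSError
        return self.files[path]

    def put(self, path, data):
        self.files[path] = data


class FakeDevice:
    """Records which local actions were taken, in order."""
    def __init__(self):
        self.events = []

    def clear_previous_run(self):
        self.events.append("init")

    def write_status(self, flag):
        self.events.append(f"status={flag}")

    def upgrade(self, image):
        self.events.append("upgrade")

    def save_startup_config(self, name, data):
        self.events.append(f"saved {name}")

    def reboot(self):
        self.events.append("reboot")
```

Raising a single exception type from every step keeps the "jump to error handling" behavior in one place: the caller wraps `run_zam` in one `try` block and reports failure once.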
Information synchronization verification phase
● Successful processing:
After the device is successfully launched:
A. Upload [SN].log to /ZAM_LOG/
B. Upload [SN].ok to /ZAM_STATUS/
C. Write "1" to /etc/zam.txt (notifying that ZAM was successfully executed)
D. Clear the local files related to this execution ([sn].CFG, [sn].ZAM, [sn].params, [sn].ok, [sn].error).
▲ Figure 5 ZAM folder directory
▲ Figure 6 Successful execution of ZAM process
● Failure handling:
When any step in the main logic fails and jumps to failure handling, the following actions are performed:
A. Upload [sn].log to /ZAM_LOG/
B. Upload [sn].error to /ZAM_STATUS/
C. Write "0" to /etc/zam.txt (notifying that ZAM execution failed)
D. Clear the local files related to this execution ([sn].CFG, [sn].ZAM, [sn].params, [sn].ok, [sn].error).
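The success and failure branches share the same four steps and differ only in the marker file and the status flag, so they can be sketched as one function. This is a hedged illustration: the `report_result` name, the `upload` callable, the `workdir` and `status_file` parameters, and the empty marker file body are assumptions. On a real device the status file would be /etc/zam.txt as described above.

```python
from pathlib import Path
from typing import Callable

# Suffixes of the local files listed in step D.
LOCAL_SUFFIXES = ("CFG", "ZAM", "params", "ok", "error")


def report_result(sn: str, ok: bool, log_text: str,
                  upload: Callable[[str, bytes], None],
                  workdir: Path, status_file: Path) -> None:
    """Report the outcome to the server and clean up local files."""
    upload(f"/ZAM_LOG/{sn}.log", log_text.encode())        # step A
    marker = "ok" if ok else "error"
    upload(f"/ZAM_STATUS/{sn}.{marker}", b"")              # step B
    status_file.write_text("1" if ok else "0")             # step C
    for suffix in LOCAL_SUFFIXES:                          # step D
        (workdir / f"{sn}.{suffix}").unlink(missing_ok=True)
```

Because the server side sees either `[sn].ok` or `[sn].error` in /ZAM_STATUS/, an operator can check the status of a whole batch of devices by listing one directory.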
As network equipment matures in its support for high-level programming language environments such as Python, zero-configuration onboarding technology has been widely adopted in IDC scenarios. On the one hand, it speeds up device onboarding, reduces the risk of human error, and saves manpower; on the other hand, for later operation and maintenance work (such as capacity expansion and fault replacement), registering or replacing the SN is enough for the device to load automatically, with no further human intervention.
So, in addition to IDC scenarios, are there similar operation and maintenance pains in other scenarios?
Adhering to the ideas of innovation, exploration, and practice, Ruijie Networks has also conducted in-depth market research and analysis in new retail store scenarios.
New retail stores, which have recently become popular, are numerous, and the surge in business means new stores open very quickly. However, there are no professional operation and maintenance personnel on site, so network engineers must travel from store to store when the network is deployed. If a fault occurs in daily operation and cannot be resolved remotely, network engineers also need to go on site, which requires a huge investment in manpower.
To address this, we are building on the data center zero-configuration onboarding idea to create a new automated store network solution that improves the efficiency of network onboarding and operation and maintenance in new retail stores and reduces labor costs.
▲ Figure 7 networking design of automated store network solutions
We are currently actively exploring and practicing with new retail customers, so stay tuned. If you have any suggestions or ideas, please leave a message to interact and discuss together.
Summary
This article explores the use of zero-configuration technology in automated data center operation and maintenance. The traditional process of deploying switch networks in data centers is time-consuming and prone to errors. To address these challenges, manufacturers have developed automated onboarding technologies, such as ZAM (Zero-configure Automatic Manage), which allow network devices to load their files and go online by themselves. This technology simplifies the deployment process, reduces human error, and improves efficiency. The article also mentions the potential application of zero-configuration technology in new retail store scenarios to optimize network onboarding and reduce operational costs.