Data Center Operations and Maintenance: Trends and Evolution
Authors: James Liu, Li Jingjing
Published: August 30, 2018
Preface
Data center operations and maintenance (O&M) is critical, tedious, and highly repetitive. In the face of rapid data center development, traditional O&M is struggling with high labor costs, outdated management tools, and low efficiency. As Internet of Things (IoT) and digital intelligence technologies mature, they are reshaping existing O&M methods and workflows and unlocking more of the data center's potential.
I. Analysis of Global Trends in Data Center O&M
The focus of global data center development is shifting from scale-based construction to refined operations.
With the entry of giants such as Amazon, Microsoft, Alibaba, Tencent, Huawei, China Mobile, and China Telecom, competition in China's domestic cloud service market has intensified. Cloud service providers are striving to reduce operating costs while expanding their market presence.
Ma Li, President of Huawei Cloud & Big Data Product Line, noted in his article "Intelligent O&M: The Future Path for Cloud Data Centers" that future O&M must:
Improve Efficiency: Virtualization and open-source technologies have made O&M increasingly complex. Traditional manual modes are slow and error-prone. While a single person traditionally maintained 50–100 devices, massive cloud environments would require an unsustainable amount of manpower under that model.
Maintain Low Costs: Traditional IT resource utilization is often below 20%. While cloudification improves this, personalized and on-demand elastic requirements can lead to resource fragmentation and unbalanced loads. This often results in overall utilization failing to meet targets, keeping O&M costs high.
Plug-and-Play Hardware & Scheduled Retirement: As scale grows, manual hardware identification and installation cannot support rapid scaling. With plug-and-play technology, low-skilled personnel simply rack, network, and power the device; the O&M system then automates the end-to-end deployment. Furthermore, through cloud-based isolation, faulty hardware no longer requires immediate repair—it can simply be replaced by low-skilled workers on a scheduled basis.
Ultimately, whether driven by business requirements or competitive necessity, higher efficiency and lower costs are the primary drivers of O&M technology.
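The staffing argument above can be made concrete with a rough back-of-the-envelope sketch. The 50 to 100 devices per person comes from the text; the fleet size of 100,000 devices and the function name `staff_needed` are assumptions chosen only for illustration.

```python
import math


def staff_needed(device_count: int, devices_per_person: int) -> int:
    """Round up: a partial workload still needs a whole person."""
    if devices_per_person <= 0:
        raise ValueError("devices_per_person must be positive")
    return math.ceil(device_count / devices_per_person)


# Hypothetical fleet size, not a figure from the article.
fleet = 100_000
best_case = staff_needed(fleet, 100)   # 100 devices per person
worst_case = staff_needed(fleet, 50)   # 50 devices per person
print(f"{fleet} devices -> {best_case} to {worst_case} maintainers")
```

Under these assumptions the manual model needs 1,000 to 2,000 maintainers for hardware alone, which is the kind of headcount the article calls unsustainable.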
II. History and Current Status of O&M Technology
1. Historical Obstacles to Development
Despite its importance, O&M technology has long lagged behind, with a "heavy on construction, light on maintenance" mentality prevailing in the industry. To many, O&M is still mistakenly equated with "flashy user interfaces and giant screen displays."
The primary reasons for this lack of modernization include:
Outdated Technology: Early infrastructure software mostly came from UPS or AC manufacturers focusing on "Dynamic Environment" (power and room environment) management. These vendors often lacked IT expertise, and their technology lagged behind the rapid pace of IT development.
Obsolete Mindsets: Early Internet data center (IDC) management evolved from telecom "equipment room maintenance," focusing on physical safety but lacking an understanding of Return on Investment (ROI) and refined management.
Low Demand: In the past, facilities were smaller, labor was cheap, and business logic was simple. The incentive to automate was low because the cost of failure was relatively small.
Simple Architectures: Traditional "Silo" architectures meant hardware and business tasks were deeply coupled. When a fault occurred, the 1:1 physical-to-logical relationship made it easy to locate and fix.
2. Current Status: The Consensus on Digital and Automated Operations
As hyperscale data centers proliferate, several factors have pushed O&M to the forefront:
Maturity of IoT: Success in logistics and smart factories has proven the value of IoT. Meanwhile, many data centers have remained in a "dark" state—using manual spreadsheets to manage "dumb" devices. IoT is now being used to unlock the latent value of these assets.
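Convergence of CT and IoT: Under the influence of the Internet, communications technology (CT) practices are converging with IoT, and "Software-Defined Data Centers" have become mainstream. Traditional maintenance concepts are shifting toward IoT-driven operations, utilizing O&M robots and U-level asset management (tracking assets down to the individual rack unit, or "U").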
Market Polarization: The market is splitting between Hyperscale (requiring automation to manage hundreds of thousands of servers) and Micro/Edge centers (requiring unmanned, remote management due to a lack of on-site professional staff).
Cloud Architecture Impact: Cloud computing has turned hardware into a "resource pool." Any individual piece of hardware now matters less, but replacements and upgrades happen more often. Managing these assets manually is no longer feasible.
Rising Labor Costs: With an aging population and rising wages, finding and retaining skilled O&M personnel in both major and secondary cities has become a significant headache for managers.
III. From "Maintenance" to "Operations": Saving Money is the Bottom Line
The industry is shifting from Maintenance (Focus: Reliability and avoiding errors) to Operations (Focus: Reliability plus cost-efficiency). As data centers transform from "Cost Centers" to "Production Centers," O&M personnel must change their mindset.
A truly "good" data center must address the following operational pain points:
Asset Inventory: Manual tracking of diverse equipment is a massive waste of high-skilled labor time.
Capacity Management: Coarse-grained management makes it difficult to quickly identify the best location for new equipment (based on available power and cooling), which delays service launches and hurts ROI.
Fault Localization: In massive facilities, 80% of O&M time is often spent just locating a fault. Reducing recovery time is vital for minimizing business losses.
Asset Security: Most facilities have room-level security (access control), but few have U-level security. Real-time alerts are needed to track exactly when and where a specific piece of hardware is being accessed or moved within a rack.

Image Source: Digital-People U-Level Asset IoT Solution
In summary, we can identify several core demands from data center users at the operational level:
Cabinet Capacity Management: Real-time monitoring of cabinet space information to improve utilization (thereby reducing cabinet rental costs).
Automated Equipment Changes: Automatic reporting of asset movement and changes (reducing labor expenditures).
Data Accuracy: Achieving 100% accuracy in asset data (eliminating the cost of manual verification).
Automated Asset Inventory: Rapid, automated inventory of large-scale assets without human intervention (saving O&M manpower).
Rapid Localization: Quickly and accurately locating faulty equipment (minimizing troubleshooting time).
Precise Asset Protection: Enhancing physical security at the U-level; automatic alerts for unauthorized events (ensuring asset and data security).
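The capacity-management demand above can be illustrated with a minimal placement sketch: given per-cabinet U occupancy and a power budget, find where a new device fits. The `Cabinet` class, its fields, and the "most power headroom wins" rule are all assumptions for illustration, not a description of any vendor's system. Cooling is left out here but could be handled as one more constraint of the same shape.

```python
from dataclasses import dataclass, field


@dataclass
class Cabinet:
    name: str
    total_u: int                 # rack height in U
    power_budget_w: float        # power the cabinet is allowed to draw
    power_used_w: float = 0.0
    occupied: set = field(default_factory=set)  # occupied U positions, 1-based

    def first_free_run(self, height_u: int):
        """Return the lowest starting U of a contiguous free run, or None."""
        run = 0
        for u in range(1, self.total_u + 1):
            run = 0 if u in self.occupied else run + 1
            if run == height_u:
                return u - height_u + 1
        return None


def place(cabinets, height_u, power_w):
    """Pick the fitting cabinet that keeps the most power headroom."""
    best = None
    for cab in cabinets:
        headroom = cab.power_budget_w - cab.power_used_w - power_w
        start = cab.first_free_run(height_u)
        if headroom < 0 or start is None:
            continue  # not enough power or no contiguous space
        if best is None or headroom > best[2]:
            best = (cab.name, start, headroom)
    return best  # (cabinet, starting U, headroom after placement) or None


# Hypothetical room with two cabinets.
room = [
    Cabinet("A01", 42, 5000, 4800, set(range(1, 41))),
    Cabinet("A02", 42, 5000, 2000, set(range(1, 11)) | {13}),
]
print(place(room, height_u=2, power_w=400))
```

In this example A01 has free space but no power headroom, so the 2U device lands in A02 starting at U11. With real-time occupancy data, a query like this takes moments instead of a walk through the room.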
IV. Applications of IoT Technology in Data Centers
The authors believe that compared to AI and Big Data, Internet of Things (IoT) technology is likely to be the first to achieve large-scale application in data center operations. For example, while it is well known that Google utilizes AI techniques like neural networks for energy management, there is little public documentation on the specific implementation, actual efficacy, or scalability for other users. High-tech and internet giants possess the talent and resources to innovate with AI, but for the majority of data center users, IoT technology is more mature and ready for practical implementation.
Currently, several data centers have already integrated IoT into their next-generation planning. Many IoT-based data center technologies have entered the stage of large-scale application, primarily in the following areas:
IoT Applications in Asset Security:
Infrastructure Monitoring: Utilizing sensors to monitor critical facility components such as batteries, UPS systems, and air conditioning units.
Life-Cycle Prediction: Predicting the remaining lifespan of equipment through status monitoring, enabling early warnings before a failure occurs.
IT Asset Management via Electronic Tags: Managing IT equipment through electronic asset tags to precisely locate every device down to its specific U-position. If a device is removed without authorization or a tag is tampered with, the system triggers immediate alarms both on-site and at the management console.

Image Source: Tencent T-block U-Level Intelligent Management
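The tag-based tracking described above can be sketched as a comparison of two position snapshots. This is only an illustrative model: the snapshot layout (`tag_id -> (cabinet, U position)`), the function name `diff_snapshots`, and the idea of an `authorized` set derived from approved work orders are assumptions, not details from the source.

```python
def diff_snapshots(previous, current, authorized=frozenset()):
    """Compare two {tag_id: (cabinet, u_position)} snapshots and list events."""
    events = []
    for tag, loc in previous.items():
        if tag not in current:
            kind = "removed"
        elif current[tag] != loc:
            kind = "moved"
        else:
            continue  # device is where it was; nothing to report
        events.append({
            "tag": tag,
            "event": kind,
            "from": loc,
            "to": current.get(tag),
            "alarm": tag not in authorized,  # no approved change -> alarm
        })
    for tag, loc in current.items():
        if tag not in previous:
            events.append({"tag": tag, "event": "mounted", "from": None,
                           "to": loc, "alarm": tag not in authorized})
    return events


# Hypothetical tag readings before and after a maintenance window.
before = {"T1": ("A01", 10), "T2": ("A01", 12), "T3": ("A02", 5)}
after = {"T1": ("A01", 10), "T2": ("A02", 20), "T4": ("A02", 7)}
for event in diff_snapshots(before, after, authorized={"T2"}):
    print(event)
```

Here T2's move is covered by an approved change and raises no alarm, while the removal of T3 and the appearance of T4 are both flagged.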
IoT Applications in Capacity and Energy Consumption:
U-Level Space Management: Using sensors to monitor the physical space utilization of cabinet U-slots in real-time. By automatically tracking the mounting, unmounting, and migration of IT equipment, the system helps users maximize cabinet space efficiency.
Energy and Environmental Monitoring: Real-time monitoring of data center energy consumption, temperature, and cooling via sensors. This data helps users plan equipment deployment more rationally and reduce overall energy consumption.
Micro-Environment Control: Through precise U-level management, it is even possible to monitor the "micro-environment" within an individual cabinet, ensuring optimal conditions for high-density hardware.

Image Source: Resource Visualization Management View of a Data Center
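As a rough illustration of U-level space and micro-environment monitoring, the sketch below computes a cabinet's U utilization and lists U positions whose temperature reading exceeds a limit. The function names, the sample readings, and the 27 °C limit are hypothetical values chosen for the example.

```python
def u_utilization(total_u, occupied_positions):
    """Fraction of a cabinet's U positions that are occupied."""
    return len(set(occupied_positions)) / total_u


def hot_spots(readings_c, limit_c):
    """Return U positions whose temperature exceeds the limit, sorted."""
    return sorted(u for u, temp in readings_c.items() if temp > limit_c)


# Hypothetical 42U cabinet with U1..U21 occupied and three sensors.
print(u_utilization(42, range(1, 22)))
print(hot_spots({5: 24.0, 20: 29.5, 38: 33.0}, limit_c=27.0))
```

A half-full cabinet with hot readings near the top, as in this made-up case, is exactly the situation where per-U data changes the placement decision compared with a room-level average.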
IoT Applications in Automated O&M:
Automated Data Entry: Manually entering installation data for a large number of devices consumes significant manpower. By utilizing a U-level asset IoT system, equipment information is automatically recorded and uploaded to the backend. This replaces repetitive manual entry tasks and frees up O&M personnel for higher-value work.
Intelligent Fault Localization: When a device fails, it often triggers a chain reaction, resulting in a flood of alarms that make it difficult for the backend to identify the root cause. Using IoT technology, the system can automatically pinpoint the faulty device, allowing maintenance staff to accurately identify its specific area, cabinet, and even its exact U-position.
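One simple way to cut through an alarm flood is to suppress alarms from devices whose upstream dependency is also alarming, then map what remains to a physical location. The sketch below assumes each device has at most one upstream device and that a position table (`device -> (area, cabinet, U)`) is available from the asset system. Both are simplifications, and every name here is hypothetical.

```python
def root_causes(alarming, upstream):
    """Alarming devices with no alarming ancestor are likely root causes."""
    def has_alarming_ancestor(device):
        seen = set()
        parent = upstream.get(device)
        while parent is not None and parent not in seen:
            if parent in alarming:
                return True
            seen.add(parent)  # guard against cycles in bad topology data
            parent = upstream.get(parent)
        return False

    return sorted(d for d in alarming if not has_alarming_ancestor(d))


def locate(device, positions):
    """Format the physical location recorded for a device."""
    area, cabinet, u = positions[device]
    return f"{device}: area {area}, cabinet {cabinet}, U{u}"


# Hypothetical topology: servers depend on a switch, switches on a PDU.
upstream = {"srv-1": "tor-1", "srv-2": "tor-1", "tor-1": "pdu-1",
            "srv-3": "tor-2", "tor-2": "pdu-1"}
alarming = {"srv-1", "srv-2", "tor-1", "srv-3"}
positions = {"tor-1": ("A", "A01", 42), "srv-3": ("B", "B07", 12)}

for device in root_causes(alarming, upstream):
    print(locate(device, positions))
```

Four alarms collapse to two candidates, tor-1 and srv-3, each reported with its area, cabinet, and U position. Real systems would also need time correlation and multi-parent dependencies, which this sketch leaves out.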
Conclusion
Beyond the functions mentioned above, IoT technology can enable many other capabilities that are outside the scope of this discussion. It is the authors' hope that more emerging technologies will find practical applications in the data center field, and that more manufacturers and technical professionals will contribute to achieving efficient, digital, automated, and refined data center operations.