Artificial intelligence and high-performance computing (HPC) are reshaping the data center market. The top 15 cloud companies are incorporating AI into every product and service and plan to invest a record $200 billion of capital expenditure (capex) in AI in 2024. As more users adopt AI applications, they become more productive, consume more data and push more traffic through AI data centers. Tight time synchronization, or precision timing, can improve both the compute efficiency of AI workloads and the bandwidth efficiency of the networking infrastructure that feeds the AI servers.
In recent years, data centers have added dedicated back-end networks to carry the growing data traffic required for AI training. Growth in the AI market is driving sales of a wide range of back-end network equipment, including interconnects, switches, servers and smart network interface cards. Expanding these back-end networks poses significant challenges:
- Power: A rack of standard servers consumes 2.5 kW, while a rack of AI servers consumes 25 kW, ten times as much.
- Speed: Faster SerDes are required to feed GPUs enough data so they don’t sit idle.
- Cost: The network required to feed a GPU at the rate it consumes data is cost-prohibitive, costing 4× more than the GPU itself.
Precision timing technology plays a crucial role in the seamless operation, synchronization and reliability of cloud infrastructure and AI data centers. As AI algorithms become more complex and data centers grow in scale, ultra-accurate, reliable clock signals have never been more critical.
The role of precision timing in AI
Tighter time synchronization is critical for optimal network and AI workload efficiency. Neural networks, the backbone of deep learning, benefit from synchronized execution across multiple layers, which keeps processing coordinated and outputs timely and accurate. Generative AI platforms such as ChatGPT and AI-driven applications in 5G/6G communications, aerospace, defense, augmented and virtual reality, and spatial computing all demand precision timing. Overall, precision timing streamlines data flows, enabling faster and more reliable communication.
One of the key challenges the industry faces is quantifying the impact that synchronization has on important business metrics, such as total cost of ownership. SiTime is working with leading AI companies to clarify this relationship, for example by quantifying how improvements in precision timing can enable faster AI processing and more efficient networks.
MEMS precision timing disrupts traditional paradigms
The transition to precision timing based on microelectromechanical systems (MEMS) technology marks a significant leap from traditional quartz-based timing. MEMS resonators offer unprecedented performance, reliability and resilience, making them ideal for demanding applications, such as AI computing and cloud data centers.
MEMS timing components are compact, energy-efficient and environmentally resilient, providing superior tolerance to temperature fluctuations, shock and vibration. They offer greater frequency stability than quartz devices, which directly affects the performance and reliability of AI systems. In addition, MEMS technology makes it easier to tailor precision timing solutions to specific applications.
AI benefits from synchronized networks
Time synchronization is crucial for coordinating distributed computing tasks, in which workloads are spread across multiple nodes in a network. Each node must share a common sense of time so that data is processed in the correct sequence, and this coordination keeps GPUs and other computational resources operating at peak efficiency. In effect, time synchronization enables distributed computers connected via a common network to behave as if they were a single, large machine.
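A toy example makes the sequencing point concrete. The Python sketch below assumes two hypothetical nodes, A and B, whose events are merged by each node's local timestamp; the clock errors (1 ms for an NTP-class clock, 100 ns for a PTP-class clock) are illustrative values matching the accuracy classes discussed in this article, not measurements. An event on B that really happens 500 ns after one on A is placed first whenever B's clock lags by more than that 500 ns gap.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str
    true_time_ns: int      # when the event actually happened (ideal reference time)
    clock_offset_ns: int   # error of the recording node's clock vs. reference time

    @property
    def stamped_ns(self) -> int:
        # Timestamp recorded by the node's own, imperfect clock
        return self.true_time_ns + self.clock_offset_ns


def order_by_timestamp(events: list[Event]) -> list[str]:
    """Merge events from several nodes by their local timestamps."""
    return [e.node for e in sorted(events, key=lambda e: e.stamped_ns)]


def ordering_is_correct(events: list[Event]) -> bool:
    """True if timestamp order matches the order the events really happened."""
    true_order = [e.node for e in sorted(events, key=lambda e: e.true_time_ns)]
    return order_by_timestamp(events) == true_order


# Node B's event really happens 500 ns after node A's event.
ntp_class = [Event("A", 1_000, 0), Event("B", 1_500, -1_000_000)]  # B lags by 1 ms
ptp_class = [Event("A", 1_000, 0), Event("B", 1_500, -100)]        # B lags by 100 ns

print(order_by_timestamp(ntp_class))   # ['B', 'A']  (misordered)
print(order_by_timestamp(ptp_class))   # ['A', 'B']  (correct)
```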
A network may be synchronized by distributing time either through hardware or in packets. Hardware time distribution may be achieved by receiving a GPS signal and distributing it throughout the network. However, packet-based time distribution is more common and has traditionally been accomplished using the Network Time Protocol (NTP). More recently, the Precision Time Protocol (PTP) has been gaining popularity for AI training for a variety of reasons, which we'll explore below.
What are NTP and PTP?
NTP and PTP are the two primary methods for synchronizing networks in AI data centers. Each protocol serves different purposes and is suitable for various applications based on the required level of accuracy and the specific needs of the network.
NTP is widely used in general IT environments, including enterprise networks, where millisecond-level accuracy is sufficient. It is commonly employed in scenarios such as log file management, security protocols and coordinating activities across distributed systems. Ideal for small to medium-sized networks where precise synchronization is not critical, NTP is easy to implement and cost-effective, making it a popular choice for many organizations.
In contrast, PTP is favored in industries that require sub-microsecond- or nanosecond-level accuracy, such as telecommunications, financial services and high-frequency trading. It is crucial in AI data centers, where precise timing is necessary for efficient data processing and minimizing latency. Best suited for large-scale, high-precision environments, PTP is the preferred choice for applications requiring stringent synchronization because it measures and compensates for network path delays, can correct for the time messages spend inside PTP-aware switches, and supports hardware timestamping.
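Both protocols estimate a follower's clock offset from a two-way message exchange; NTP uses the same basic four-timestamp arithmetic, and PTP's precision advantage comes largely from where the timestamps are taken (in hardware, close to the wire) and from on-path corrections. The minimal Python sketch below shows the IEEE 1588 delay request-response calculation: the time source stamps a Sync message at t1, the follower receives it at t2, the follower sends a Delay_Req at t3, and the source receives it at t4. It assumes equal delay in both directions and ignores PTP's correction field; the simulated offsets and delays are made-up values.

```python
from typing import NamedTuple


class Timestamps(NamedTuple):
    t1: int  # Sync leaves the time source (source clock), ns
    t2: int  # Sync arrives at the follower (follower clock), ns
    t3: int  # Delay_Req leaves the follower (follower clock), ns
    t4: int  # Delay_Req arrives at the time source (source clock), ns


def offset_and_delay(ts: Timestamps) -> tuple[float, float]:
    """Return (follower offset, mean path delay) in ns, assuming a symmetric path."""
    forward = ts.t2 - ts.t1   # path delay + follower offset
    reverse = ts.t4 - ts.t3   # path delay - follower offset
    mean_path_delay = (forward + reverse) / 2
    offset_from_source = (forward - reverse) / 2
    return offset_from_source, mean_path_delay


def simulate_exchange(true_offset_ns: int, fwd_delay_ns: int, rev_delay_ns: int) -> Timestamps:
    """Generate the four timestamps for a follower whose clock is ahead by true_offset_ns."""
    t1 = 1_000_000                              # source clock
    t2 = t1 + fwd_delay_ns + true_offset_ns     # follower clock
    t3 = t2 + 1_000                             # follower replies 1 us later
    t4 = t3 - true_offset_ns + rev_delay_ns     # back on the source clock
    return Timestamps(t1, t2, t3, t4)


# Symmetric 800 ns path: offset and delay are recovered exactly.
print(offset_and_delay(simulate_exchange(250, 800, 800)))   # (250.0, 800.0)
# 200 ns of asymmetry (900 ns out, 700 ns back) biases the offset by 100 ns.
print(offset_and_delay(simulate_exchange(250, 900, 700)))   # (350.0, 800.0)
```

The second case shows why path symmetry matters: the exchange cannot distinguish a one-way delay difference from a clock offset, so half of any asymmetry appears as offset error.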
What is the difference between PTP and NTP?
Benefits of PTP
PTP has the potential to synchronize time within nanoseconds, significantly surpassing NTP’s millisecond-level accuracy. This precision is vital for applications in which even the slightest time uncertainty can lead to significant issues. PTP ensures that all devices within a network are synchronized to a single time source with minimal deviation. This synchronization is critical for maintaining the efficiency and reliability of distributed systems, especially in AI data centers.
Additionally, PTP can be scaled to accommodate large and complex networks. Its source-follower architecture allows for precise synchronization across multiple devices, making it suitable for expanding AI data centers and other large-scale applications.
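In IEEE 1588, followers do not need to be told which device is the time source; each clock runs the Best Master Clock Algorithm (BMCA) to pick one from the properties that candidate clocks advertise in Announce messages. The Python sketch below shows only the core dataset comparison, with lower values winning at each step. It omits the topology tie-breaks (such as steps removed) and the port state machine defined in the standard, and the clock names and values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockInfo:
    name: str
    priority1: int                   # administrator-set, 0-255
    clock_class: int                 # e.g. 6 = locked to a primary reference such as GNSS, 248 = default
    clock_accuracy: int              # enumerated, e.g. 0x21 = within 100 ns, 0xFE = unknown
    offset_scaled_log_variance: int  # stability estimate
    priority2: int                   # administrator-set tie-breaker
    clock_identity: bytes            # unique 8-byte ID, final tie-breaker

    def rank_key(self) -> tuple:
        # Lower wins at each step, compared in this order
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2, self.clock_identity)


def select_time_source(candidates: list[ClockInfo]) -> ClockInfo:
    return min(candidates, key=ClockInfo.rank_key)


gnss = ClockInfo("gnss-grandmaster", 128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0000000000000001"))
free_run = ClockInfo("free-running-switch", 128, 248, 0xFE, 0xFFFF, 128, bytes.fromhex("0000000000000002"))
# Same poor quality as free_run, but an operator has lowered priority1
pinned = ClockInfo("pinned-switch", 100, 248, 0xFE, 0xFFFF, 128, bytes.fromhex("0000000000000003"))

print(select_time_source([free_run, gnss]).name)          # gnss-grandmaster
print(select_time_source([free_run, gnss, pinned]).name)  # pinned-switch
```

Because priority1 is compared first, lowering it on a preferred device overrides advertised clock quality, which lets administrators pin the time source.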
When to use NTP
NTP is adequate for applications in which microsecond-level accuracy is not required. It provides sufficient synchronization for general IT operations and less time-sensitive tasks. In smaller networks, the implementation of NTP is straightforward and does not require significant investment in specialized hardware or infrastructure.
NTP is also cost-effective and easy to deploy. It does not require the specialized equipment needed for PTP, making it a practical choice for organizations with budget constraints.
MEMS timing for PTP applications
As an example, MEMS-based super temperature-compensated crystal oscillators (Super TCXOs) provide exceptional stability and accuracy for PTP applications. They are designed to withstand environmental changes, ensuring reliable performance in AI data centers. Advanced servo and stack software compliant with the IEEE 1588 (PTP) standard further enhances the precision of time synchronization: the protocol stack exchanges timing messages, and the servo uses the measured offsets to steer the local oscillator toward the time source. This software is essential for achieving the high accuracy required in AI and other HPC environments.
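IEEE 1588 does not mandate a particular servo algorithm; a common choice, and the default in the open-source linuxptp project, is a proportional-integral (PI) controller. The sketch below is not SiTime's servo. It assumes one noise-free offset measurement per 1 s sync interval, a local oscillator with a constant frequency error, and illustrative gains (kp = 0.7, ki = 0.3). The proportional term pulls the phase in, while the integral term learns the oscillator's frequency error, so the correction settles at the negative of that error and the offset converges to zero.

```python
def simulate_pi_servo(initial_offset_ns: float, freq_error_ppb: float,
                      kp: float = 0.7, ki: float = 0.3, steps: int = 60):
    """Steer a local clock toward a PTP time source with a PI controller.

    One offset measurement per 1 s sync interval, no measurement noise.
    1 ppb of frequency error accumulates 1 ns of offset per second.
    Returns (offset history in ns, final frequency correction in ppb).
    """
    offset = initial_offset_ns
    integral = 0.0
    correction_ppb = 0.0
    history = []
    for _ in range(steps):
        integral += ki * offset                       # integral term learns the frequency error
        correction_ppb = -(kp * offset + integral)    # proportional term pulls the phase in
        offset += (freq_error_ppb + correction_ppb) * 1.0   # advance one 1 s interval
        history.append(offset)
    return history, correction_ppb


history, correction = simulate_pi_servo(initial_offset_ns=10_000, freq_error_ppb=50)
print(f"offset after 5 s: {history[4]:.1f} ns, after 60 s: {history[-1]:.3g} ns")
print(f"learned correction: {correction:.3f} ppb")   # cancels the 50 ppb error
```

A real servo also has to filter noisy timestamps and typically steps the clock once before steering when the initial offset is very large; both are left out here.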
The technological landscape is evolving, and with it, the demands for time synchronization are becoming more stringent. While NTP has served the industry well, its limitations in accuracy and scalability make it less suitable for modern AI data centers. PTP, with its nanosecond-level precision and robust architecture, is set to become the new standard for time synchronization.
As we continue to advance into the digital age, PTP will play a crucial role in ensuring the efficiency and reliability of AI computing and cloud data centers. Embracing PTP means embracing the future of precise and scalable time synchronization, a necessity for the next generation of technological innovations.
Advancing the digital age through time synchronization
SiTime offers a full range of MEMS-based precision timing solutions for data center and networking equipment applications. SiTime’s Epoch Platform, for example, integrates dual MEMS resonators for superior thermal coupling and rapid temperature tracking. This timing technology offers 40× faster temperature tracking and 30× better vibration immunity compared with quartz resonators. The Epoch Platform’s exceptional stability, nanosecond accuracy and extended holdover make it well-suited for AI computing and cloud data centers.
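Holdover is the period in which a clock has lost its time source and free-runs on its local oscillator, so the oscillator's stability sets how quickly time error builds up. A standard first-order estimate models the accumulated error as x(t) ≈ x0 + y0·t + 0.5·D·t², where x0 is the initial time error, y0 the fractional frequency offset and D the linear frequency drift (aging) rate. The Python sketch below evaluates that model; the inputs are illustrative, not Epoch Platform specifications, and temperature-induced frequency changes, which fast temperature tracking is meant to counter, are not modeled.

```python
SECONDS_PER_DAY = 86_400


def holdover_time_error_ns(duration_s: float, freq_offset_ppb: float,
                           aging_ppb_per_day: float = 0.0,
                           initial_error_ns: float = 0.0) -> float:
    """Approximate time error after free-running for duration_s seconds.

    Quadratic model x(t) = x0 + y0*t + 0.5*D*t**2, where 1 ppb of fractional
    frequency offset accumulates 1 ns of time error per second.
    Temperature-induced frequency changes are NOT modeled.
    """
    aging_ppb_per_s = aging_ppb_per_day / SECONDS_PER_DAY
    return (initial_error_ns
            + freq_offset_ppb * duration_s
            + 0.5 * aging_ppb_per_s * duration_s ** 2)


# Illustrative residual frequency offsets, not product specifications
for ppb in (0.1, 1.0, 10.0):
    err_us = holdover_time_error_ns(3_600, ppb) / 1_000
    print(f"{ppb:>5} ppb for 1 h -> {err_us:.2f} us")
```

At 1 ppb, one hour of holdover already accumulates 3.6 µs of error, well beyond a sub-microsecond PTP budget.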
As AI data centers continue to evolve, precision timing remains at the forefront of this technological revolution. Precision timing not only enhances AI processing efficiency but also supports the broader ecosystem of AI-driven applications. As the digital age advances, the integration of AI computing and cloud data centers will continue to redefine enterprises, industrial automation, transportation and 5G networks with precision timing technology at its core.
About the author
Jeff Gao, vice president of product marketing at SiTime, has over 20 years of experience in the semiconductor and networking/communications industries in the areas of wireless systems, VoIP, biometrics, semiconductor timing and embedded software. Prior to SiTime, Jeff held various product marketing and engineering positions of increasing responsibility with Atmel, Cisco, Vovida Networks and ArrayComm. His current technical interests include high-precision timing and synchronization in 5G, data center, optical transports and next-gen industrial applications. Jeff earned his MBA from the University of California, Berkeley and MSEE from the University of Wisconsin–Madison.