Understanding Single Root and Dual Root GPU Server Configurations

GPU-accelerated servers have revolutionized high-performance computing (HPC) and artificial intelligence/machine learning (AI/ML) applications. While various manufacturers offer GPU server configurations, two primary setups dominate the landscape: Single Root and Dual Root.

GPU Server Architecture Fundamentals

Modern GPU servers typically employ a dual-CPU design in which each processor has its own DRAM. The CPUs communicate over a high-speed interconnect, such as Intel's Ultra Path Interconnect (UPI) or AMD's Infinity Fabric (xGMI), which carries data between the two processors.

There are two primary approaches to integrating GPUs into these systems:

  1. PCIe-based systems
  2. SXM/OAM designs

Among PCIe-based systems, two configurations dominate: Single Root and Dual Root. They differ in how the GPUs are attached to the CPUs, and each offers distinct advantages for different workloads.

Single Root Configuration

How it works: In a Single Root configuration, all GPUs are connected to a single CPU, even in systems with multiple CPUs. This setup uses a PCIe switch (often called a PLX switch, after the original vendor) to manage communication between the CPU and the GPUs. The designated CPU connects to the PLX switch through two PCIe x16 links, and the switch then handles communication with the GPUs.

The Single Root design centralizes GPU management, which can be advantageous for applications that require direct access to all GPUs simultaneously. Because every GPU sits under the same root complex, peer-to-peer GPU traffic does not have to cross the inter-CPU link. The trade-off is that all GPUs share the same uplinks to a single CPU, so this configuration can become a bottleneck when many GPUs exchange data with host memory at the same time.
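The shared-uplink trade-off can be made concrete with a little arithmetic. The sketch below is a rough estimate, not a measurement: it assumes the two x16 uplinks described above, takes the PCIe generation and the GPU count as inputs (the article does not specify either), and accounts only for 128b/130b line coding, not for packet or protocol overhead. The function name `host_bandwidth_per_gpu` is made up for this illustration.

```python
# Raw signalling rate per lane in GT/s for PCIe generations that use 128b/130b coding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def lane_bandwidth_gb_s(pcie_gen: int) -> float:
    """Approximate one-direction bandwidth of a single lane, in GB/s."""
    return GT_PER_S[pcie_gen] * (128 / 130) / 8


def host_bandwidth_per_gpu(num_gpus: int, uplinks: int = 2,
                           lanes_per_uplink: int = 16, pcie_gen: int = 4) -> float:
    """Upper bound on CPU<->GPU bandwidth per GPU when all GPUs transfer at once.

    Assumes the uplinks are shared evenly; real switches and workloads will differ.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total = uplinks * lanes_per_uplink * lane_bandwidth_gb_s(pcie_gen)
    return total / num_gpus


if __name__ == "__main__":
    # Hypothetical 8-GPU Single Root box on PCIe Gen4.
    print(f"{host_bandwidth_per_gpu(8):.2f} GB/s per GPU")
```

The figure is a ceiling for the worst case where every GPU talks to host memory simultaneously. A workload that keeps most traffic between GPUs will not hit it.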

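Key features: all GPUs attach to one CPU through a PLX switch; GPU management is centralized; the design is simple and power-efficient.

Best for: GPU-centric tasks and applications that need direct access to all GPUs simultaneously.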

Dual Root Configuration

How it works: In a Dual Root setup, the GPUs are distributed between two CPUs. Each CPU connects to its assigned GPUs through its own PLX switch. This configuration allows for more balanced distribution of computational resources and enables simultaneous transactions on both CPU-GPU pathways.

The Dual Root architecture creates two separate PCIe root complexes, one for each CPU-GPU group. This design can offer better performance for workloads that require frequent CPU-GPU interaction, because each group of GPUs has its own path between CPU memory and GPU memory. Traffic between GPUs attached to different CPUs, however, has to cross the inter-CPU interconnect. The distribution of GPUs between the two CPUs can be chosen to suit the workload.
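One way to see the difference between the two layouts is to model which CPU owns each GPU and count the GPU pairs that sit on different sockets. The following is a minimal sketch under stated assumptions: the 8-GPU count, the even 4/4 split, and all names (`SINGLE_ROOT`, `DUAL_ROOT`, `cross_socket_pairs`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical 8-GPU layouts: GPU index -> CPU socket that owns its PCIe root complex.
SINGLE_ROOT = {gpu: 0 for gpu in range(8)}                   # every GPU under CPU 0
DUAL_ROOT = {gpu: 0 if gpu < 4 else 1 for gpu in range(8)}   # 4 GPUs per CPU


def crosses_cpu_interconnect(layout: dict, gpu_a: int, gpu_b: int) -> bool:
    """True if the two GPUs hang off different CPUs in this layout."""
    return layout[gpu_a] != layout[gpu_b]


def cross_socket_pairs(layout: dict) -> list:
    """All GPU pairs whose traffic would have to traverse the inter-CPU link."""
    return [
        (a, b)
        for a, b in combinations(sorted(layout), 2)
        if crosses_cpu_interconnect(layout, a, b)
    ]


if __name__ == "__main__":
    print("single root:", len(cross_socket_pairs(SINGLE_ROOT)), "cross-socket pairs")
    print("dual root:  ", len(cross_socket_pairs(DUAL_ROOT)), "cross-socket pairs")
```

In this model the Single Root layout has no cross-socket pairs, while the 4/4 Dual Root split has 16 of the 28 possible pairs on different sockets. An uneven split changes that number, which is one reason the GPU distribution is worth matching to the workload.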

In some high-performance computing deployments, dual-root systems include two InfiniBand cards, one attached to each CPU. Each group of GPUs can then reach a network adapter on its own CPU instead of crossing the inter-CPU interconnect, which helps keep communication balanced across the system, especially when data has to be shared between the two halves.
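The pairing of GPUs with a same-socket network adapter can be sketched as a simple lookup. This is an illustration only: the socket assignments and the adapter names `ib0` and `ib1` are hypothetical, and a real deployment would read the topology from the system rather than hard-code it.

```python
# Hypothetical dual-root layout: 4 GPUs and one InfiniBand card per CPU socket.
GPU_SOCKET = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
NIC_SOCKET = {"ib0": 0, "ib1": 1}


def local_nic(gpu: int, gpu_socket: dict = GPU_SOCKET, nic_socket: dict = NIC_SOCKET):
    """Return the name of a NIC on the same CPU socket as the GPU, or None."""
    socket = gpu_socket[gpu]
    for nic, nic_sock in sorted(nic_socket.items()):
        if nic_sock == socket:
            return nic
    return None  # no adapter on this socket: traffic must cross the inter-CPU link


if __name__ == "__main__":
    for gpu in sorted(GPU_SOCKET):
        print(f"GPU {gpu} -> {local_nic(gpu)}")
```

With only one adapter installed, the lookup returns `None` for the GPUs on the other socket, which is the case the second card is meant to avoid.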

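Key features: GPUs are split between two CPUs, each with its own PLX switch and PCIe root complex; both CPU-GPU pathways can carry transactions simultaneously; power consumption is higher than with Single Root.

Best for: workloads that require frequent, balanced CPU-GPU interaction.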

Direct Attached Configuration

As an alternative to Single and Dual Root setups, some systems use a Direct Attached configuration. In this arrangement, each CPU has direct PCIe access to up to four full-size GPUs, eliminating the need for PLX switches. This allows for up to eight GPUs in a dual-CPU system. Direct Attached setups can reduce latencies between CPUs and GPUs, which is beneficial in HPC applications where minimizing latency is crucial. However, this configuration typically supports fewer total GPUs compared to setups using PLX switches.
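The GPU limit of a Direct Attached system follows from the CPU's PCIe lane budget. The sketch below assumes each full-size GPU takes a dedicated x16 link; the lane counts passed in are illustrative inputs rather than the specification of any particular CPU, and `max_direct_attached_gpus` is a name invented for this example.

```python
def max_direct_attached_gpus(lanes_per_cpu: int, num_cpus: int = 2,
                             lanes_per_gpu: int = 16,
                             reserved_lanes_per_cpu: int = 0) -> int:
    """How many GPUs fit if every GPU gets its own link straight to a CPU.

    reserved_lanes_per_cpu covers lanes set aside for NICs, storage, and so on.
    """
    usable = lanes_per_cpu - reserved_lanes_per_cpu
    if usable < 0:
        raise ValueError("reserved lanes exceed the lanes available per CPU")
    return num_cpus * (usable // lanes_per_gpu)


if __name__ == "__main__":
    # 64 lanes per CPU, all given to GPUs: four x16 GPUs per CPU.
    print(max_direct_attached_gpus(64))
    # Same CPUs, but one x16 slot per CPU kept for a network card.
    print(max_direct_attached_gpus(64, reserved_lanes_per_cpu=16))
```

With 64 usable lanes per CPU the result matches the four-GPUs-per-CPU, eight-per-system figure above. Any lanes reserved for networking or storage reduce it, which is where switch-based designs gain their higher GPU counts.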

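Key features of Direct Attached: no PLX switches; up to four full-size GPUs per CPU (eight in a dual-CPU system); lower latency between CPUs and GPUs; fewer total GPUs than switch-based setups.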

Conclusion

Choosing between Single Root and Dual Root GPU server configurations depends on specific workload requirements, power constraints, and performance needs. Single Root offers simplicity, power efficiency, and centralized GPU management for GPU-centric tasks. Dual Root provides enhanced performance and flexibility for applications requiring balanced CPU-GPU interaction, at the cost of higher power consumption.

When planning your GPU server infrastructure, carefully consider your current and future needs, as well as the specific requirements of your applications. It's also worth consulting with hardware vendors and benchmarking your specific workloads to determine the optimal configuration for your use case.

Remember that while Single Root and Dual Root are the most common configurations, Direct Attached setups can offer advantages in specific scenarios where low latency is crucial and fewer GPUs are needed.
