“The Open Compute Project (OCP) Global Summit is back this year as an in-person event. Together with some 3,500 other eager attendees, I set out to the San Jose Convention Center on October 18 for three days of learning about the latest design practices in the server rack for hyperscale cloud, edge computing, and enterprise environments. Seeing the equipment firsthand while catching up with familiar people in person provided an experience that an online event could not even come close to matching.
As an analyst responsible for research on servers and server connectivity, I was especially interested in updates related to the latest in server platforms featuring Compute Express Link (CXL), server-to-network connectivity to enable the transition to 100 Gbps SerDes, and server form factors based on flexible modular designs. These advances will enable the development of new server and rack architectures that are more efficient and sustainable. Among all the hot topics that were presented during the Summit, however, I found CXL to be the most potentially disruptive, paving the way for greater efficiencies for scale-out computing, as well as accelerated computing.
CXL is a new server interconnect standard that maintains memory coherency between the central processing unit (CPU) memory space and memory on attached devices over the PCI Express physical layer. As a result, memory between the CPU and other co-processors within the server and rack can be shared, enabling data-intensive applications such as artificial intelligence (AI) to access memory more efficiently and at lower latencies. As the demand for applications involving large AI training models and in-memory data computing continues to grow, CXL represents the ideal solution for meeting the greater demands for memory capacity and bandwidth with higher utilization.
The upcoming CPU platform updates will be significant, as CXL will be supported for the first time, leading to innovations in server architectures. Some additional CXL-related developments are as follows:
- Intel’s Sapphire Rapids and AMD’s Genoa CPU platforms, both of which are shipping in volume in early 2023, will support CXL. A future release of Ampere Computing’s ARM CPU is also expected to support CXL.
- CXL will continue to evolve. For the upcoming platform refreshes, vendors are planning to launch products based on the latest CXL 2.0, which supports rack-scale solutions and is based on PCIe 5.0, with transfer speeds of up to 32 GT/s per lane. Specifications for CXL 3.0, which have recently been ratified, will support scalability beyond the rack and are based on PCIe 6.0, with transfer speeds of up to 64 GT/s per lane.
- The OCP Summit saw an emphasis on modular server form factors, namely those with DC-MHS (Data Center Modular Hardware System). Both Intel and Google proposed a reference design in which the key components of the servers—such as the host CPU, accelerator, memory, network connectivity, and storage—are disaggregated into distinct modules. Modularity would facilitate an infrastructure that is right-sized for a specific application, while streamlining the life-cycle management aspects. CXL will be the appropriate interface for interconnecting these various components with high coherency.
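To put the PCIe 5.0 and PCIe 6.0 transfer speeds quoted above in perspective, the short sketch below converts them into raw one-direction link bandwidth. It assumes the 32 and 64 figures are per-lane signaling rates, and the x8 and x16 link widths are illustrative choices that do not come from the Summit presentations. Encoding and protocol overhead are ignored, so usable bandwidth would be somewhat lower.

```python
# Rough one-direction bandwidth for a PCIe/CXL link.
# Assumptions (not from the article): the quoted speeds are per-lane rates,
# each transfer carries one bit per lane, and all overhead is ignored.

def raw_link_bandwidth_gbytes(rate_per_lane: float, lanes: int) -> float:
    """Return raw one-direction bandwidth in GB/s for a link of `lanes` lanes."""
    return rate_per_lane * lanes / 8  # 8 bits per byte

# Per-lane rates quoted in the article for each PCIe generation.
RATES = {"PCIe 5.0 (CXL 2.0)": 32.0, "PCIe 6.0 (CXL 3.0)": 64.0}

for name, rate in RATES.items():
    for lanes in (8, 16):  # illustrative link widths
        bandwidth = raw_link_bandwidth_gbytes(rate, lanes)
        print(f"{name} x{lanes}: {bandwidth:.0f} GB/s per direction (raw)")
```

Under these assumptions, doubling the per-lane rate from PCIe 5.0 to PCIe 6.0 doubles the raw bandwidth available to a CXL device at the same link width.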
A dedicated CXL forum presented the latest products that leverage CXL. Some notable CXL-related announcements from the OCP Summit included the following:
- Meta announced its intention to incorporate CXL into future server designs, especially for memory-intensive AI applications running on accelerated computing platforms. The company is planning to boost its data center investments by more than 60% this year, with a heavy emphasis on its accelerated computing infrastructure in order to increase engagement on its social media platforms and to lay the foundation for the metaverse. CXL would enable more advanced memory systems that could share memory across various hosts within the network, effectively improving memory utilization, as well as enabling asynchronous sharing of data and results over multiple hosts. The company also proposed the tiering of memory based on applications. For instance, applications that demand the lowest latency, such as caching, can use native memory (residing next to the CPU) for “hot” memory pages. In contrast, less latency-sensitive applications, such as data warehousing, can use CXL memory (residing on PCIe expander cards) for “cold” memory pages, as native memory tends to have 2X better latency than CXL memory. This hierarchy of memory allocation, which can utilize total system memory more effectively, would be beneficial for any accelerated computing platform.
- Astera Labs announced a PCIe-based CXL expander card that would greatly increase the memory capacity of any server. Without CXL, bandwidth-hungry applications are already limited by the number of memory channels supported by the CPU, resulting in memory not being fully utilized. CXL expansion cards can as much as double memory capacity without hitting the limits in memory channels, while meeting the latency requirements of most applications. While it is not yet clear what kind of applications could utilize up to 8 TB of system memory (4 TB of native memory and 4 TB of CXL memory), it is reassuring that future applications will not be memory-constrained.
- Other chip and system vendors are standing behind CXL. Marvell has plans to release processors that can support CXL expansion cards, as well as CXL pooling and switching, to enhance the composability of future server designs. Samsung will be releasing a Smart solid-state drive (SSD), which features cache-coherent memory enabled by CXL, to dramatically increase read and write speeds to on-board NAND storage.
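The hot/cold memory tiering idea described above can be illustrated with a small sketch. This is not Meta's implementation: the page names, access counts, and the `place_pages` and `average_latency` helpers are all hypothetical. Native latency is normalized to 1.0 and CXL latency to 2.0, reflecting only the rough 2X figure cited above.

```python
# Hypothetical illustration of hot/cold memory tiering.
# Assumptions: "hotness" is a simple access count, native memory latency is
# normalized to 1.0, and CXL memory latency is 2.0 (the 2X figure cited above).

NATIVE_LATENCY = 1.0
CXL_LATENCY = 2.0

def place_pages(access_counts: dict[str, int], native_slots: int) -> dict[str, str]:
    """Place the most-accessed pages in native memory and the rest in CXL memory."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {
        page: ("native" if rank < native_slots else "cxl")
        for rank, page in enumerate(ranked)
    }

def average_latency(access_counts: dict[str, int], placement: dict[str, str]) -> float:
    """Return the access-weighted average latency, relative to native memory."""
    total = sum(access_counts.values())
    weighted = sum(
        count * (NATIVE_LATENCY if placement[page] == "native" else CXL_LATENCY)
        for page, count in access_counts.items()
    )
    return weighted / total

# Made-up workload: a few hot pages receive most of the accesses.
counts = {"cache_index": 900, "session_table": 80, "warehouse_scan": 15, "archive": 5}
placement = place_pages(counts, native_slots=2)
print(placement)
print(f"average relative latency: {average_latency(counts, placement):.2f}")
```

In this made-up workload, keeping only the two hottest pages in native memory holds the average latency close to the native figure, which is the intuition behind reserving native memory for hot pages and moving cold pages to CXL memory.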
I look forward to learning about more product announcements surrounding CXL as next-generation server platforms ship in volume next year. I believe that OCP will continue to help provide the kind of standardization the industry so very much needs to accelerate CXL adoption.” Baron Fung, Research Director, joined Dell’Oro Group