The Cisco AI Pod Explained

The Cisco AI Pod Explained: Cisco Secure AI Factory

In the rapidly evolving landscape of artificial intelligence, the infrastructure supporting these powerful technologies is paramount. Cisco's AI pod represents a fundamental building block for the company's Secure AI Factory, offering a comprehensive suite of hardware solutions designed to handle the demanding requirements of AI workloads. This article delves into the components of the Cisco AI pod, detailing each element and its role in constructing a robust and secure AI environment.

Understanding the Cisco AI Pod

The Cisco AI pod is conceptualized as the foundational unit for Cisco's broader Secure AI Factory initiative. It comprises a collection of specialized hardware designed to provide the necessary compute, storage, networking, and security capabilities for AI applications. The pod's modular design allows for scalability, enabling organizations to start with smaller deployments and expand as their AI needs grow.

Core Components of the Cisco AI Pod

The AI pod is a layered architecture, with each component contributing to the overall functionality. Starting from the foundational compute elements and moving up to networking and security, here's a breakdown of the key pieces:

1. Compute Infrastructure

At the heart of any AI deployment are the powerful compute resources required for training and inferencing. Cisco offers several solutions within the AI pod for this purpose:

a. Cisco UCS C885A: The Foundation of GPU Power

The Cisco UCS C885A serves as a fundamental component, providing significant GPU processing power. Key features include:

  • 8-way GPU Configuration: This unit is an 8-way GPU box, meaning it can house up to eight powerful graphics processing units (a quick way to verify the GPU complement from the host is sketched after this list).
  • HGX Architecture: It is based on NVIDIA's HGX architecture, a high-performance computing platform designed for AI and HPC workloads.
  • All-in-One Form Factor: The C885A integrates storage, networking, and compute capabilities within a single, consolidated unit, simplifying deployment and management.
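
As a rough illustration of what an 8-way HGX node looks like from the host operating system, here is a minimal Python sketch that counts the visible NVIDIA GPUs. It assumes the standard nvidia-smi utility that ships with the NVIDIA driver is on the PATH; this is generic host-side tooling, not a Cisco-specific interface.

```python
import subprocess

def visible_gpu_count() -> int:
    """Count NVIDIA GPUs visible to the host by listing them with nvidia-smi."""
    # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: NVIDIA H100 ... (UUID: ...)"
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return len([line for line in result.stdout.splitlines() if line.startswith("GPU ")])

if __name__ == "__main__":
    print(f"Detected {visible_gpu_count()} GPUs")  # a fully populated 8-way node should report 8
```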

b. Cisco UCS X-Series with the X9508 Chassis

The Cisco UCS X-Series, built around the X9508 chassis and a newly introduced GPU node, brings high-performance computing to a modular blade architecture. This solution offers:

  • Double-Wide GPU Node: The new GPU node is a double-wide unit, allowing for more substantial component integration.
  • Multiple RTX Pros: It can accommodate up to four RTX Pro GPUs, offering substantial processing power for AI tasks.
  • Integrated Compute Nodes: Alongside the GPU node, two compute nodes can sit next to it within the chassis (a simple slot-map illustration follows this list).
  • Blackwell RTX Pro Series Support: This represents a significant advancement, enabling the use of the latest Blackwell RTX Pro series GPUs within a blade architecture for the first time.
  • Power and Cooling Efficiency: The design prioritizes power efficiency and effective cooling, critical for high-density GPU deployments.
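
To visualize the arrangement described above, the snippet below sketches a hypothetical slot map for a modular chassis holding one double-wide GPU node and two single-wide compute nodes. The field names, node names, and the assumed slot count are illustrative only, not Cisco's inventory format.

```python
# Hypothetical slot map: one double-wide GPU node plus two compute nodes
# sharing a modular chassis (field names are illustrative, not a Cisco schema).
chassis = {
    "slots": 8,  # assumed slot count for illustration
    "nodes": [
        {"name": "gpu-node-1", "width": 2, "gpus": 4, "gpu_model": "RTX Pro"},
        {"name": "compute-node-1", "width": 1, "gpus": 0},
        {"name": "compute-node-2", "width": 1, "gpus": 0},
    ],
}

used = sum(node["width"] for node in chassis["nodes"])
print(f"Slots used: {used}/{chassis['slots']}")  # 4/8 in this example
```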

c. Modular MGX Architecture: Cisco UCS C845A

For customers looking for flexibility and scalability in their AI deployments, the modular MGX architecture, specifically the Cisco UCS C845A, is an ideal choice. Its advantages include:

  • Scalable GPU Configuration: Customers can start with as few as two GPUs and scale up to four, six, or even eight GPUs within this form factor.
  • NVMe Storage Integration: The C845A includes integrated NVMe storage, providing fast access to data for AI workloads.
  • Grow-as-You-Go Approach: This modular design is particularly beneficial for customers who want to gradually introduce AI applications into production and scale their infrastructure accordingly (a short example of code that adapts to the installed GPU count follows this list).
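
In software terms, the grow-as-you-go idea means the same job can run whether the server holds two GPUs or eight. The hedged sketch below splits a batch of work across however many GPUs are currently visible; it assumes PyTorch and the NVIDIA driver are installed, and the tensor workload is a placeholder rather than part of the Cisco stack.

```python
import torch

def shard_across_gpus(batch: torch.Tensor) -> list[torch.Tensor]:
    """Split a batch into one chunk per visible GPU and move each chunk to its device."""
    num_gpus = torch.cuda.device_count()  # e.g. 2, 4, 6, or 8 depending on how the server is populated
    if num_gpus == 0:
        return [batch]  # fall back to CPU if no GPUs are visible
    chunks = torch.chunk(batch, num_gpus)
    return [chunk.to(torch.device(f"cuda:{i}")) for i, chunk in enumerate(chunks)]

if __name__ == "__main__":
    demo_batch = torch.randn(64, 1024)  # placeholder input batch
    shards = shard_across_gpus(demo_batch)
    print(f"Split batch into {len(shards)} shard(s)")
```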

2. Storage Solutions

Efficient and high-performance storage is crucial for feeding data to AI models. Cisco's AI pod incorporates advanced storage capabilities:

a. Cisco UCS C225 M8: Modular Storage Server

The Cisco UCS C225 M8 is a modular rack server designed specifically for storage needs within the AI pod. Its integration with third-party solutions enables massive storage capacity:

  • Modular Form Factor: It's a rack server with a modular design, allowing for flexibility in deployment.
  • Partnership with VAST Data: Through a new partnership with VAST Data, 11 of these C225 M8 units can be deployed together.
  • Petabyte-Scale Storage: This combination can deliver a petabyte of storage, providing the vast capacity required for large AI datasets (the quick per-node arithmetic is shown after this list).
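
As a back-of-the-envelope check, one petabyte spread across 11 nodes works out to roughly 91 TB per node. The actual per-node figure depends on drive configuration and on whether raw or usable capacity is quoted, so treat this as illustrative arithmetic only.

```python
# Back-of-the-envelope: capacity each node contributes if 11 servers
# together provide one petabyte (decimal units: 1 PB = 1000 TB).
total_tb = 1 * 1000
nodes = 11
print(f"~{total_tb / nodes:.0f} TB per node")  # ~91 TB per node
```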

b. MDS Switches for Storage Connectivity

Connecting diverse storage systems to the AI infrastructure is handled by Cisco's MDS switches, while the newly introduced Hyperfabric switch is a key innovation in this area:

  • Connecting Traditional Storage: MDS switches facilitate the connection of traditional storage arrays from vendors like Pure Storage, NetApp, and Hitachi to the rest of the infrastructure.
  • Hyperfabric Switch: This newly introduced switch is optimized for modern data centers.
  • Cloud-Native Microservices: It is particularly well suited for handling east-west traffic in data centers built around cloud-native microservices.

3. Networking Infrastructure

High-speed and low-latency networking is indispensable for the efficient movement of data between compute nodes, storage, and other network components in an AI environment. Cisco's AI pod includes advanced switching solutions:

a. High-Speed Switches

The AI pod features cutting-edge networking switches, described in the items below, to meet demanding bandwidth requirements.

b. Cisco ASICs (Application-Specific Integrated Circuits)

Cisco's commitment to developing its own silicon is evident in its networking solutions. The Cisco Silicon One chip is an example of this in-house development, powering some of the company's high-performance switches.

c. Spectrum-X Switch with NVIDIA Silicon

Further enhancing the networking capabilities, the Spectrum-X switch, recently introduced at NVIDIA GTC, integrates NVIDIA silicon. This collaboration aims to further optimize performance for AI workloads.

d. Smart Switch with In-Rack Security and Networking

A significant innovation highlighted is the smart switch, which integrates security and networking directly into the top-of-rack switch. This allows for:

  • Policy Enforcement: It enables policy enforcement at the very edge of the network within the rack.
  • Filtering: Direct filtering capabilities are also integrated into the switch.
  • Top-of-Rack Functionality: This means critical network and security functions are performed directly at the top of the rack, streamlining management and improving efficiency (a conceptual sketch of an in-rack filtering rule follows this list).
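
To make in-rack policy enforcement concrete, here is a minimal, purely illustrative Python sketch of how a top-of-rack filtering rule might be represented and evaluated. The rule format, field names, and addresses are invented for illustration; they are not Cisco's configuration syntax or API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class FilterRule:
    """Illustrative 5-tuple-style rule evaluated at the top-of-rack switch."""
    src_net: str   # e.g. "10.10.0.0/24" (GPU compute nodes)
    dst_net: str   # e.g. "10.20.0.0/24" (storage nodes)
    dst_port: int  # e.g. 2049 for NFS
    action: str    # "permit" or "deny"

def evaluate(rules: list[FilterRule], src: str, dst: str, dst_port: int) -> str:
    """Return the action of the first matching rule; deny by default."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.src_net)
                and ip_address(dst) in ip_network(rule.dst_net)
                and dst_port == rule.dst_port):
            return rule.action
    return "deny"

# Example: permit compute-to-storage NFS traffic, drop everything else.
rules = [FilterRule("10.10.0.0/24", "10.20.0.0/24", 2049, "permit")]
print(evaluate(rules, "10.10.0.5", "10.20.0.9", 2049))  # permit
print(evaluate(rules, "10.10.0.5", "10.20.0.9", 22))    # deny
```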

4. Security and Management

Security is a core tenet of the Cisco AI Factory, and the AI pod includes components that ensure a protected environment:

a. Traditional Managed Storage Switch

In addition to the advanced hyper fabric switches, a traditional managed storage switch is also part of the pod, ensuring compatibility and management of various storage network configurations.

b. Cisco Secure Firewall 3100 Series

The Cisco Secure Firewall 3100 Series provides essential security for the AI pod, acting as a critical line of defense that protects data and infrastructure from unauthorized access and threats.

The AI Pod as a Building Block

"These are really the building blocks that you need to create an AI pod," the speaker emphasizes, highlighting the modular and comprehensive nature of the setup. The AI pod itself is then defined as "the building block of the secure AI factory." This layered approach underscores Cisco's strategy of providing scalable, integrated, and secure infrastructure solutions tailored for the demanding world of artificial intelligence.