Leaf-Spine Architecture Explained for Modern Data Centers
Why the three-tier network is dead, how Clos fabric works, ECMP and oversubscription explained, VXLAN/EVPN overlay design, BGP underlay, and a full vendor comparison for Cisco ACI, NX-OS, and Arista EOS
By Route XP | Published: March 2026 | Updated: March 2026 | Data Center, Cisco ACI, Arista
- Why Three-Tier Architecture Failed the Modern Data Center
- The Clos Network: Mathematical Foundation
- Leaf-Spine: How It Works
- ECMP: Equal-Cost Multi-Path Load Balancing
- Oversubscription Ratios Explained
- BGP Underlay Design
- VXLAN/EVPN Overlay: L2 and L3 Over IP
- Vendor Solutions: Cisco ACI, NX-OS, Arista EOS
- Scaling Leaf-Spine: Super-Spine and Multi-Pod
- Design Decision Guide
- Frequently Asked Questions
1. Why Three-Tier Architecture Failed the Modern Data Center
For the better part of two decades, enterprise data centers were built on a three-tier hierarchical model: a Core layer providing high-speed routing between aggregation blocks, an Aggregation (or Distribution) layer connecting groups of access switches and hosting Layer 3 boundaries, and an Access layer connecting servers. This design was inherited from campus networking and worked well — until the nature of data center traffic fundamentally changed.
The three-tier model was optimized for north-south traffic — client machines outside the data center communicating with servers inside it. Traffic entered through the core, was distributed by the aggregation layer, and reached servers at the access layer. The bandwidth requirement at the top of the hierarchy was a fraction of the total server bandwidth, because individual users were the bottleneck.
| Problem in Three-Tier | Root Cause | Leaf-Spine Solution |
|---|---|---|
| East-west bandwidth bottleneck | Server-to-server traffic hairpins through aggregation and core layers | Every server pair is exactly 2 hops apart; full bisection bandwidth available |
| Spanning Tree Protocol (STP) blocking | Redundant uplinks must be blocked by STP to prevent loops — wasting 50% of link capacity | No STP between leaf and spine — all paths active via ECMP routing |
| Limited scalability | Core and aggregation are complex, large chassis — adding capacity requires forklift upgrades | Add a leaf switch to add server capacity; add a spine to add bandwidth — independently |
| Unpredictable latency | Variable hop count between servers depending on their physical location in the hierarchy | Uniform latency — every server-to-server path is always exactly 2 hops |
| VLAN spanning complexity | Extending VLANs across aggregation blocks requires careful STP tuning and VTP management | VXLAN overlay decouples L2 domains from physical topology entirely |
2. The Clos Network: Mathematical Foundation
The leaf-spine architecture is a direct application of the Clos network — a multistage switching topology first described by Bell Labs engineer Charles Clos in 1953 for telephone switching systems. Clos proved mathematically that a multistage network of smaller switches could achieve the same non-blocking performance as a single crossbar switch at a fraction of the hardware cost. The insight applies equally to data center Ethernet switching seven decades later.
A Clos network is defined by three parameters: n (the number of inputs/outputs per ingress stage switch), m (the number of middle-stage switches), and r (the number of ingress/egress stage switches). For a network to be strictly non-blocking — meaning any unused input can be connected to any unused output without rearranging existing connections — Clos proved that m ≥ 2n − 1.
In the context of a data center leaf-spine fabric, this translates into a hardware design rule: for a non-blocking fabric, the aggregate uplink bandwidth of each leaf must at least equal its aggregate server-facing bandwidth. A leaf switch carrying 16 ports' worth of server traffic needs the equivalent of 16 uplink ports — one to each of 16 spine switches. In practice, most data center deployments accept a degree of oversubscription (discussed in Section 5) and deploy fewer spines accordingly.
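The two Clos conditions can be sketched as a pair of helper functions — the function names and parameterization are ours, not from any vendor tooling:

```python
def strictly_non_blocking(n: int, m: int) -> bool:
    """Clos (1953): a three-stage fabric is strictly non-blocking —
    any free input reaches any free output without rearranging
    existing connections — when m >= 2n - 1, where n is the input
    count per ingress switch and m is the middle-stage switch count."""
    return m >= 2 * n - 1

def rearrangeably_non_blocking(n: int, m: int) -> bool:
    """The weaker condition packet fabrics actually build to:
    middle-stage (spine) count at least equals inputs per leaf,
    i.e. uplink capacity matches downlink capacity."""
    return m >= n

# A leaf with 16 server-facing ports needs 31 spines for strict
# non-blocking, but 16 already gives full bisection bandwidth.
print(strictly_non_blocking(16, 31))        # True
print(strictly_non_blocking(16, 16))        # False
print(rearrangeably_non_blocking(16, 16))   # True
```

The rearrangeable condition (m ≥ n) is the one data center designers actually use — packet switching rearranges "connections" on every forwarding decision, so the stricter 2n − 1 bound from circuit switching rarely applies.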
3. Leaf-Spine: How It Works
The leaf-spine fabric consists of exactly two switch tiers with a rigid connectivity rule: every leaf connects to every spine; no leaf connects to another leaf; no spine connects to another spine. This constraint is what gives the fabric its predictable, uniform performance characteristics.
Leaf Switches — The Server-Facing Tier
Leaf switches (also called Top-of-Rack or ToR switches in physical deployments) are the access layer. Every server, hypervisor host, storage node, and network service appliance (firewall, load balancer) connects to a leaf switch. Leaf switches have two categories of ports:
- Downlinks: Face the servers. Typically 1G, 10G, 25G, or 100G depending on server generation. In a 48-port leaf switch, 32–48 ports are typically downlinks
- Uplinks: Face the spine switches. Higher speed than downlinks — commonly 100G or 400G. In a standard 48×25G + 8×100G leaf, the 8 ports in the rightmost group are the spine uplinks, one per spine switch
A key rule: the Layer 3 boundary lives on the leaf switch in a modern leaf-spine design. Each leaf is a router, not just a switch. Servers in the same subnet communicate via the leaf switch's local forwarding table. Servers in different subnets communicate via the IP fabric between leaf switches (the spine tier). This is the opposite of the three-tier model, where Layer 3 boundaries lived at the aggregation layer.
Spine Switches — The Interconnect Tier
Spine switches have a single function: forward packets between leaf switches as fast as possible. They have no server-facing ports — only leaf-facing ports. Every spine connects to every leaf with exactly one link. Spines are typically high-radix, high-speed devices — 64-port 100G or 32-port 400G switches are common spine candidates. Because all paths between any two leaves are equal cost, spines are fully utilized by ECMP hashing.
The Two-Hop Guarantee
The defining characteristic of leaf-spine is the two-hop guarantee: traffic between servers on different leaves traverses exactly two switch hops — leaf → spine → leaf — with no variation (servers on the same leaf are a single hop). This predictable latency profile is critical for latency-sensitive workloads like trading systems, real-time analytics, and AI training clusters where jitter between GPU collective operations must be minimized.
Server-A ──> Leaf-1 ──> Spine-2 (ECMP selected) ──> Leaf-4 ──> Server-B
Hop 1 (L3 route) Hop 2 (L3 route)
Total hops: 2 — regardless of fabric size or server location
4. ECMP: Equal-Cost Multi-Path Load Balancing
ECMP is the mechanism that makes all spine uplinks simultaneously active in a leaf-spine fabric. In the three-tier model, STP blocked redundant links to prevent loops. In leaf-spine, there are no Layer 2 loops — all inter-switch links are Layer 3 routed interfaces. Routing protocols (BGP or OSPF) install multiple equal-cost routes to every destination prefix, one via each spine switch. The router hardware distributes flows across all equal-cost paths simultaneously.
ECMP Hashing: How Flows Are Distributed
ECMP does not split individual packets across multiple paths (that would cause reordering). Instead, it hashes each flow to a specific path and sends all packets within that flow on the same path. The hash input is typically the 5-tuple: source IP, destination IP, source port, destination port, and IP protocol. This ensures that packets belonging to a single TCP session always follow the same path — preserving order — while different flows between the same server pair can use different spines.
Leaf-1# show ip route 10.10.20.0/24
10.10.20.0/24, ubest/mbest: 4/0
*via 10.0.0.2, Eth1/49, [20/0], 02:14:03, bgp-65001, external
*via 10.0.0.4, Eth1/50, [20/0], 02:14:03, bgp-65001, external
*via 10.0.0.6, Eth1/51, [20/0], 02:14:03, bgp-65001, external
*via 10.0.0.8, Eth1/52, [20/0], 02:14:03, bgp-65001, external
# 4 equal-cost paths via 4 spine switches — all active simultaneously
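The flow-to-path mapping can be made concrete with a toy hash. Real switch ASICs use proprietary hardware hash functions; the `md5`-based `ecmp_path` below is purely illustrative of the per-flow (not per-packet) behavior:

```python
import hashlib

SPINES = ["Spine-1", "Spine-2", "Spine-3", "Spine-4"]

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, paths=SPINES):
    """Hash the 5-tuple and pick one equal-cost path. Every packet
    of a flow produces the same hash, so the flow stays on one spine
    and TCP ordering is preserved."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return paths[digest % len(paths)]

# All packets of one TCP session map to the same spine...
flow = ("10.10.10.5", "10.10.20.7", 49152, 443, "tcp")
assert ecmp_path(*flow) == ecmp_path(*flow)
# ...while a second flow between the same hosts (different source
# port) may hash to a different spine, spreading load.
print(ecmp_path("10.10.10.5", "10.10.20.7", 49153, 443, "tcp"))
```

This is also why a single elephant flow cannot exceed the bandwidth of one uplink: the hash pins it to a single spine path.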
ECMP Path Count and Bandwidth Scaling
The effective server-to-server bandwidth scales linearly with the number of spines. A leaf switch with 4 × 100G uplinks to 4 spines has an effective 400G of uplink bandwidth available (across all concurrent flows). Adding two more spines and two more uplinks brings this to 600G — without replacing any existing hardware. This is the elastic horizontal scaling that makes leaf-spine ideal for cloud-native data centers.
| Spine Count | Uplink Speed | Total Uplink Bandwidth / Leaf | Max Inter-Leaf Throughput |
|---|---|---|---|
| 2 spines | 100G | 200G | 200 Gbps (2 × 100G) |
| 4 spines | 100G | 400G | 400 Gbps (4 × 100G) |
| 8 spines | 100G | 800G | 800 Gbps (8 × 100G) |
| 4 spines | 400G | 1.6 Tbps | 1.6 Tbps (4 × 400G) — AI fabric standard |
5. Oversubscription Ratios Explained
Oversubscription is the ratio of total downlink (server-facing) bandwidth to total uplink (spine-facing) bandwidth on a leaf switch. A 3:1 oversubscription ratio means you have three times more server bandwidth than spine bandwidth — acceptable because not all servers transmit at line rate simultaneously in typical workloads.
Choosing the right oversubscription ratio for your workload is one of the most critical and nuanced leaf-spine design decisions. The right answer depends on traffic characteristics, workload type, and budget constraints.
| Ratio | Example Config (48×25G + 8×100G leaf) | Suitable For | Not Suitable For |
|---|---|---|---|
| 1:1 (Non-blocking) | Equal downlink and uplink bandwidth — all uplink ports used | AI/HPC training clusters, financial trading, real-time analytics | Cost-sensitive general-purpose DC — expensive in spine switches |
| 2:1 | 48×25G down (1,200G) / 8×100G up (800G) | Private cloud, virtualized compute, distributed databases | Workloads with sustained all-to-all communication bursts |
| 3:1 | 48×25G down (1,200G) / 4×100G up (400G) | Web serving, app tier, mixed general-purpose workloads | Storage-heavy workloads with high sustained east-west IO |
| 4:1 or higher | 48×25G down (1,200G) / 3×100G up (300G) | Dev/test, low-utilization workloads, lab environments | Any production workload — introduces congestion risk |
A practical formula for leaf oversubscription: Oversubscription = (Number of server ports × server port speed) ÷ (Number of spine uplinks × uplink speed). For a 48×25G leaf with 6×100G uplinks: (48 × 25G) / (6 × 100G) = 1,200G / 600G = 2:1.
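The formula is trivial to automate when comparing leaf SKUs — a minimal sketch (the function name is ours):

```python
def oversubscription(server_ports, server_speed_g, uplinks, uplink_speed_g):
    """Leaf oversubscription ratio:
    total downlink bandwidth / total uplink bandwidth."""
    downlink_g = server_ports * server_speed_g
    uplink_g = uplinks * uplink_speed_g
    return downlink_g / uplink_g

# The worked example from the text: 48 x 25G down, 6 x 100G up.
ratio = oversubscription(48, 25, 6, 100)
print(f"{ratio:g}:1")   # prints "2:1"

# Same leaf with only 4 uplinks cabled: 1,200G / 400G = 3:1.
print(f"{oversubscription(48, 25, 4, 100):g}:1")   # prints "3:1"
```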
6. BGP Underlay Design
The underlay is the physical IP routing fabric — the protocol and addressing scheme that carries packets between leaf and spine switches. While OSPF was common in early leaf-spine deployments, BGP (specifically eBGP) has become the de facto standard underlay protocol for modern data centers, recommended by RFC 7938 ("Use of BGP for Routing in Large-Scale Data Centers").
Why eBGP for Underlay?
- Simpler failure isolation: Each leaf-spine link is its own /31 (or /30) point-to-point subnet and its own BGP session. A link failure affects only that session and that peer — it does not trigger a fabric-wide SPF recalculation as OSPF would
- No flooding domain: Unlike OSPF LSA flooding, BGP only sends route updates when something changes. This scales far better in large fabrics with thousands of prefixes
- Policy flexibility: BGP route policies (communities, local preference, MED) allow fine-grained traffic engineering across the fabric without complex OSPF metric tuning
- AS number design: Each leaf gets a unique private AS number (from the 4-byte private range 4200000000–4294967294). All spines share a common AS number. This creates natural eBGP peering at every leaf-spine link and prevents BGP AS path looping between leaf switches
Leaf-1(config)# router bgp 4200000001
Leaf-1(config-router)# router-id 10.0.0.1
Leaf-1(config-router)# address-family ipv4 unicast
Leaf-1(config-router-af)# maximum-paths 8 # Allow up to 8 ECMP paths (4 spines in this example)
Leaf-1(config-router-af)# exit
Leaf-1(config-router)# neighbor 10.0.0.2 remote-as 4200000100 # Spine-1 AS
Leaf-1(config-router)# neighbor 10.0.0.4 remote-as 4200000100 # Spine-2 AS
Leaf-1(config-router)# neighbor 10.0.0.6 remote-as 4200000100 # Spine-3 AS
Leaf-1(config-router)# neighbor 10.0.0.8 remote-as 4200000100 # Spine-4 AS
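Configuration like the above is usually generated rather than typed. A hedged sketch of an underlay planner, assuming a /31-per-link scheme carved from an illustrative 10.0.0.0/24 supernet and the leaf-unique / spine-shared ASN model described above (all names and the supernet are our assumptions):

```python
from ipaddress import ip_network

LEAF_AS_BASE = 4200000001   # one unique private 4-byte ASN per leaf
SPINE_AS = 4200000100       # all spines share one ASN

def underlay_plan(num_leaves, num_spines, supernet="10.0.0.0/24"):
    """Carve a /31 point-to-point subnet per leaf-spine link and
    assign ASNs following the RFC 7938-style scheme: unique ASN per
    leaf, common ASN across the spine tier."""
    p2p = ip_network(supernet).subnets(new_prefix=31)
    plan = []
    for leaf in range(num_leaves):
        for spine in range(num_spines):
            net = next(p2p)
            leaf_ip, spine_ip = net.hosts()   # /31: exactly two addresses
            plan.append({
                "leaf": f"Leaf-{leaf + 1}", "leaf_as": LEAF_AS_BASE + leaf,
                "spine": f"Spine-{spine + 1}", "spine_as": SPINE_AS,
                "leaf_ip": str(leaf_ip), "spine_ip": str(spine_ip),
            })
    return plan

for link in underlay_plan(2, 4)[:4]:   # Leaf-1's four uplinks
    print(link["leaf_ip"], "->", link["spine"], link["spine_ip"])
```

From a structure like this, a Jinja2 template or Ansible playbook can render per-device BGP stanzas, which is how most EVPN fabrics are deployed in practice.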
7. VXLAN/EVPN Overlay: L2 and L3 Over IP
The BGP underlay provides IP reachability between leaf switches. But data center workloads also need Layer 2 connectivity — virtual machines that must stay in the same subnet as they vMotion between hosts, or containers that share a flat L2 broadcast domain. In the three-tier model, VLANs extended across trunks provided this. In leaf-spine, VXLAN (Virtual Extensible LAN) provides it — encapsulating original L2 Ethernet frames inside UDP packets that travel over the L3 underlay.
VXLAN Encapsulation
A VXLAN-encapsulated frame wraps the original Ethernet frame in four new headers:

Original frame:  [ Ethernet | IP | Payload ]
VXLAN frame:     [ Outer Ethernet | Outer IP | Outer UDP (dst 4789) | VXLAN header | Ethernet | IP | Payload ]

- VNI (VXLAN Network Identifier): a 24-bit segment ID carried in the VXLAN header — up to 16M segments, versus 4,094 usable VLANs
- Each VNI maps to a VLAN or VRF on the leaf switches
- Outer IP header: routable across the spine underlay — sourced and destined to the VTEP (VXLAN Tunnel Endpoint) loopback addresses of the leaf switches
- The encapsulation adds 50 bytes of overhead, so the underlay MTU must be raised accordingly (jumbo frames, typically 9216 bytes)
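The 8-byte VXLAN header itself is simple enough to pack in a few lines — a minimal sketch per the RFC 7348 layout (the helper name is ours):

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned UDP destination port for VXLAN

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348):
    1 flags byte (I bit set = valid VNI), 3 reserved bytes,
    24-bit VNI, 1 trailing reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit field")
    return struct.pack("!II", 0x08000000, vni << 8)

h = vxlan_header(5000)
print(h.hex())          # flags byte 0x08, then the VNI shifted left 8 bits
print(len(h))           # 8 — the header is always 8 bytes
```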
EVPN: The Control Plane for VXLAN
VXLAN by itself is just the data plane encapsulation. It needs a control plane to distribute MAC and IP address information between VTEPs — otherwise leaf switches would have to flood unknown unicast traffic to discover remote MACs. BGP EVPN (Ethernet VPN — RFC 7432) provides this control plane, using BGP to distribute MAC/IP bindings between all leaf switches.
EVPN uses four primary route types for VXLAN fabrics:
| Route Type | Name | Purpose |
|---|---|---|
| Type 1 | Ethernet Auto-Discovery | Used for multihoming (ESI — Ethernet Segment Identifier) — enables dual-homed servers to use both leaf connections simultaneously with active-active EVPN multihoming |
| Type 2 | MAC/IP Advertisement | Distributes MAC address and optionally the associated IP address of a host, eliminating unknown unicast flooding and enabling distributed ARP suppression |
| Type 3 | Inclusive Multicast Route | Advertises each VTEP's membership in a VNI — allows BUM (Broadcast, Unknown Unicast, Multicast) traffic to be sent to all VTEPs participating in that VNI via ingress replication |
| Type 5 | IP Prefix Route | Distributes IP prefixes (subnets) between VTEPs for inter-VRF and external routing — enables symmetric IRB (Integrated Routing and Bridging) |
Symmetric vs Asymmetric IRB
Integrated Routing and Bridging (IRB) is the mechanism for routing between L2 VNIs (subnets) on the leaf-spine fabric. Two models exist:
- Asymmetric IRB: Both L2 bridging and L3 routing happen at the ingress leaf. The egress leaf only performs L2 bridging. Simpler to configure but requires every leaf to have every VNI instantiated — limiting fabric scale and increasing MAC table size
- Symmetric IRB: The ingress leaf routes into an L3 VNI (a routed VRF tunnel), the spine forwards the L3 VNI, and the egress leaf routes from the L3 VNI to the destination L2 VNI. Only requires each leaf to have the VNIs of servers it directly hosts, plus a common L3 VNI per VRF — scales to much larger fabrics. Relies on EVPN Type 2 routes carrying both L2 and L3 VNIs (with Type 5 routes for external prefixes) and is the recommended model for any fabric beyond ~50 leaf switches
8. Vendor Solutions: Cisco ACI, NX-OS, and Arista EOS
Three solutions dominate enterprise leaf-spine deployments, each representing a distinct philosophy on how the fabric should be built and managed.
| Attribute | Cisco ACI | Cisco NX-OS VXLAN/EVPN | Arista EOS VXLAN/EVPN |
|---|---|---|---|
| Control plane | Proprietary OpFlex policy model via APIC controller | Standard BGP EVPN (RFC 7432) | Standard BGP EVPN (RFC 7432) |
| Management model | Centralized — APIC controller is required; all policy via GUI/REST API/Ansible | Distributed — per-device CLI, Nexus Dashboard optional | Distributed — per-device CLI + CloudVision (CVP) optional |
| Policy abstraction | Excellent — EPG/Contract model abstracts L2/L3 from policy intent | Standard — VRF/VLAN/VNI configured per-device | Standard — VRF/VLAN/VNI configured per-device |
| Hardware flexibility | ACI-specific Nexus hardware only (9000 series in ACI mode) | Nexus 9000, 7000 series in NX-OS standalone mode | Full Arista 7000 series; SONiC support on white-box |
| Automation / IaC | Strong — APIC REST API, Terraform hashicorp/aci, Ansible aci_tenant modules | Strong — NXAPI, Ansible cisco.nxos, Terraform CiscoDevNet/nxos | Excellent — eAPI, Ansible arista.eos, CVP, Terraform, strong NetDevOps community |
| Learning curve | High — ACI policy model is fundamentally different from traditional networking | Moderate — familiar NX-OS CLI with VXLAN/EVPN additions | Moderate — Linux-like EOS CLI familiar to multi-vendor engineers |
| Best for | Large enterprises needing policy automation, microsegmentation at scale, and multi-tenancy | Cisco-invested enterprises that want standard EVPN without ACI complexity | Cloud-native, DevOps-forward environments; hyperscale-style builds; OpenConfig/gNMI |
9. Scaling Leaf-Spine: Super-Spine and Multi-Pod
A standard two-tier leaf-spine fabric is limited by the port count of the spine switches. A 64-port spine switch can support a maximum of 64 leaf switches. A 64-port 400G spine with 64 leaves each hosting 48 servers supports 3,072 servers maximum — sufficient for many enterprise data centers, but insufficient for large cloud or hyperscale deployments.
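The scale arithmetic above can be captured in a couple of helper functions (the names are ours; note that real 3-tier designs reserve some pod-spine ports for super-spine uplinks, which this sketch deliberately ignores):

```python
def two_tier_max_servers(spine_ports: int, servers_per_leaf: int) -> int:
    """In a 2-tier fabric, each leaf consumes one port on every
    spine, so the spine port count caps the leaf count."""
    return spine_ports * servers_per_leaf

def three_tier_max_servers(pods: int, spine_ports: int,
                           servers_per_leaf: int) -> int:
    """With a super-spine tier, each pod is a complete leaf-spine
    fabric; total capacity scales linearly with pod count."""
    return pods * two_tier_max_servers(spine_ports, servers_per_leaf)

# The example from the text: 64-port spines, 48 servers per leaf.
print(two_tier_max_servers(64, 48))        # 3072
# 16 such pods behind a super-spine tier:
print(three_tier_max_servers(16, 64, 48))  # 49152
```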
Super-Spine: Three-Tier Clos
The natural extension is a Super-Spine (or Core) tier — a third tier of switches that interconnects multiple leaf-spine pods. Each pod is a complete leaf-spine fabric. Super-spines connect pod spines to each other, forming a three-tier Clos topology. Traffic between servers in the same pod still traverses only 2 hops. Traffic between pods traverses 4 hops (leaf → spine → super-spine → spine → leaf). Scale increases to hundreds of thousands of ports.
Cisco ACI Multi-Pod and Multi-Site
Cisco ACI has two native scaling constructs beyond a single pod. ACI Multi-Pod extends a single ACI fabric across multiple physical pods connected via an Inter-Pod Network (IPN) — all pods share a single APIC cluster and a unified policy domain. ACI Multi-Site connects geographically separate ACI fabrics (each with its own APIC cluster) via a Nexus Dashboard Orchestrator, enabling stretched L2/L3 domains between data centers with separate fault domains.
| Approach | Max Scale | Hop Count | Use Case |
|---|---|---|---|
| Standard leaf-spine (2-tier) | ~3,000–5,000 servers | 2 hops (always) | Single DC, enterprise or mid-size cloud |
| Super-Spine (3-tier Clos) | 50,000–100,000+ servers | 2 (intra-pod) / 4 (inter-pod) | Large cloud, hyperscale DC, AI GPU clusters |
| ACI Multi-Pod | Up to 6 pods, ~1,000 leaves | 2 (intra-pod) / 4 (inter-pod via IPN) | Campus data center, DC expansion, HA across rooms |
| ACI Multi-Site | Up to 12 sites | 2–6 hops (intra and inter-site) | Geo-distributed DC, active-active DCI, DR |
10. Design Decision Guide
With the foundational concepts in place, the table below maps common data center profiles to the recommended leaf-spine design choices — from oversubscription ratio and uplink speed to control plane and overlay model.
| DC Profile | Oversubscription | Uplink Speed | Underlay | Overlay | Recommended Platform |
|---|---|---|---|---|---|
| Small enterprise (≤500 servers) | 3:1 | 100G | OSPF or eBGP | VXLAN/EVPN asymmetric IRB | Cisco Nexus 93180 leaf / 9336 spine |
| Mid enterprise / private cloud | 2:1–3:1 | 100G | eBGP | VXLAN/EVPN symmetric IRB | Cisco ACI or NX-OS EVPN; Arista 7050X3 |
| Large enterprise / multi-tenant | 2:1 | 400G | eBGP | VXLAN/EVPN symmetric IRB + Type 5 | Cisco ACI Multi-Pod; Arista 7800R series |
| AI / GPU cluster | 1:1 (non-blocking) | 400G / 800G | eBGP | RoCEv2 lossless (PFC + ECN + DCQCN) | Cisco G300, Arista 7800R4, Nvidia Spectrum-4 |
| Hyperscale / cloud | 1:1–2:1 per tier | 400G / 800G | eBGP / SONiC | VXLAN/EVPN or SR-MPLS | Whitebox + SONiC; Arista 7800R4; Broadcom TH5 |
11. Frequently Asked Questions
Q: Can I migrate from a three-tier design to leaf-spine without a full forklift upgrade?
Yes — and this is how most production migrations happen. The standard approach is to deploy new leaf-spine pods alongside the existing three-tier fabric, then progressively migrate workloads by rack or by application tier. The existing core/aggregation switches temporarily act as the "border leaf" connecting the old and new fabrics via BGP or static routing. Full cutover typically takes 6–18 months for a medium-sized enterprise data center.
Q: Does leaf-spine require VXLAN? Can I run it with plain VLANs?
You can run a leaf-spine fabric without VXLAN if all inter-leaf communication is pure Layer 3 (routed) and you do not need to extend Layer 2 domains between racks. For Kubernetes deployments using Calico BGP (pure L3 pod routing), plain leaf-spine with eBGP and no overlay is perfectly viable and operationally simpler. VXLAN/EVPN is needed when VMs require L2 adjacency across racks, for workload mobility between leaves, or for multi-tenancy isolation.
Q: How many spine switches do I need?
At minimum, 2 spines for redundancy — a single spine is a single point of failure. For a non-blocking fabric, you need as many spines as uplink ports on each leaf. For a 3:1 oversubscription ratio with a 48×25G + 8×100G leaf, using 4 of the 8 uplinks gives you 4 spines at 3:1. The practical sweet spot for most enterprise fabrics is 4–8 spines, providing 400G–800G of aggregate uplink bandwidth per leaf while maintaining reasonable hardware cost.
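The spine-count arithmetic in that answer generalizes to a one-line calculation (the function name is ours):

```python
import math

def spines_needed(server_ports, server_speed_g, uplink_speed_g, ratio):
    """Minimum spine count — assuming one uplink per spine — that
    keeps leaf oversubscription at or below the target ratio."""
    downlink_g = server_ports * server_speed_g
    return math.ceil(downlink_g / (ratio * uplink_speed_g))

# 48 x 25G leaf with 100G uplinks:
print(spines_needed(48, 25, 100, 3))   # 4 spines for 3:1, as above
print(spines_needed(48, 25, 100, 1))   # 12 spines for non-blocking
```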
Q: What is the difference between a border leaf and a regular leaf?
A border leaf (also called an external leaf or exit leaf) is a leaf switch that connects the VXLAN/EVPN fabric to external networks — the WAN router, internet edge firewall, or an upstream network. It runs the same leaf hardware and protocols but additionally peers with external BGP neighbors and redistributes external prefixes into the fabric. In Cisco ACI, the equivalent is the Border Leaf node with an L3Out configuration. Border leaves are always deployed in pairs for redundancy.
Q: Is STP completely eliminated in a leaf-spine design?
Between the leaf and spine tiers — yes. All leaf-to-spine links are Layer 3 routed interfaces; STP has no role there. However, STP (typically Rapid PVST+ or MST) still runs on the server-facing access ports of each leaf switch, within each VLAN or VNI. For server-facing ports, STP protections like PortFast, BPDU Guard, and Root Guard should always be enabled. The dual-homed server case (a server connecting to two leaves) is handled by EVPN multihoming (ESI-LAG) rather than STP.
Technical content based on Cisco ACI and NX-OS design guides, Arista EOS VXLAN/EVPN configuration guides, RFC 7938 (BGP for Large-Scale Data Centers), RFC 7432 (BGP EVPN), and Charles Clos's original 1953 switching network paper. CLI examples are representative of Cisco NX-OS 10.x and should be validated in a non-production environment. All content current as of March 2026.