VXLAN in Cisco ACI
1. Why VXLAN? The Problem ACI Was Built to Solve
Traditional data center networks suffered from two structural limitations that became painful at scale. First, spanning-tree protocol blocked half the available links to prevent forwarding loops, leaving expensive bandwidth sitting idle. Second, the 802.1Q VLAN tag field is only 12 bits wide — which caps a network at 4,094 VLANs. For a multi-tenant data center running thousands of isolated application environments, that ceiling is hit quickly.
Cisco ACI's answer is to treat every packet in the fabric as a VXLAN packet — full stop. The moment a frame enters the fabric at an ingress leaf switch, it gets wrapped in a VXLAN header regardless of whether it arrived as a plain Ethernet frame, an 802.1Q-tagged frame, or even an NVGRE packet. This normalization means the fabric itself speaks one universal language internally, while still connecting seamlessly to any external encapsulation on the edges.
⚠ Figure 1 — Traditional Network Problems vs ACI VXLAN Solution

Traditional Network — Pain Points
▶ STP Blocked Ports: 50% of links idle to prevent loops. No true multipath forwarding.
▶ 4,094 VLAN Limit: 12-bit 802.1Q tag cannot scale for large multi-tenant environments.
▶ L2/L3 Boundary Constraints: Endpoints locked to subnets. Moving VMs across racks breaks IP.
▶ No Policy Carry: Bare Ethernet carries no information about which security policy applies.

✅ ACI VXLAN — What Changes
▶ Full Mesh, No STP: All links active via ECMP over IP underlay. Loop-free by design.
▶ 16 Million Segments: 24-bit VNID field in VXLAN header gives 16,777,215 unique segments.
▶ Flexible Endpoint Placement: Endpoints move anywhere while keeping their IP. Ingress leaf does routing.
▶ Policy in Every Packet: ACI embeds policy class tags (pcTag/sclass) inside the VXLAN header for distributed enforcement.
2. VXLAN Basics — RFC 7348 Fundamentals
VXLAN, defined in RFC 7348, is a MAC-in-UDP encapsulation scheme. It wraps an original Layer 2 Ethernet frame inside a UDP packet, which rides over a standard IP network. The genius of this approach is that the IP network in the middle — the underlay — has no awareness of the overlay topology. It simply routes UDP packets from one IP address to another based on the outer headers.
The device that performs the encapsulation and decapsulation is called a VTEP (VXLAN Tunnel Endpoint). In standard VXLAN deployments, VTEPs can be physical switches, hypervisors, or any device with an IP address in the underlay. In ACI, every leaf switch is a VTEP — but ACI adds several proprietary extensions to the standard VXLAN header that make the fabric do things a generic VXLAN implementation cannot.
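To make the RFC 7348 format concrete, the 8-byte VXLAN header can be sketched in a few lines of Python. This is a minimal illustration only — the helper names are hypothetical, and real encapsulation happens in the leaf's switching ASIC, not in software:

```python
# Sketch of the 8-byte RFC 7348 VXLAN header:
#   byte 0      flags ("I" bit 0x08 set when the VNID field is valid)
#   bytes 1-3   reserved
#   bytes 4-6   24-bit VNID
#   byte 7      reserved
I_FLAG = 0x08

def pack_vxlan_header(vnid):
    """Build the 8-byte header for a given 24-bit VNID."""
    if not 0 <= vnid < 2**24:
        raise ValueError("VNID is a 24-bit field")
    return bytes([I_FLAG, 0, 0, 0]) + vnid.to_bytes(3, "big") + bytes([0])

def unpack_vxlan_header(header):
    """Return (vnid_valid, vnid) from an 8-byte VXLAN header."""
    vnid_valid = bool(header[0] & I_FLAG)
    vnid = int.from_bytes(header[4:7], "big")
    return vnid_valid, vnid

hdr = pack_vxlan_header(15761386)   # example BD VNID used later in this article
assert len(hdr) == 8                # matches the 8-byte header size in Figure 2
```

The inner Ethernet frame simply follows these 8 bytes, and the whole thing rides inside a UDP datagram addressed to the destination VTEP.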
Figure 2 — VXLAN Encapsulation Frame Structure

| Header | Contents | Size | Layer |
| --- | --- | --- | --- |
| Outer Ethernet Header | Dst MAC: Next-hop router · Src MAC: Leaf uplink | 14 bytes | L2 Underlay |
| Outer IP Header | Src IP: Ingress VTEP (PTEP) · Dst IP: Egress VTEP (PTEP) | 20 bytes | L3 Underlay |
| Outer UDP Header | Dst Port: 4789 (IANA) · Src Port: Flow entropy hash | 8 bytes | ECMP Entropy |
| VXLAN Header (ACI-Extended) | 24-bit VNID + Flags (incl. ACI Policy Bit) + Reserved | 8 bytes | ACI Overlay |
| Inner Ethernet Header | Original Src/Dst MAC of the communicating endpoints | 14 bytes | Inner L2 |
| Inner IP + Payload | Original IP packet (TCP/UDP/etc.) — application data | Variable | App Data |

Total VXLAN Overhead = 50 bytes (14 outer ETH + 20 outer IP + 8 UDP + 8 VXLAN header), rising to 54 bytes if the original frame's 802.1Q tag is preserved. Configure MTU ≥ 1600 bytes on all fabric links to avoid fragmentation.
The UDP source port is deliberately varied based on a hash of the inner frame's flow fields (src IP, dst IP, protocol, src port, dst port). This per-flow entropy means that different traffic flows take different ECMP paths through the spine — giving you true load balancing without any per-packet reordering issues.
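A minimal sketch of that entropy mechanism follows. The real hash is ASIC-specific and proprietary; CRC32 is used here purely as a stand-in, and the function name is illustrative:

```python
import zlib

def vxlan_src_port(src_ip, dst_ip, proto, sport, dport):
    """Map the inner 5-tuple into the ephemeral port range 49152-65535.

    Same flow -> same source port every time (no packet reordering);
    different flows usually land on different ports, so the spine's
    ECMP hash spreads them across different uplinks.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return 49152 + (zlib.crc32(key) % 16384)

# Two flows differing only in the inner TCP source port:
flow_a = vxlan_src_port("10.10.1.10", "10.10.2.10", 6, 33000, 443)
flow_b = vxlan_src_port("10.10.1.10", "10.10.2.10", 6, 33001, 443)
```

Because every packet of a given flow hashes identically, all of its packets follow one spine path, while the fabric as a whole still balances across all spines.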
3. ACI Spine-Leaf and the Underlay
ACI mandates a spine-leaf topology. Every leaf connects to every spine, and no two leaves connect directly to each other. This creates a two-hop maximum path between any two endpoints in the fabric — traffic always traverses at most one spine switch. The regularity of this topology is what makes ACI's VXLAN forwarding model predictable and scalable.
The underlay IP network between leaf and spine uses IS-IS (Intermediate System to Intermediate System) to distribute reachability to each VTEP's loopback address. This is a critical distinction from traditional data center IS-IS — in ACI, IS-IS runs only on the point-to-point links between leaf and spine switches. It never touches servers or external routers. All it does is make sure every leaf knows how to reach the loopback IP address (PTEP) of every other leaf through the spine switches.
⚒ Figure 3 — ACI Spine-Leaf VXLAN Topology

Spine Layer (IS-IS routes · Proxy-TEP · COOP DB)
▶ Spine-1: Proxy-TEP 10.0.0.128 (IS-IS · COOP Oracle · anycast)
▶ Spine-2: Proxy-TEP 10.0.0.128 (IS-IS · COOP Oracle · anycast)

Leaf Layer (PTEP/VTEP · Policy Enforcement)
▶ Leaf-1: PTEP 10.0.64.64 · BD/VRF VNID assignment
▶ Leaf-2: PTEP 10.0.96.64 · Encap/decap VXLAN
▶ Leaf-3: PTEP 10.0.128.64 · Default GW for endpoints

Servers
▶ Server C: 10.10.2.10 (different subnet)
▶ Server D: 10.10.1.12 (same subnet as A/B)

All leaf-to-leaf traffic travels via exactly one spine switch — maximum two hops. Spines forward on outer VXLAN IP only.
4. TEP Types — PTEP, Proxy-TEP, FTEP, and vPC-TEP
One of the first things that trips up engineers new to ACI is the proliferation of TEP types. In standard VXLAN, a VTEP is simply "the IP address I use to send VXLAN traffic." In ACI, the fabric carves out several distinct TEP roles, each serving a specific forwarding purpose. All of these addresses live inside the Overlay-1 VRF — a dedicated, non-tenant infrastructure VRF that ACI uses exclusively for fabric communication.
Figure 4 — ACI TEP Types in the Overlay-1 VRF

PTEP — Physical Tunnel Endpoint
The unique loopback IP address assigned to each individual leaf and spine by APIC from the infrastructure TEP pool (configured at fabric init time, e.g. 10.0.0.0/16). This address is allocated as a /32 loopback on Overlay-1.
Used for: Non-vPC data plane, APIC-to-leaf communication, traceroute, MP-BGP peering (for L3Out), ping between fabric nodes.

Proxy-TEP — Spine Anycast Address
An anycast IP address shared by all spines. When a leaf cannot find the destination endpoint's VTEP in its local mapping table, it sends the VXLAN packet to this anycast address. Any spine can receive it and look up the correct destination in the COOP mapping database.
Used for: Unknown unicast forwarding when the leaf doesn't have a local mapping for the destination endpoint.

FTEP — Fabric Loopback TEP
A special anycast address identical on all leaf nodes, used when a VMM domain (VMware vSphere / ESXi) is integrated. The hypervisor hosts its own VTEP (vSwitch VTEP), and the leaf uses the FTEP as the source to encapsulate VXLAN traffic destined for the vSwitch. This lets virtual machine VTEPs "see" a consistent fabric-side address regardless of which leaf they connect through.
Used for: VMM domain integration with vSwitch VTEPs.

vPC-TEP — Virtual Port Channel TEP
When two leaf switches form a vPC pair, they share a virtual IP address called the vPC-TEP (sometimes called VPC VIP or VTEP). Traffic destined to endpoints connected across both vPC members uses this shared address as the tunnel destination, allowing either leaf to receive and forward the VXLAN packet correctly.
Used for: Dual-homed server connectivity via vPC for high availability.
Verify TEP Addresses — CLI
leaf101# acidiag fnvread                     ← shows PTEP of all nodes
leaf101# show ip interface vrf overlay-1     ← shows all TEP loopbacks
leaf101# show interface tunnel <id>          ← shows VXLAN tunnel state + destination VTEP
5. VNID Types — BD, VRF, and EPG
The 24-bit VNID field in the VXLAN header carries 16 million possible values, but ACI doesn't use them all the same way. The fabric assigns VNIDs to three different construct types, and the value in that 24-bit field changes based on what kind of forwarding is happening at any given moment. Getting this right is the key to understanding how ACI actually forwards packets.
Figure 5 — The Three VNID Types in ACI

BD VNID — Bridge Domain Identifier
Assigned to each Bridge Domain. Represents a Layer 2 flooding domain. Used when:
✓ Multicast/BUM traffic is forwarded within the BD
✓ ARP flooding (when proxy ARP is off)
✓ IPv6 with NDP (Neighbor Discovery Protocol)
Example: VNID 15761386 = BD "Web-BD"

VRF VNID (L3 VNID / Private Network ID)
Assigned to each VRF (Context). Represents a Layer 3 routing domain. Used when:
✓ Traffic is being routed between subnets within the same VRF
✓ Ingress leaf sends inter-subnet traffic through the fabric
✓ Spine-proxy forwarding for unknown unicast lookup
One L3 VNID per VRF — shared across all BDs in that VRF

EPG VNID — Endpoint Group VLAN Mapping
Assigned per EPG. Maps the external VLAN tag (on the access port) to the ACI policy domain. Less commonly discussed but essential at the access edge:
✓ Maps customer VLAN (e.g. VLAN 100) to EPG on a port
✓ Leaf translates the access VLAN to a VNID for fabric forwarding
✓ Infra VLAN 4093 is used between leaf and APIC for fabric management
Access VLAN → VNID normalization at leaf ingress
Key Insight: A single traffic flow can use different VNIDs as it traverses the fabric. An L3 (routed) packet from Web-BD to App-BD carries the VRF VNID while transiting the fabric, then switches to the destination BD VNID at the egress leaf for the final lookup.
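The ingress-leaf VNID choice can be condensed into a small decision sketch. The subnet and the VRF VNID value (2392068) are illustrative; the BD VNID reuses the Web-BD example from Figure 5:

```python
import ipaddress

def fabric_vnid(src_ip, dst_ip, src_subnet, bd_vnid, vrf_vnid):
    """Pick the VNID the ingress leaf stamps into the VXLAN header.

    Switched (same-subnet) traffic crosses the fabric with the BD VNID;
    routed (inter-subnet) traffic carries the VRF VNID instead.
    """
    net = ipaddress.ip_network(src_subnet)
    if ipaddress.ip_address(dst_ip) in net:
        return ("BD", bd_vnid)      # L2 forwarding within the bridge domain
    return ("VRF", vrf_vnid)        # L3 routing at the ingress leaf

# Server A -> Server D (same subnet): fabric carries the BD VNID
assert fabric_vnid("10.10.1.10", "10.10.1.12", "10.10.1.0/24",
                   15761386, 2392068) == ("BD", 15761386)
```

The egress leaf then uses the destination BD VNID for its own final lookup, which is exactly the VNID switch described in the key insight above.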
6. ACI VXLAN Header — The Policy Bit and sclass
Standard VXLAN has only one meaningful flag defined in RFC 7348 — the "I" flag indicating that the VNID is valid. Cisco ACI repurposes the reserved bits in the VXLAN header to carry additional information that enables its distributed policy model. The two most important additions are the Policy bit and the sclass (source class) field.
The Policy bit (P bit) is set to 1 when the ingress leaf has already applied and instantiated the ACI policy for this packet. When an egress leaf receives a VXLAN packet with P=1, it skips the policy enforcement step — trusting that the ingress already handled it. When P=0, the egress leaf performs the policy lookup. The sclass field carries the PCTag (Policy Class Tag) of the source EPG, which the egress leaf uses to look up the contract rules governing whether this traffic is permitted.
Figure 6 — ACI VXLAN Header Bit Layout

Bits 0–31 (First 32-bit Word)

| Field | Meaning |
| --- | --- |
| R | Reserved |
| R | Reserved |
| P | Policy bit |
| R | Reserved |
| I | VNID valid |
| Group Policy ID / sclass | 16-bit Source EPG Class (PCTag) — ACI extension |

Bits 32–63 (Second 32-bit Word)

| Field | Meaning |
| --- | --- |
| VNID | 24 bits — BD VNID or VRF VNID, identifies the forwarding domain |
| Reserved | 8 bits |

P Bit = 0: Policy has NOT been instantiated on this leaf. The egress leaf MUST perform a contract lookup using the sclass in the header before forwarding.

P Bit = 1: Policy has already been instantiated at ingress. The egress leaf trusts the ingress decision and forwards without re-evaluating the contract. This is the normal case for intra-fabric traffic.
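Under a simplifying assumption about bit positions (the P bit placed at 0x80 in the first flags byte, chosen here purely for illustration, alongside the standard I bit at 0x08), the extended header from Figure 6 could be modeled like this — a sketch, not Cisco's actual wire encoding:

```python
import struct

I_BIT, P_BIT = 0x08, 0x80   # P-bit position simplified for illustration

def pack_aci_vxlan(vnid, sclass, policy_applied):
    """First 32-bit word: flags + reserved + 16-bit sclass.
    Second 32-bit word: 24-bit VNID + 8 reserved bits."""
    flags = I_BIT | (P_BIT if policy_applied else 0)
    return (struct.pack("!BBH", flags, 0, sclass)
            + vnid.to_bytes(3, "big") + b"\x00")

def unpack_aci_vxlan(header):
    """What an egress leaf would extract before deciding on enforcement."""
    flags, _, sclass = struct.unpack("!BBH", header[:4])
    return {
        "vnid": int.from_bytes(header[4:7], "big"),
        "sclass": sclass,                       # source EPG PCTag
        "policy_applied": bool(flags & P_BIT),  # P=1: skip contract lookup
    }
```

An egress leaf checking `policy_applied` is exactly the P=0 vs P=1 branch described above: re-evaluate the contract using `sclass`, or trust the ingress decision and forward.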
7. Underlay Protocols — IS-IS, COOP, and Overlay-1 VRF
ACI's underlay isn't a blank IP network — it runs two protocols that are critical to how VXLAN forwarding actually works. Understanding both IS-IS and COOP, and how they interact with the Overlay-1 VRF, is what separates engineers who truly understand ACI from those who only know the GUI.
Figure 7 — ACI Control Plane Protocol Stack

IS-IS — Underlay Reachability
Runs on the point-to-point links between leaf and spine switches within the Overlay-1 VRF. Its sole job is to ensure every node in the fabric knows the /32 loopback address (PTEP) of every other node.
▶ Distributes /32 PTEP routes for each leaf and spine
▶ Distributes vPC-TEP (VPC VIP) addresses
▶ Distributes the Proxy-TEP anycast address on all spines
▶ ECMP load balancing across spine uplinks — all paths active
Does NOT carry: Tenant routes, endpoint MAC/IP mappings, or any overlay information — only underlay reachability.

COOP — Endpoint Mapping Database
Council of Oracle Protocol. Runs between leaf switches and spine switches over PTEP loopbacks (inside Overlay-1). Each spine acts as a "COOP Oracle" — a distributed mapping database that knows the VTEP location of every endpoint in the fabric.
▶ When a leaf learns a new endpoint (MAC/IP), it registers the mapping with all spine COOP Oracles
▶ Spines replicate the mapping to each other — ensuring all spines have full fabric endpoint visibility
▶ When a leaf needs to forward to an unknown destination, it sends the VXLAN packet to the Proxy-TEP (anycast spine IP)
▶ The spine looks up the real VTEP (PTEP) of the destination and re-encapsulates the packet to that leaf
Verify COOP: spine101# show coop internal info repo ep

Overlay-1 VRF — The Fabric's Own Network
ACI reserves a dedicated VRF called Overlay-1 for all fabric control plane and data plane communication. Tenant VRFs never share the same forwarding table. Overlay-1 contains:
▶ /32 routes to every PTEP
▶ vPC VIP (TEP) addresses
▶ Spine Proxy-TEP anycast
▶ APIC management address
▶ FTEP address (VMM)
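The COOP register/lookup cycle described above can be modeled as a toy mapping database. Class and method names are illustrative only — COOP is a Cisco-proprietary protocol, not a public API:

```python
class CoopOracle:
    """Toy model of a spine's COOP endpoint repository."""

    def __init__(self):
        # (VNID, endpoint MAC or IP) -> leaf PTEP that owns the endpoint
        self.repo = {}

    def register(self, vnid, endpoint, ptep):
        # A leaf registers every locally learned endpoint with all
        # spine Oracles; spines replicate the entry to each other.
        self.repo[(vnid, endpoint)] = ptep

    def proxy_lookup(self, vnid, endpoint):
        # Spine-proxy path: resolve the real destination VTEP for a
        # packet that arrived on the Proxy-TEP anycast address.
        return self.repo.get((vnid, endpoint))

oracle = CoopOracle()
oracle.register(15761386, "10.10.1.12", "10.0.128.64")  # Server D on Leaf-3
```

A successful `proxy_lookup` is what lets the spine re-encapsulate an unknown-unicast packet toward the correct leaf; a miss means the endpoint is not known to the fabric at all.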
8. L2 Forwarding — Same Leaf and Different Leaf
Layer 2 (switched) traffic in ACI carries the Bridge Domain VNID inside the VXLAN header. The forwarding behavior splits into two scenarios: same-leaf and different-leaf, and each follows a distinct path.
Figure 8 — L2 Traffic Forwarding Scenarios

Scenario A — Same Leaf
1. Server A sends Ethernet frame to Server B. Both are connected to the same Leaf-1.
2. Leaf-1 checks its local endpoint table. Server B's MAC is locally known — no VXLAN needed.
3. Policy lookup: leaf checks that Server A's EPG is permitted to communicate with Server B's EPG via contract.
4. Frame forwarded locally to Server B's port. Traffic never left Leaf-1. No spine involved.
✅ Zero fabric hops — most efficient L2 path possible

Scenario B — Different Leaf (Known Endpoint)
1. Server A (Leaf-1) sends frame to Server D (Leaf-3, same BD).
2. Leaf-1 checks local table — knows Server D's VTEP is 10.0.128.64 (Leaf-3 PTEP). Uses BD VNID in VXLAN header.
3. Leaf-1 encapsulates: Outer Dst IP = 10.0.128.64, VNID = BD's 24-bit ID, P bit = 1 (policy applied at ingress).
4. Spine receives packet, routes on outer IP header only — forwards to Leaf-3's PTEP via IS-IS routes.
5. Leaf-3 decapsulates VXLAN, delivers original frame to Server D on its local port.
Two fabric hops max: Leaf-1 → Spine → Leaf-3
Unknown Unicast (Destination Unknown): If Leaf-1 doesn't know Server D's VTEP, it sends the VXLAN packet to the Proxy-TEP anycast address on the spine. The spine queries COOP, finds the correct PTEP for Server D (Leaf-3), strips the outer IP header and re-encapsulates with the correct destination. This spine-proxy lookup happens only for the first packet — subsequent traffic flows directly leaf-to-leaf.
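The leaf-side decision in Scenario B and the unknown-unicast note reduces to a single branch: known endpoint, tunnel straight to its PTEP; unknown endpoint, tunnel to the spine Proxy-TEP and let COOP resolve it. A hedged sketch, with addresses taken from Figure 3 and a made-up MAC for Server D:

```python
PROXY_TEP = "10.0.0.128"  # spine anycast Proxy-TEP from Figure 3

def outer_dst_ip(local_endpoint_table, dst_mac):
    """Pick the outer VXLAN destination IP at the ingress leaf.

    Known endpoint  -> send directly to the egress leaf's PTEP.
    Unknown endpoint -> send to the Proxy-TEP; a spine's COOP
    database resolves the real VTEP and re-encapsulates.
    """
    return local_endpoint_table.get(dst_mac, PROXY_TEP)

# Illustrative local endpoint table on Leaf-1 (MAC -> remote PTEP):
table = {"00:50:56:aa:bb:dd": "10.0.128.64"}  # Server D behind Leaf-3
```

Once the first proxied packet triggers learning, the entry lands in the local table and subsequent flows go leaf-to-leaf directly, matching the note above.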
9. L3 Forwarding — Inter-Subnet Routing
One of ACI's most elegant architectural decisions is where it places the default gateway for tenant subnets. Rather than centralizing routing at a pair of core switches, ACI distributes the default gateway function to every leaf switch simultaneously. The same subnet gateway IP and MAC address exist on every leaf in the fabric — APIC programs them identically across all assigned leaves.
This means that when Server A (10.10.1.10) wants to send traffic to Server C (10.10.2.10 in a different subnet), it sends the packet to its default gateway — and that gateway is physically present on Leaf-1 right next to Server A. There is no round trip to a core router. The routing happens at ingress, and then the traffic is sent across the fabric in the VRF VNID (not the BD VNID) to reach the egress leaf where it gets re-encapsulated in the destination BD VNID.
Figure 9 — L3 Inter-Subnet Packet Walk (A → C)
| Step | Location | Action | VXLAN VNID |
| --- | --- | --- | --- |
| 1 | Server A | Sends packet: Dst IP = 10.10.2.10, Dst MAC = default gateway MAC (shared across all leaves) | None (access port) |
| 2 | Leaf-1 (Ingress) | Default gateway catches packet. Routes at L3. Looks up 10.10.2.10 in VRF — finds VTEP = Leaf-2 (10.0.96.64). Encapsulates with VRF VNID. Sets P=1, sclass = Source EPG PCTag. | VRF VNID |
| 3 | Spine Switch | Spine sees only the outer IP header. Routes to Leaf-2's PTEP (10.0.96.64) via IS-IS. Does NOT open the VXLAN packet. | Transparent |
| 4 | Leaf-2 (Egress) | Decapsulates VXLAN. VRF VNID tells it this is a routed packet for App-BD. Looks up Server C's MAC in App-BD local table. Re-encapsulates with destination BD VNID if needed — or delivers directly to Server C's port. | BD VNID (egress) |
| 5 | Server C | Receives packet. Sees original IP packet — completely unaware it traversed a VXLAN fabric. Source IP = 10.10.1.10, from Server A. | None (access port) |
Distributed Gateway Advantage: Routing happens at ingress — never at a centralized core. This eliminates the "tromboning" problem where traffic had to travel to a central router before reaching its destination, even when source and destination were physically adjacent.
10. Multidestination Traffic — GIPo Multicast and Head-End Replication
Not all traffic in a data center is unicast. ARP broadcasts, multicast streams, and unknown unicast flooding all need to reach multiple destinations simultaneously. In ACI, this is handled using a concept called GIPo (Group IP Outer) — a multicast IP address assigned to each Bridge Domain that serves as the destination for all BUM (Broadcast, Unknown unicast, Multicast) traffic within that BD.
When a leaf needs to flood traffic for a given BD, it encapsulates the frame in VXLAN and sends it to that BD's GIPo multicast address. Every leaf that has endpoints in that Bridge Domain is subscribed to the GIPo group, so they all receive the flooded packet through the underlay multicast tree. Spines act as multicast rendezvous points (or forward the multicast traffic) — they do not need to know anything about the overlay content, just the outer multicast IP address.
Figure 10 — GIPo Multicast BUM Traffic Forwarding

1. Leaf-1 (Source): Server A sends an ARP broadcast. Leaf-1 encapsulates it in VXLAN with Outer Dst = GIPo 225.1.45.64 and sends it into the underlay as a UDP multicast packet.
2. Spine-1 (Multicast RP) replicates the packet to all subscribers; Spine-2 forwards to subscribed leaves.
3. The packet is delivered to all leaves subscribed to GIPo 225.1.45.64:
▶ Leaf-1 (Self): Receives own copy — discards (loop prevention)
▶ Leaf-2: ✅ Has endpoints in this BD — decapsulates and delivers to Server C's port
▶ Leaf-3: ✅ Has endpoints in this BD — decapsulates and delivers to Server D's port
▶ Leaf-4: ✗ No endpoints in this BD — not subscribed to GIPo, never receives the packet
GIPo Assignment: Each Bridge Domain receives a unique multicast IP address (GIPo) from a configured multicast range — typically something like 225.x.x.x. APIC assigns these automatically. When a leaf has no endpoint in a BD, it never joins that GIPo group and never wastes fabric resources processing floods for that BD.
Multicast-Free Option — Head-End Replication: When an external multicast underlay is not available (such as in ACI Multi-Site across DCI links), ACI can use head-end replication instead of GIPo. The ingress leaf sends a separate unicast VXLAN copy to each remote leaf that has endpoints in the BD. More CPU-intensive but works over any IP underlay without multicast support.
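The two replication modes can be contrasted in a short sketch. The leaf names, PTEP values, and subscription map are illustrative, reusing the addresses from Figure 3:

```python
def flood_copies(mode, gipo, subscribed_leaves, source_leaf):
    """Return the outer destination IPs the ingress leaf emits for one BUM frame."""
    if mode == "multicast":
        # GIPo mode: one copy; the underlay multicast tree replicates it.
        return [gipo]
    # Head-end replication: one unicast VXLAN copy per remote subscribed leaf.
    return [ptep for leaf, ptep in subscribed_leaves.items()
            if leaf != source_leaf]

# Leaves with endpoints in this BD (leaf name -> PTEP):
subs = {"Leaf-1": "10.0.64.64", "Leaf-2": "10.0.96.64", "Leaf-3": "10.0.128.64"}
```

The trade-off is visible in the return values: GIPo mode always emits exactly one packet regardless of fabric size, while head-end replication scales linearly with the number of remote leaves in the BD.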
Essential VXLAN in ACI — CLI Quick Reference
leaf# show endpoint mac <mac-addr>           ← find endpoint location, VTEP IP, VNID
leaf# show endpoint ip <ip-addr>             ← find endpoint by IP
leaf# show vxlan                             ← summary of VXLAN config and tunnel interfaces
leaf# show interface tunnel <id>             ← tunnel src/dst VTEP, encap stats
leaf# show ip route vrf overlay-1            ← underlay IS-IS routes to all PTEPs
leaf# acidiag fnvread                        ← PTEP/node ID mapping for all fabric nodes
spine# show coop internal info repo ep       ← full endpoint database on spine COOP Oracle
✅ Key Takeaways — VXLAN in Cisco ACI
▶ Every packet in the ACI fabric is normalized to VXLAN at ingress — external VLAN, NVGRE, or untagged frames all become VXLAN inside the fabric.
▶ ACI uses four distinct TEP types — PTEP (per-node), Proxy-TEP (spine anycast), FTEP (VMM), and vPC-TEP (dual-homed) — each playing a specific role in packet delivery.
▶ The VNID field carries either a BD VNID (L2 forwarding) or a VRF VNID (L3 routing) depending on the packet type and forwarding path.
▶ IS-IS builds underlay reachability (PTEP /32 routes). COOP maintains the endpoint mapping database on spine oracles. Together they enable spine-proxy forwarding for unknown unicast.
▶ The Policy Bit and sclass in the ACI-extended VXLAN header carry security policy context in every packet, enabling distributed contract enforcement at any leaf in the fabric.
▶ BUM traffic uses GIPo multicast addresses per Bridge Domain — only leaves with active endpoints in a BD receive flooded traffic, preventing fabric-wide broadcast storms at scale.