Cisco ACI Design Interview Questions: What Architects Are Really Asked
Cisco ACI interviews at the architect level are not about remembering that the APIC runs on a cluster of three nodes or that BD flooding is disabled by default in optimized mode. Interviewers at that level already assume you know the product. What they are testing is whether you can design with it — whether you understand the trade-offs, the failure domains, the policy model edge cases, and the migration complexity that comes with running ACI in a live production data center.
This guide covers the design-focused questions most frequently asked in senior network engineer and architect interviews, organized by topic area. Each question is answered with the depth and reasoning expected at that level — not bullet lists of features, but actual architectural thinking.
Fabric Architecture & Underlay Design
Question 01
You are designing a greenfield ACI fabric for a financial services data center with 2,000 servers. Walk me through your spine-leaf topology decisions and the key constraints that drive them.
What the interviewer wants: Understanding of ACI's Clos topology constraints, spine/leaf roles, and the design limits that affect scale.
The first constraint is scale per leaf: each ACI leaf has a finite downlink port count and a local endpoint table. At 2,000 servers, dual-homing via vPC consumes roughly 4,000 server-facing ports, so leaf count is driven by port density: on the order of 40 to 50 leaves with 96-port models, or 80-plus with 48-port models. Spine count is driven by oversubscription tolerance and east-west bandwidth requirements; for financial workloads I start with a minimum of four spines for redundancy and bandwidth. The critical ACI-specific constraint is that spines are purely transit: no servers attach to spines, and in a standard design no L3-Out terminates on a spine. The APIC cluster connects to leaf switches, not spines. Border leaf nodes for external connectivity should be dedicated pairs, not shared with server-facing leaf switches, to isolate the failure domain of external peering from the internal fabric forwarding.
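The sizing arithmetic above can be sanity-checked with a short script. The port counts, uplink counts, and link speeds below are illustrative assumptions, not Cisco-validated figures; substitute the actual leaf model's numbers when sizing a real fabric.

```python
# Back-of-the-envelope fabric sizing for dual-homed servers.
# All port/speed parameters are illustrative assumptions.
import math

def size_fabric(servers, ports_per_leaf=48, uplinks_per_leaf=6,
                downlink_gbps=25, uplink_gbps=100):
    """Return (leaf_count, oversubscription_ratio) for a vPC dual-homed design."""
    server_ports = servers * 2                 # one port to each leaf of a vPC pair
    leaves = math.ceil(server_ports / ports_per_leaf)
    leaves += leaves % 2                       # round up to whole vPC pairs
    oversub = (ports_per_leaf * downlink_gbps) / (uplinks_per_leaf * uplink_gbps)
    return leaves, oversub

print(size_fabric(2000))                       # 48-port leaves
print(size_fabric(2000, ports_per_leaf=96))    # higher-density leaves need far fewer
```

Running both cases makes the interview point concrete: the same 2,000 servers need roughly twice as many 48-port leaves as 96-port leaves, which is why the leaf model is the first question to settle.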
Question 02
How does ACI's IS-IS underlay differ from a traditional routed data center underlay, and what are the operational implications of not having direct access to it?
ACI runs a private IS-IS instance across the fabric for underlay reachability between TEPs (Tunnel Endpoints). Unlike a standard IS-IS deployment, this instance is managed entirely by APIC and is not operator-configurable: you cannot redistribute external routes into it or tune its timers. The operational implication is that traditional underlay troubleshooting habits do not transfer. You can still view IS-IS state from a switch CLI in the overlay-1 VRF, but day-to-day fault isolation runs through ACI-native tooling: show endpoint, acidiag fnvread, and APIC fault analysis. Engineers who come from traditional environments often spend time debugging the overlay (policy, contracts) when the real issue is a fabric link or optics failure at the underlay, something visible only through ACI health scores and faults.
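For engineers used to SSH-and-show-command workflows, the practical substitute is the APIC REST API. The sketch below only builds the requests (nothing is sent), so the shapes are visible; the hostname and credentials are placeholders, while `aaaLogin`, `fabricHealthTotal`, and `faultInst` are real APIC API endpoints and object classes.

```python
# Sketch of the APIC REST calls that replace traditional underlay CLI
# checks. Hostname/credentials are placeholders; requests are built but
# not sent, so this runs anywhere.
import json

APIC = "https://apic.example.com"   # placeholder hostname

# 1. Authenticate: POST aaaLogin, then carry the returned token cookie.
login_url = f"{APIC}/api/aaaLogin.json"
login_body = json.dumps({"aaaUser": {"attributes": {"name": "admin", "pwd": "secret"}}})

# 2. Instead of per-switch 'show' commands, query object classes fabric-wide:
health_url = f"{APIC}/api/node/class/fabricHealthTotal.json"
faults_url = f'{APIC}/api/node/class/faultInst.json?query-target-filter=eq(faultInst.severity,"critical")'

print(login_url)
print(health_url)
```

The fault query filter syntax (`eq(class.attribute,"value")`) is the standard APIC query-target-filter form; in practice you would send these with an HTTP client and reuse the login token as a cookie.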
Tenant, VRF & EPG Policy Model Design
Question 03
A customer wants to map their existing three-tier application (web, app, DB) into ACI. Describe two different tenant/EPG design approaches and the trade-offs of each.
Approach 1 — One EPG per tier, contracts between them. Web EPG provides to App EPG; App EPG provides to DB EPG. This gives granular microsegmentation and a clean security policy boundary. The trade-off is contract management complexity — as application tiers grow, the contract filter matrix grows quadratically and becomes difficult to audit.
Approach 2 — Network-centric EPGs (one EPG per subnet/VLAN). This mirrors the existing VLAN model, which simplifies migration. The trade-off is that you lose the workload-identity-based security model that makes ACI's microsegmentation valuable — EPG membership is still subnet-based rather than endpoint-attribute-based. Most organizations start with Approach 2 to reduce migration risk, then progressively refactor toward Approach 1 as they become more comfortable with the policy model. A hybrid approach using uSeg EPGs (micro-segmentation endpoint groups) allows attribute-based refinement within a subnet-based structure — often the pragmatic production answer.
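The "grows quadratically" claim in Approach 1 is worth being able to quantify on a whiteboard. A full mesh of n EPGs needs n(n-1)/2 distinct contracts, so the audit burden explodes well before the fabric hits any hard scale limit:

```python
# Contract count for full-mesh communication between n EPGs: n*(n-1)/2.
def full_mesh_contracts(n_epgs):
    return n_epgs * (n_epgs - 1) // 2

for n in (3, 10, 50):
    print(n, full_mesh_contracts(n))
# Three tiers are trivial; fifty EPGs in full mesh are over a thousand contracts.
```

This is exactly why patterns like vzAny and preferred groups exist: they collapse the common any-to-any or shared-services portion of the matrix so explicit contracts only cover the exceptions.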
Question 04
When would you use multiple VRFs within a single tenant versus multiple tenants? What are the policy and operational implications of each?
Multiple VRFs within a single tenant are appropriate when you need routing isolation between application environments (prod/dev/test) but want to share common network services — L3-outs, service graphs, contracts to shared services — through the same administrative domain. A single tenant gives you a unified RBAC boundary and simplifies cross-VRF shared service consumption via the vzAny or leaked routes pattern. Multiple tenants are appropriate when you need hard administrative isolation: separate RBAC delegations, independent fault domains, or true multi-tenancy for different business units or customers. The key operational implication is that contracts cannot span tenants natively — cross-tenant communication requires a shared service tenant architecture with exported contracts and imported bridge domains, which adds policy model complexity. Choose tenants as the blast-radius boundary for access control delegation, not as a substitute for VRF isolation.
⚠ Common Interview Trap: Candidates often conflate VRF isolation with tenant isolation. A VRF provides routing separation; a tenant provides administrative and policy separation. You can have multiple VRFs in one tenant, and that is often the right answer for single-organization deployments.
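The multiple-VRFs-in-one-tenant layout is easy to show in the object model itself. Below is a minimal sketch in the JSON shape the APIC REST API accepts; `fvTenant` and `fvCtx` are the real managed-object classes, while the tenant and VRF names are placeholders.

```python
# One tenant (one RBAC/admin boundary) holding routing-isolated VRFs.
# fvTenant and fvCtx are real APIC MO classes; names are placeholders.
import json

tenant = {
    "fvTenant": {
        "attributes": {"name": "acme"},
        "children": [
            {"fvCtx": {"attributes": {"name": "prod-vrf"}}},
            {"fvCtx": {"attributes": {"name": "dev-vrf"}}},
        ],
    }
}

# POSTed to /api/mo/uni.json, this creates both VRFs under one tenant.
payload = json.dumps(tenant)
print(payload)
```

The point the object tree makes visually: routing isolation (fvCtx) nests inside the administrative boundary (fvTenant), so you get separation of the data plane without fragmenting RBAC or contract scope.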
Bridge Domain, L3-Out & External Connectivity
Question 05
Explain the relationship between a Bridge Domain and an EPG in ACI. Why is it wrong to assume a 1:1 mapping between them, and when would you use a many-EPG-to-one-BD design?
A Bridge Domain (BD) is the Layer 2 flooding and Layer 3 gateway construct — it defines the subnet, ARP/flooding behavior, and the anycast gateway IP. An EPG is the policy construct — it defines which endpoints belong to a group and what contracts govern their communication. They are independent objects: multiple EPGs can be associated to the same BD, meaning endpoints in different EPGs share the same subnet and L3 gateway but have different security policies applied to their traffic. The many-EPG-to-one-BD design is the correct approach for microsegmentation within a subnet — for example, separating web servers and management jump hosts that share a /24 into different EPGs with different contract access, without requiring separate IP subnets. This is a key ACI design differentiator versus traditional VLAN-based segmentation, which requires a new VLAN and subnet for each security boundary.
Question 06
A customer has an existing border router running BGP to the upstream WAN. How would you design the L3-Out in ACI, and what are the failure domain considerations for the border leaf placement?
The L3-Out is defined under the tenant VRF and terminates on a pair of dedicated border leaf switches. The border leaf pair should be physically separate from server-facing leaves to contain the failure domain — a BGP session instability or route leak on a border leaf should not affect workload forwarding on server leaves. The L3-Out node profile assigns logical interface profiles to the border leaf interfaces connecting to the upstream router. BGP is configured under the L3-Out with route-maps equivalent — in ACI terms, import/export route control policies built from route profiles and route maps within the APIC GUI. A critical design decision is whether to use transit routing (ACI fabric advertises workload subnets externally via the L3-Out) or summary/default from the WAN into ACI. For large fabrics, summarizing at the border and injecting a default route into each VRF is operationally simpler and reduces the BGP RIB size on the WAN routers significantly.
Multi-Pod, Multi-Site & Stretched Fabric Design
Question 07
A customer has two data centers 80 km apart and needs active-active workload distribution with L2 adjacency between sites. When do you recommend Multi-Pod versus Multi-Site, and what drives that decision?
Multi-Pod extends a single ACI fabric across multiple physical pods connected by an Inter-Pod Network (IPN). It uses a single APIC cluster and a single policy domain. The critical constraint is latency: Cisco recommends under 50 ms RTT between pods, making it suitable for metro distances. Multi-Pod provides a single control plane — one APIC manages all pods — which simplifies policy consistency but creates a shared failure domain for the APIC cluster.
Multi-Site uses Nexus Dashboard Orchestrator (NDO) to manage multiple independent ACI fabrics, each with its own APIC cluster. Policy is stretched across sites via a VXLAN/MP-BGP EVPN control plane over the inter-site network. Multi-Site is the correct answer when sites need independent failure domains — a full APIC cluster outage at Site A should not affect forwarding at Site B. At 80 km with active-active L2 requirements, Multi-Pod is technically viable if latency is under 50 ms. However, for business-critical active-active designs I generally recommend Multi-Site with stretched BDs, accepting the higher design complexity in exchange for true fault isolation between sites.
| Dimension | Multi-Pod | Multi-Site |
|---|---|---|
| APIC Cluster | Single shared cluster | Independent cluster per site |
| Failure Domain | Shared — APIC outage affects all pods | Isolated per site |
| Latency Requirement | < 50 ms RTT (IPN) | < 150 ms RTT (ISN) |
| L2 Stretch | Native (same fabric) | Via stretched BD + EVPN |
| Management Tool | APIC only | Nexus Dashboard Orchestrator |
L4–L7 Services & Service Graph Integration
Question 08
How does ACI Service Graph work, and what is the design difference between Go-To and Go-Through service function insertion modes?
A Service Graph is the ACI construct for policy-driven traffic steering through Layer 4–7 devices (firewalls, load balancers) inserted between EPGs. When a contract between a consumer EPG and a provider EPG has a service graph attached, the fabric redirects matching traffic through the defined service chain rather than switching it directly. Go-To mode (also called one-arm or routed mode) sends traffic to the service device as a next-hop — the device sees traffic from only one direction per interface. This works for stateless inspection or when the service device has its own routing. Go-Through mode (transparent/bridge mode) steers traffic through the service device inline — it sees both directions of the flow. Go-Through is required for stateful firewalls that need to track full session state for both directions of the flow, which is the typical requirement for firewall service insertion in a production ACI fabric. The critical design implication is that Go-Through requires the service device to be configured in transparent bridge mode, which constrains where it can be placed relative to VRF boundaries.
⚠ Follow-up question often asked: "What happens to traffic flow if the APIC cluster goes down after a service graph is deployed?" Answer: APIC is the policy plane only. The forwarding plane (hardware programming on leaf switches) continues to function based on the last programmed state. Traffic already flowing through the service graph continues. New policy changes cannot be pushed until APIC recovers — this is a key resiliency property candidates must articulate clearly.
Migration Strategy & Brownfield Design
Question 09
A customer is migrating 200 VLANs from a traditional Nexus 5K/7K fabric to ACI. What migration strategy would you recommend, and how do you handle the contract enforcement risk?
The standard approach for brownfield migrations is a phased VLAN-by-VLAN lift-and-shift using ACI's network-centric mode. Each VLAN is mapped to an EPG with contracts initially set to preferred group membership — which means all EPGs in the preferred group communicate freely without explicit contracts. This preserves existing any-to-any reachability from the legacy fabric, giving the team time to document actual traffic flows before enforcing microsegmentation.
The migration sequence for each VLAN involves: deploy the BD and EPG in ACI, extend the VLAN to the ACI leaf via a static port binding, migrate servers one rack at a time, verify forwarding, then decommission the Nexus uplink for that VLAN. The contract enforcement phase happens post-migration, using NetFlow or ACI's own endpoint analytics to baseline communication patterns before writing contracts. Never enable contract enforcement simultaneously with the initial migration — this is the most common cause of production outages in ACI adoption projects.
✔ Strong Candidate Signal: Mentioning the use of Atomic Counter and Health Score monitoring per EPG during migration phases demonstrates hands-on operational maturity — not just theoretical design knowledge. Interviewers at architect level notice this.
Quick-Reference: Key Design Principles to State in Any ACI Interview
| Design Principle | Why It Matters in Production |
|---|---|
| Dedicated border leaf pairs | Isolates external routing failure domain from internal fabric forwarding |
| Preferred groups for migration | Preserves legacy any-to-any reachability while building the policy model safely |
| vzAny for shared services | Scales common service access (DNS, NTP, monitoring) without per-EPG contracts |
| APIC is policy plane only | Forwarding continues independent of APIC availability — critical resiliency point |
| uSeg EPGs for microsegmentation | Enables attribute-based policy within a subnet without IP/VLAN redesign |
| NDO for multi-site consistency | Single pane for policy templates across independent fabrics — prevents config drift |
Approaching the ACI Design Interview
The questions above share a common thread: they test whether you reason in trade-offs, not features. Every ACI design decision involves a tension — operational simplicity versus microsegmentation granularity, migration speed versus security posture, shared infrastructure versus blast-radius isolation. Interviewers at the architect level are listening for that trade-off language.
When answering, lead with the constraint that drives the decision, explain the alternatives you considered, and state clearly what you would sacrifice and why. That reasoning process — more than any specific command or feature — is what distinguishes an architect from an operator in a Cisco ACI design interview.
Cisco ACI feature sets and design recommendations evolve across software releases. Always validate design decisions against the current Cisco Validated Design guides and APIC release notes for your target software version.