Services High Availability in VMware NSX-T

This post covers high availability (HA) for services in a VMware NSX-T environment. Two HA modes are available: active/active and active/standby.

NSX Edge nodes run in an Edge cluster, hosting centralized services and providing connectivity to the physical infrastructure. Because these services run on the service router (SR) component of a Tier-0 or Tier-1 gateway, the concepts below apply to the SR. The SR runs on an Edge node and has two modes of operation:

  • active/active
  • active/standby

Active/Active mode
This is a high availability mode where SRs hosted on Edge nodes act as active forwarders. Stateless services such as layer 3 forwarding are IP based, so it does not matter which Edge node receives and forwards the traffic. 

All the SRs configured in active/active configuration mode are active forwarders. This high availability mode is only available on Tier-0 gateway.

Stateful services typically require tracking of connection state (e.g., sequence number check, connection state), thus traffic for a given session needs to go through the same Edge node. 

As of NSX-T 2.5, active/active HA mode does not support stateful services such as Gateway Firewall or stateful NAT. Stateless services, including reflexive NAT and stateless firewall, can leverage the active/active HA model. 
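
The rules above can be summarized in a small decision helper. This is only an illustrative sketch: the function name, the tier labels, and the service labels are made up for this example and are not NSX-T API identifiers. It encodes what this post states for NSX-T 2.5: active/active is a Tier-0 option only and excludes stateful services, while active/standby is available on both tiers.

```python
from typing import List, Set

# Hypothetical labels for this sketch; these are NOT NSX-T API names.
STATEFUL_SERVICES = {"gateway_firewall", "stateful_nat"}
STATELESS_SERVICES = {"l3_forwarding", "reflexive_nat", "stateless_firewall"}


def allowed_ha_modes(tier: str, services: Set[str]) -> List[str]:
    """Return the HA modes a gateway could use, per the rules in this post."""
    if tier not in ("tier0", "tier1"):
        raise ValueError(f"unknown tier: {tier}")
    unknown = services - STATEFUL_SERVICES - STATELESS_SERVICES
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")

    # Active/standby is available on both Tier-0 and Tier-1.
    modes = ["active/standby"]

    # Active/active: Tier-0 only, and only without stateful services.
    if tier == "tier0" and not (services & STATEFUL_SERVICES):
        modes.insert(0, "active/active")
    return modes
```

For example, `allowed_ha_modes("tier0", {"reflexive_nat"})` allows both modes, while adding `"stateful_nat"` to the set leaves only active/standby.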

Fig 1.1- VMware NSX-T Active/Active mode

Active/Standby mode
This is a high availability mode where only one SR acts as an active forwarder. This mode is required when stateful services are enabled.

Services like NAT are kept in a constant state of sync between the active and standby SRs on the Edge nodes. This mode is supported on both Tier-1 and Tier-0 SRs. Preemptive and non-preemptive modes are available for both Tier-0 and Tier-1 SRs.

The default mode for gateways configured in active/standby high availability is non-preemptive. When a gateway is configured in active/standby preemptive mode, the user must select the preferred member (Edge node).

When enabled, preemptive behavior allows an SR to resume the active role on the preferred Edge node as soon as that node recovers from a failure.
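
The difference between the two behaviors is easiest to see in a toy state machine. The sketch below is not NSX-T code: the class, its methods, and the node names are hypothetical. It also makes one simplifying assumption, namely that the "preferred" node is the initial active node in both modes.

```python
from typing import Dict, Optional


class ActiveStandbyPair:
    """Toy model of two SRs in active/standby mode (illustrative only)."""

    def __init__(self, preferred: str, peer: str, preemptive: bool = False):
        self.preferred = preferred
        self.peer = peer
        self.preemptive = preemptive  # default mirrors non-preemptive
        self.healthy: Dict[str, bool] = {preferred: True, peer: True}
        # Assumption for this sketch: the preferred node starts as active.
        self.active: Optional[str] = preferred

    def fail(self, node: str) -> None:
        self.healthy[node] = False
        self._elect()

    def recover(self, node: str) -> None:
        self.healthy[node] = True
        self._elect()

    def _elect(self) -> None:
        # Drop the active role if its node is no longer healthy.
        if self.active is not None and not self.healthy[self.active]:
            self.active = None
        # Preemptive: the preferred node takes over whenever it is healthy.
        if self.preemptive and self.healthy[self.preferred]:
            self.active = self.preferred
            return
        # Otherwise keep the current active; elect only if there is none.
        if self.active is None:
            for node in (self.preferred, self.peer):
                if self.healthy[node]:
                    self.active = node
                    break
```

With `preemptive=False`, failing and then recovering `"edge-1"` leaves `"edge-2"` active; with `preemptive=True`, the active role moves back to `"edge-1"` as soon as it recovers.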

Tier-0 and Tier-1 Gateway
For Tier-0 Gateway, active/standby SRs have different IP addresses northbound and have eBGP sessions established on both links. 

Both Tier-0 SRs (active and standby) receive routing updates from the physical routers and advertise routes to them; however, the standby Tier-0 SR prepends its local AS three times in its BGP updates so that traffic from the physical routers prefers the active Tier-0 SR.
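
To see why prepending steers traffic, here is a deliberately simplified sketch of how a physical router might compare the two advertisements. Several assumptions apply: the ASN 65001, the prefix, and the function names are invented for illustration; the three prepends are assumed to be added on top of the single AS occurrence a normal eBGP update carries; and best-path selection is reduced to AS path length alone, which holds only when the attributes evaluated earlier in BGP best-path selection (such as local preference) are equal.

```python
from typing import Dict, List

TIER0_AS = 65001  # hypothetical private ASN for the Tier-0 gateway


def advertise(prefix: str, sr_name: str, local_as: int, prepend: int = 0) -> Dict:
    """Build a simplified BGP update as seen by the eBGP peer."""
    # The local AS appears once normally, plus `prepend` extra copies.
    return {"prefix": prefix, "from": sr_name, "as_path": [local_as] * (1 + prepend)}


def best_path(routes: List[Dict]) -> Dict:
    """Pick the route with the shortest AS path (all else assumed equal)."""
    return min(routes, key=lambda r: len(r["as_path"]))


active = advertise("172.16.10.0/24", "active-sr", TIER0_AS)
standby = advertise("172.16.10.0/24", "standby-sr", TIER0_AS, prepend=3)

chosen = best_path([standby, active])  # selects the route from "active-sr"
```

If the active SR's advertisement disappears, the standby's longer path is the only one left and gets selected, which is how traffic shifts after a failover in this model.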

The southbound IP addresses on the active and standby Tier-0 SRs are the same, and the operational state of the standby SR's southbound interface is down. Because that interface is operationally down, the Tier-0 DR does not send any traffic to the standby SR.

For a Tier-1 gateway, the active and standby SRs have the same northbound IP addresses. Only the active SR replies to ARP requests; the operational state of the standby SR's interfaces is set to down, so they automatically drop packets.
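
The shared-IP behavior can be pictured with a minimal model of the two northbound interfaces. Everything here is hypothetical (class name, IP address from the documentation range, MAC addresses); the sketch only shows that when both SRs own the same IP, exactly one ARP reply is produced because the standby interface is operationally down.

```python
from typing import Optional


class SrInterface:
    """Toy model of an SR northbound interface (illustrative only)."""

    def __init__(self, ip: str, mac: str, oper_up: bool):
        self.ip = ip
        self.mac = mac
        self.oper_up = oper_up  # standby interfaces are set to down

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Return this interface's MAC if it answers the request, else None."""
        if not self.oper_up:
            return None  # an operationally down interface drops the packet
        if target_ip != self.ip:
            return None  # request is for a different address
        return self.mac


# Both SRs share the same IP; only the active one is operationally up.
active_if = SrInterface("192.0.2.1", "02:00:00:00:00:0a", oper_up=True)
standby_if = SrInterface("192.0.2.1", "02:00:00:00:00:0b", oper_up=False)

replies = [
    mac
    for mac in (i.handle_arp_request("192.0.2.1") for i in (active_if, standby_if))
    if mac is not None
]
```

Here `replies` contains only the active interface's MAC address, so neighbors resolve the shared IP to the active SR.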