There is still work to be done on NSX, but I got a number of inquiries asking about how I have the lab server set up from a networking perspective, so I thought it would be useful to have a brief intermission and take a look. Let’s start with a picture:
From the hardware perspective, here is how it breaks down:
- Core Switch: Netgear GSM7224V2 – this is a fully managed layer 3 switch with 802.1Q VLAN support, 24 1Gb/s ports, 2 SFP+ modules, LAG/LACP and obviously routing
- Physical Host: Dell T620 – the beast is set up with 192GB ECC LV DDR3-1333 DRAM, 2 x Intel® Xeon® E5-2650L v2 (1.7GHz/10-core/25MB/8.0GT/s QPI/70W), 2 x 750W PSU, 8 x 2TB Western Digital Red NAS drives, PERC H710 RAID controller with 512MB cache, iDRAC Enterprise remote access card, 2 x 120GB Intel SSD, 4 port Intel i350 1Gb/s NIC, 2 port Broadcom BCM57810 10Gb/s NIC
- Firewall: these days I actually run a dedicated physical firewall in the shape of the (now defunct) McAfee UTM SG720. It’s no ASA, but it’s actually surprisingly powerful and capable for perimeter defense in a home lab.
Of course, hardware porn aside, from a networking perspective the key figure above is the six 1Gb/s ports (no 10Gb/s in the lab unfortunately, so the Broadcom gets to be bored doing 1Gb/s duty).
In terms of logical configuration, I have allocated the NICs to 5 discrete standard virtual switches:
- vSwitch0: This is the primary VSS and has been allocated two physical ports. It hosts the following port groups:
- VM Network: the attach point for any VMs running on the physical host – 192.168.2.0 (VLAN 200)
- Management Network (vmkernel): primary management network used for management traffic and VM FT – 192.168.5.0 (VLAN 500)
- vSwitch4: This VSS is dedicated to storage and has one vmkernel attach. Storage and vMotion traffic traverse this link – 192.168.2.0 (VLAN 200). Note that it shares the VM network subnet. My two NAS devices each have only two gigabit ports and connect directly to both my client network (192.168.1.0) and the lab (192.168.2.0), and they also need to be accessed by the guest VMs constantly. Rather than put a routing boundary in the middle, I opted to just flatten storage access onto VLAN 200
- vSwitch1: VSS1 is dedicated to the first nested ESX environment. This environment contains 3 vESXi guests which live in the same vCenter as the physical host (vCenter 1)
- vSwitch2: VSS2 is dedicated to the second nested ESX environment. This environment contains 3 vESXi guests which live in their own vCenter (vCenter 2). SRM is up and running between the two vCenters
- vSwitch3: VSS3 serves as a DMZ as well as the provider (external) network for vCD and NSX. It is 192.168.99.0 (VLAN 990) and uplinks to a firewall-managed DMZ (a quick scripted sketch of how this port group could be built follows the list)
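For anyone who would rather script this sort of layout than click through the vSphere Client, here is a minimal pyVmomi sketch of how the vSwitch3/DMZ piece could be created. Treat it as illustrative only: the host name, credentials, port count and the vmnic5 uplink are placeholder assumptions, not the exact values from my build.

```python
# Minimal pyVmomi sketch: create a standard vSwitch and a VLAN-tagged port group.
# Host name, credentials and the vmnic5 uplink are placeholders -- adjust for your lab.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()              # home lab: skip cert validation
si = SmartConnect(host="esx01.lab.local", user="root",
                  pwd="changeme", sslContext=ctx)
host = si.content.searchIndex.FindByDnsName(dnsName="esx01.lab.local",
                                            vmSearch=False)
net = host.configManager.networkSystem

# vSwitch3 with a single physical uplink
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=["vmnic5"]))
net.AddVirtualSwitch(vswitchName="vSwitch3", spec=vss_spec)

# DMZ / provider network port group tagged for VLAN 990 (192.168.99.0)
pg_spec = vim.host.PortGroup.Specification(
    name="DMZ", vlanId=990, vswitchName="vSwitch3",
    policy=vim.host.NetworkPolicy())
net.AddPortGroup(portgrp=pg_spec)

Disconnect(si)
```

The same pattern (a spec object handed to the host's networkSystem) covers the other vSwitches and port groups; only the names, VLAN IDs and uplinks change.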
In terms of VMware advanced networking (vDS, vCD, vCNS, NSX), I limit it to the nested environments. It makes configuration changes (including full teardown) super easy, even if the entire network traffic flow picture gets (pretty damn) confusing. Some things to remember about doing this:
- Enable promiscuous mode on the vSwitch the nested ESX guests attach to
- Allow forged transmits on the same vSwitch
- In the guest's properties, be sure to select ESXi as the guest OS and expose hardware-assisted virtualization to the guest
The reason for this is that normal vSwitch behavior assumes a guest is only responsible for itself (meaning traffic destined for the guest OS is actually destined for applications on the guest OS). When the guest is actually a nested ESX host, the traffic originates from its own guests, which have their own vNICs and MAC addresses, and anything inbound to the nested ESX guest is really headed for an application in one of those guests. As a result the vSwitch sees lots of what appear to be alien MAC addresses heading for the nested ESX guest and would normally drop that traffic. These settings prevent that from happening and unlock hypervisor-on-hypervisor potential. A minimal scripted version of these tweaks follows.
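Here is a hedged pyVmomi sketch of those tweaks. The function names are mine, the connection and host lookup are the same as in the earlier sketch, and the security policy change is shown at the vSwitch level to match the bullets above (it can equally be applied per port group).

```python
from pyVmomi import vim

def open_vswitch_for_nesting(host, vswitch_name):
    """Enable promiscuous mode and forged transmits on a standard vSwitch."""
    net = host.configManager.networkSystem
    for vss in net.networkInfo.vswitch:
        if vss.name == vswitch_name:
            spec = vss.spec
            if spec.policy is None:
                spec.policy = vim.host.NetworkPolicy()
            # Deliver frames addressed to "alien" MACs and accept frames the
            # nested guests source from their own vNIC MAC addresses.
            spec.policy.security = vim.host.NetworkPolicy.SecurityPolicy(
                allowPromiscuous=True, forgedTransmits=True, macChanges=True)
            net.UpdateVirtualSwitch(vswitchName=vswitch_name, spec=spec)

def expose_hw_virtualization(vm):
    """Expose hardware-assisted virtualization to a vESXi guest (power it off first)."""
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(nestedHVEnabled=True))
```

The guest OS type still gets picked when the vESXi VM is created; nestedHVEnabled is simply the API-side equivalent of the "expose hardware assisted virtualization to the guest OS" checkbox.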