For those just joining, in the first three entries we introduced NSX overlay networking to a vCenter environment, performed all of the required base configuration, and created our first upper layer network service appliance, the NSX Edge. To review, the NSX Edge is a lot like the old vShield Edge in that it is a firewall appliance with NAT, load balancing and VPN termination, but very much unlike the original in that it is a massively capable router as well. In addition, every one of those capability areas has been dramatically expanded since the vShield days. In short, it is a full-featured virtual perimeter appliance that can stretch up to layer 7 and also handle internal routing between (virtual) subnets (VXLAN vwires or, in NSX parlance, layer 2 domains on the logical switch).

One of the most exciting things about a true overlay network like NSX, and why the Nicira acquisition was such a smart one by VMware, is that the virtual network constructs don’t just stop at the perimeter. It’s a really powerful thing having a truly virtual, software based router available as a click-to-deploy console. The Edge does have its limits, though: it caps out at 10 interfaces (just like vShield) and it carries a lot of perimeter-focused machinery you may not need. What if you just want a pure router? Well, the fantastic news here is NSX has you covered. The logical routing appliance is a subset of the NSX Edge, focused just on routing, that also adds the capability to be a bridge. This opens a huge range of possibilities. Before we get into those possibilities, though, let’s get one up and running!

As always, head to the web client Networking & Security plugin. Select NSX Edges and click the green “+” to add. Last time we deployed an Edge Services Gateway; this time we’re deploying a “Logical (Distributed) Router”. Once again we give it both a name (for the VM appliance) and a hostname (for the actual OS) and have the option of enabling High Availability:

Screenshot 2014-09-18 15.29.32

Next we enter a password.  Once again, extremely strong password rules are enforced:

Screenshot 2014-09-18 15.29.56

So far the wizard is identical. Select a datastore to which the logical router appliance will be deployed. In my case I am once again sending it to the vSAN:

Screenshot 2014-09-18 15.30.39

Next up is the interface configuration. Finally we diverge a bit from the NSX Edge wizard. For the logical router we need to configure a dedicated management interface (management plane) in addition to adding actual routing interfaces (control and data planes). For the management interface I’ve created a port group on the vDS which shares a VLAN with my physical host’s vSS, since this is where all of my base VMs reside (vCenter, vCS, etc). After connecting the management interface to a port group, add a valid IP:

Screenshot 2014-09-18 15.37.06


With the management interface configured, we’re back to business as usual. Here we are adding the routing interfaces exactly the same way we did on the Edge. Name them and classify them as either internal or uplink, then connect them to an appropriate port group. It is important to note that interfaces on the logical router require VLAN tagging on the connected port group. That is worth repeating: if you create an interface on the logical router and connect it to a port group set to VLAN 0, the UI will allow it, but the deployment will fail. I’m not quite sure why this is, or why it isn’t a requirement on the Edge, but it is something to be aware of for the logical router. Once again, add IPs for any created interfaces and adjust the MTU as needed:

Screenshot 2014-09-18 15.36.09

If HA was selected we configure it once interface configuration is complete:


Screenshot 2014-09-18 15.37.12

One final check before deploying to verify that everything is correct:

Screenshot 2014-09-18 15.37.16

And voilà! As long as resources are available to host the appliance, and VLAN tagging has been set on the connected port groups, the logical router will quickly deploy:


Screenshot 2014-09-19 20.12.47


Our router has been deployed, but we haven’t done any configuration. Double click on the router name to switch to the logical router configuration pages and be prepared for a treat. This is a seriously powerful virtual network element! On the Manage tab we are confronted with a whole boatload of options. The first section, “Settings”, gives us a Configuration overview first. Here we can change the management interface config as well as the HA parameters. We can also set up syslog and download logs for tech support troubleshooting. Finally, we can deploy additional logical router appliances in the bottom pane:

Screenshot 2014-09-19 20.49.03

Under Interfaces we can see the interfaces created during initial deployment. We can also add more using the same UI. A staggering 999 interfaces, of which up to 8 can be uplinks, can be created! Of course the bandwidth of the underlying host should be scaled appropriately for the uplinks:

Screenshot 2014-09-19 20.49.13

The Firewall section provides a really easy to use UI for creating ingress and egress filters.  Very intuitive: name, source, destination, service, permit/deny:

Screenshot 2014-09-19 20.49.34

The Routing section is where things get really interesting and the true power of the logical router is unlocked. It starts off with top level configuration.  Set a default gateway for the router itself, if appropriate, and then enable the dynamic routing options. OSPF and even BGP(!) are supported.  This is fantastic as, with these two protocols, 80% of both internal and external integration cases are covered.  We can also configure logging in this section:

Screenshot 2014-09-19 20.49.53

In the event that the logical router is being deployed into an environment without dynamic routing, static routes can be created. Once again intuitive: interface, network, next hop, MTU (powerful – per-route MTU, this is fantastic), and of course a description field:

Screenshot 2014-09-19 20.50.08

The static route dialogue is straight to the point:

Screenshot 2014-09-19 20.50.14

The OSPF tab is a bit overwhelming for anyone not familiar with the protocol, but will look like home to anyone who is.  The fundamentals needed to get the logical router working in an OSPF area are here: protocol and forwarding addresses, definition of the OSPF area, and mapping of the area to an interface:

Screenshot 2014-09-19 20.50.25

Adding an area, we enter an ID, select a type (normal or NSSA – RFC 1587 “not-so-stubby area” for redistributing external routes into OSPF) and an authentication method (MD5, password or none) as well as the authentication value (password or MD5 hash):

Screenshot 2014-09-19 20.50.34

Once the Area is set up, we map it to an interface. There is an option here to ignore the interface MTU, as well as advanced options to control protocol intervals and set both priority and cost:

Screenshot 2014-09-19 20.50.42

OSPF has our internal needs covered, so let’s move on to BGP to cover our external inter-org routing requirements. Once again, if you know BGP this is familiar territory. Up top we enable the protocol and assign our AS number (Autonomous System number – the identifier by which BGP peers identify themselves and associate advertised prefixes with their origin). We also add our Neighbors – BGP peers with whom we are establishing a BGP routing relationship:

Screenshot 2014-09-19 20.50.53

Peer configuration obviously requires knowing a bit about your neighbor. The remote organization’s AS number is of course the starting point, along with its assigned IP address, as well as protocol and forwarding IP addresses. We can also enter timings and weightings and assign a mutual authentication password. Once the foundation has been laid, we can also optionally add BGP filters:

Screenshot 2014-09-19 20.51.04

Adding a filter we set a direction (ingress or egress) and an action (permit/deny) on a specific network expressed in CIDR block format. We can also use IP prefix conditionals (GE – greater than or equal to, LE – less than or equal to) to apply the filter to a range of prefix lengths:

Screenshot 2014-09-19 20.51.13
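As a side note, the GE/LE semantics are easy to reason about in code. Here is a minimal Python sketch (purely illustrative, not how NSX evaluates filters internally) that checks whether an advertised prefix matches a filter network with optional GE/LE prefix-length bounds:

import ipaddress

def prefix_matches(advertised, filter_net, ge=None, le=None):
    # True if 'advertised' is contained in 'filter_net' and its prefix
    # length satisfies the optional GE (>=) / LE (<=) bounds.
    adv = ipaddress.ip_network(advertised)
    flt = ipaddress.ip_network(filter_net)
    if not adv.subnet_of(flt):
        return False
    if ge is None and le is None:
        return adv.prefixlen == flt.prefixlen   # exact prefix-length match only
    if ge is not None and adv.prefixlen < ge:
        return False
    if le is not None and adv.prefixlen > le:
        return False
    return True

# 10.10.0.0/16 with GE 24 LE 28 matches /24 through /28 subnets of 10.10.0.0/16
print(prefix_matches("10.10.4.0/24", "10.10.0.0/16", ge=24, le=28))    # True
print(prefix_matches("10.10.4.128/30", "10.10.0.0/16", ge=24, le=28))  # False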

We’ve got internal routing. We’ve got external routing. Let’s link ’em! The Route Redistribution tab lets us do just that:

Screenshot 2014-09-19 20.51.24

First we establish the IP prefixes for route redistribution: a name and a CIDR-notation network definition:

Screenshot 2014-09-19 20.51.27

Next we create the redistribution criteria. Select the prefix (a network defined above) and then set the direction. The “learner” protocol is where the route is being distributed to; the “learning from” entry is where the route originates. Origination can be OSPF, BGP, static routes or directly connected networks. The destination can be OSPF or BGP:

Screenshot 2014-09-19 20.51.33

Last but not least we have bridging.  Yes, this appliance can be a proper Ethernet bridge as well giving us fantastic layer 2 options for scenarios that need them.  First step on the bridging tab is to add a bridge configuration:

Screenshot 2014-09-19 20.51.40

Very easy configuration: add a name for the new bridge and select the two networks being bridged (the VXLAN logical switch and the VLAN-backed distributed port group):

Screenshot 2014-09-19 20.51.46


As you can see there are a huge number of virtual guest environment use cases that can be covered with the rich set of capabilities represented by both the NSX Edge and Logical router.  Next entry we’ll spend some time considering possible architectures that would be difficult before NSX, but become simple once it has been deployed.  Stay tuned!


There is still work to be done on NSX, but I got a number of inquiries asking about how I have the lab server set up from a networking perspective, so I thought it would be useful to have a brief intermission and take a look. Let’s start with a picture:

From the hardware perspective, here is how it breaks down:

  • Core Switch: Netgear GSM7224V2 – a fully managed layer 3 switch with 802.1Q VLAN support, 24 1Gb/s ports, 2 SFP+ modules, LAG/LACP and obviously routing
  • Physical Host: Dell T620 – the beast is set up with 192GB ECC LVDDR3-1333 DRAM, 2 x Intel Xeon E5-2650L v2 (1.7GHz/10-core/25MB/8.0GT/s QPI/70W), 2 x 750W PSUs, 8 x 2TB Western Digital Red NAS drives, a PERC H710 RAID controller with 512MB cache, an iDRAC Enterprise ILO board, 2 x 120GB Intel SSDs, a 4-port Intel i350 1Gb/s NIC and a 2-port Broadcom BCM57810 10Gb/s NIC
  • Firewall: these days I actually run a dedicated physical firewall in the shape of the (now defunct) McAfee UTM SG720. It’s no ASA, but it’s actually surprisingly powerful and capable for perimeter defense in a home lab.

Of course, hardware porn aside, from a networking perspective the key statistic above is the 6 1Gb/s ports (no 10Gb/s in the lab unfortunately, so the Broadcom gets to be bored doing 1Gb/s duty).

In terms of logical configuration, I have allocated the NICs to 5 discrete virtual standard switches:

  • vSwitch0: This is the primary VSS and has been allocated two physical ports.  It hosts the following port groups:
    • VM Network: the attach point for any VMs running on the physical host – 192.168.2.0 (VLAN 200)
    • Management Network (vmkernel): primary management network used for management traffic and VM FT – 192.168.5.0 (VLAN 500)
  • vSwitch4: This VSS is dedicated to storage and has one vmkernel attach.  Storage and vMotion traffic traverse this link – 192.168.2.0 (VLAN 200).  Note that it shares the VM network.  My two NAS devices each have only 2 gigabit ports and connect directly to both my client network (192.168.1.0) and the lab (192.168.2.0).  They also need to be accessed by the guest VMs constantly.  Rather than put a routing boundary in the middle, I opted to just flatten out storage access onto VLAN 200
  • vSwitch1: VSS1 is dedicated to the first nested ESX environment.  This environment contains 3 vESXi guests which live in the same vCenter as the physical host (vCenter 1)
  • vSwitch2: VSS2 is dedicated to the second nested ESX environment.  This environment contains 3 vESXi guests which live in their own vCenter (vCenter 2).  SRM is up and running between the two vCenters
  • vSwitch3: VSS3 serves as a DMZ as well as the provider network (external network) for vCD and NSX.  It is 192.168.99.0 (VLAN 990) and uplinks to a firewall managed DMZ

In terms of VMware advanced networking (vDS, vCD, vCNS, NSX), I limit this to the nested environments.  It makes configuration changes (including full teardown) super easy even if the entire network traffic flow picture gets (pretty damn) confusing.  Some things to remember about doing this:

  • Enable promiscuous mode on the vSwitch the nested ESX guests attach to
  • Allow forged transmits on the same vSwitch
  • In the guest properties, be sure to select ESXi as the guest OS and expose hardware-assisted virtualization to the guest

The reason for this is that normal vSwitch behavior is to assume that a guest is only responsible for itself (meaning traffic destined for the guest OS is actually destined for applications on the guest OS). In the case where the guest is actually a nested ESXi host, the traffic is originating from its own guests, which have their own vNICs and MAC addresses. Any traffic inbound to the nested ESXi guest is actually headed for an application in one of its guests. As such, the vSwitch sees lots of what appear to be alien MAC addresses heading for the nested ESXi guest, which it would normally want to drop. These settings prevent that from happening and unlock the hypervisor-on-hypervisor potential.


Over the past two entries we have gone from being NSXless to having a full NSX foundation laid in a fairly painless set of steps. Next it is time to actually start to use the technology for something interesting. The true power of an overlay network comes from two key areas: agility in the creation and management of layer 2 domains, and the collapse of higher layer capabilities into the compute plane. We’ve seen the former in action with VXLAN and the way NSX leverages the VXLAN foundation to build managed, dynamic L2 environments. The next piece is layer 3 and above. Being able to actually route, load balance, filter and intelligently direct traffic within the hypervisor enables enormously powerful consumption models. Past versions of VMware vShield Manager provided a simple “Edge” device which had load balancing, firewall, NAT, static routing, VPN (IPSEC and SSL) and DHCP capabilities. It was fairly similar to a virtualized version of a high-end home office firewall/router appliance. There were neat bells and whistles like high availability with very smart failover and the ability to have up to 10 interfaces for guest network usage. It also came in a host of sizes based on load and throughput requirements, so it was resource efficient. So why change it? Well, the good news is that NSX provides an additive experience. The traditional vShield type Edge is still available in NSX, but vastly improved. In addition, NSX provides the ability to deploy a proper virtualized router. A device which can actually participate in OSPF domains! That’s great stuff, and it is a capability of both the Edge appliance and the dedicated logical router appliance, which is a subset of the Edge functionality plus bridging, which we’ll explore in the next entry. For now, let’s get started by creating an Edge device.

As with all NSX operations, we initiate from the Networking & Security plugin. This time in the left hand menu pane we’re selecting “NSX Edges”. One interesting footnote: I actually lost my NSX plugin in the web client and nothing seemed able to bring it back. Skipping right to the resolution, the culprit actually turned out to be a stalled Windows update to .NET. Once I got Windows fully updated and WU healthy, vCenter magically got itself back into shape (following a final reboot). The moral of this story, to me at least, is that we really need a containerized version of vCenter running directly on the hypervisor. Anyhow, enough of that. From NSX Edges, we’re going to click the green plus sign in order to add one. The New NSX Edge dialogue gives us a few interesting options right off the bat. First, we can see the traditional Edge Services Gateway (which we’re selecting this round). Below it, however, we can see the new “Logical (Distributed) Router” construct, which we will deploy as well in a later entry. Lastly we can see the option to deploy the Edge VM in an HA state. I’m leaving this deselected for the lab, as resource usage is more important than availability. The last step is to provide both a descriptive name and a hostname for the VM, then click Next:

Screenshot 2014-09-17 01.35.54

Next up is to set the appliance password. Note the password policy is very strong here: 12 characters, mixed upper and lower case, a numeral and at least one special character. A pain for the lab, but good practice for production anyhow:

Screenshot 2014-09-17 01.36.14

With the password set, we move on to the deployment options. Select a datacenter to deploy the VM into, as well as a size. Size determines the number of vCPUs and the amount of RAM that will be allocated. Obviously the larger the VM, the higher the volume of traffic it can process. Common use cases for the larger sizes would be a high number of IPSEC tunnels or an extremely complex firewall ruleset. There is also an option to turn off automatic generation of control plane traffic flow service rules; this should only be selected if a specific design and implementation requires control beyond what automatic generation can provide. The last step is to add resourcing info for the Edge appliance VM:

Screenshot 2014-09-17 01.37.01

Select a cluster, datastore and (optionally) a host.  Note, deploying to vSAN again just to show off!

Screenshot 2014-09-17 01.37.31

The next step is where the real magic begins. Here we are creating and configuring the network interfaces of the Edge appliance. If you think about what we’re doing here from the perspective of legacy network engineering, it really is amazing. Through an easy, wizard driven GUI, we’re literally creating and addressing network uplinks. Extremely cool. Each interface is classified as either “Internal” or “Uplink” (external) and should have corresponding connectivity which matches. I point internal interfaces towards the NSX logical switch (VXLAN vwire environment) that the guest VMs will attach to, and external interfaces at a port group that has a physical route path out of the lab network. In this respect the new Edge is very much like the vShield Edge in a vCloud Director scenario, where internal interfaces would connect to a tenant’s organization network while external interfaces would connect to the provider external network. After selecting the type of interface, provide a name and then set its vswitch connectivity. The last step is to provide an IP address for the new interface. Of course this IP should be valid for the vswitch and port group the interface is being connected to:

Screenshot 2014-09-17 01.38.51

With all options complete, we can now add another interface.  There should be at least one internal and one external if the guest VMs will need to reach outside of the overlay network:

Screenshot 2014-09-18 00.25.43

With both interfaces created and configured, we can move forward to the next step:

Screenshot 2014-09-18 00.25.52

Since this is a gateway device, we should provide it with a default gateway (although this is optional).  Select the appropriate interface and provide the IP of the next hop router on that subnet:

Screenshot 2014-09-18 00.26.04

The last step is an opportunity to create a default firewall policy.  Very useful for setting baseline security so the new appliance comes up configured.  HA parameters can also be set in this dialogue box if the HA option was selected up top:

Screenshot 2014-09-17 01.45.28

With all steps completed it is time to review and submit!

Screenshot 2014-09-17 01.45.32

If everything is correctly configured, and there is sufficient host resourcing available to support the creation of the configured Edge appliance VM size, the appliance will deploy and come online:

Screenshot 2014-09-18 00.33.56

We’ve got a working Edge, so let’s see what it can do! For anyone familiar with the vShield Edge this will be semi-familiar territory, but there is also a ton of new capability. Double-click on the newly created Edge device object to bring up the configuration page. The first stop is to head over to the Manage tab. Look at all of those groupings! There are separate config hierarchies for Firewall, DHCP, NAT, Routing, Load Balancer, VPN, SSL VPN and Grouping, which makes things very intuitive. Let’s start with the top level Settings group. First up is the Configuration page. Here we can modify the syslog configuration and logging options for the appliance. We can also check at a glance which services have been enabled. There are also sections to modify both the HA configuration and the DNS settings. Finally, we can deploy a new appliance from this panel as well:

Screenshot 2014-09-19 22.03.36

Moving one level down we arrive at the configuration page for the interfaces.  Here we can see the aforementioned 10 available interface slots, two of which we configured during the deployment steps.  We can modify or delete those, as well as add new ones:

Screenshot 2014-09-19 22.03.46

The final configuration area under Settings is for certificate management.  This appliance reaches up to layer 7 and also supports VPN, so it is likely that it will need to be configured with one or more public certs.  This panel makes complex cert management very easy:

Screenshot 2014-09-19 22.03.58

The next top level configuration grouping is for the Firewall. Very clean presentation, with all rules listed in tabular format. Click the green “+” to create a new rule (providing the expected source, destination, service and action values), or delete or modify existing ones. Rules are processed in order and can be moved. Keep in mind that the bottom rule acts as the “catch all”, but only the first rule that matches a traffic pattern is applied (in other words, a “permit” higher in the list takes precedence over a “deny” lower down, but would itself be rendered superfluous by a deny above it), so plan rule strategy accordingly:

Screenshot 2014-09-19 22.04.06
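To make the first-match behavior concrete, here is a tiny Python sketch (illustrative only, with made-up rules and addresses; it is not how the Edge evaluates its ruleset internally):

# Rules are evaluated top-down; the first match wins and the final rule is the catch-all.
RULES = [
    {"name": "allow-web", "src": "any", "dst": "10.0.1.10", "service": "tcp/443", "action": "permit"},
    {"name": "deny-ssh",  "src": "any", "dst": "10.0.1.10", "service": "tcp/22",  "action": "deny"},
    {"name": "default",   "src": "any", "dst": "any",       "service": "any",     "action": "deny"},
]

def field_matches(rule_value, value):
    return rule_value == "any" or rule_value == value

def evaluate(src, dst, service):
    for rule in RULES:
        if (field_matches(rule["src"], src)
                and field_matches(rule["dst"], dst)
                and field_matches(rule["service"], service)):
            return rule["name"], rule["action"]   # first matching rule decides
    return "implicit", "deny"

print(evaluate("192.168.2.50", "10.0.1.10", "tcp/443"))   # ('allow-web', 'permit')
print(evaluate("192.168.2.50", "10.0.1.10", "tcp/22"))    # ('deny-ssh', 'deny')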

The next settings group is for the DHCP server.  I can’t stress enough the utility of this option.  When you consider software defined datacenter strategy, and the automated deployment and configuration of customer environments, having a way to bring guest OS instances onto the network before the first one is deployed is extremely powerful.  Being able to manage (and orchestrate) that capability right in the network edge device is an even bigger bonus.  The first stop is the Pools config block and the options here are very straightforward for anyone familiar with DHCP.  You can enable the service, configure logging and create scopes (IP ranges that the DHCP server will service):

Screenshot 2014-09-19 22.04.13

With the pools defined we can view and configure the Bindings. Bindings in this context are static assignments, which means the DHCP server can be prepopulated with per-VM IP associations, ensuring that a specific guest instance will always get a specific IP:

Screenshot 2014-09-19 22.04.20
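Conceptually a binding is just a reservation that is consulted before the dynamic pool. A rough Python sketch, with made-up MAC addresses and ranges (illustrative only, not the Edge’s DHCP implementation):

# Static bindings are checked first; anything else gets the next free address from the scope.
BINDINGS = {"00:50:56:aa:bb:01": "192.168.10.50"}              # per-VM reservation
pool = (f"192.168.10.{host}" for host in range(100, 200))      # dynamic range of the scope

def offer(mac):
    return BINDINGS.get(mac) or next(pool)

print(offer("00:50:56:aa:bb:01"))   # 192.168.10.50 (static binding wins)
print(offer("00:50:56:aa:bb:02"))   # 192.168.10.100 (first dynamic address)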

Next up is the NAT configuration. As an edge device, this section is critical. The rules come in two flavors, SNAT and DNAT. SNAT rules are source NAT rules, used for egress: they translate private internal IP addresses to the outbound gateway uplink address. DNAT rules are destination NAT rules, used for ingress: they are applied to one of the external gateway IP addresses and translate a specific inbound port to an internal address (changing the port as well if needed). And of course it goes without saying that in order to NAT traffic and have it flow, you also need corresponding firewall rules that permit it. The top level UI is very minimalist; click the green “+” to create a rule as usual.

Screenshot 2014-09-19 22.04.27

Here we see the options for a DNAT.  We have the original IP range and protocol (TCP or UDP), as well as the original port range.  Corresponding configuration must also be provided for the translation side of the equation – both IP and port range:

Screenshot 2014-09-19 22.04.44

SNAT is simpler.  Set the interface the rule is being applied to and provide both an original IP range and a translated IP range to start NAT’ing internet traffic out:

Screenshot 2014-09-19 22.04.52
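To restate the two flavors in code form, here is a toy model (not the Edge data path, and all addresses are made up): DNAT rewrites the destination of inbound traffic hitting the uplink address, while SNAT rewrites the source of outbound traffic leaving the internal subnet:

import ipaddress

# Flows are modelled as (src_ip, src_port, dst_ip, dst_port) tuples.
DNAT = {("203.0.113.10", 443): ("10.0.1.20", 8443)}   # inbound: uplink IP/port -> internal server
SNAT = {"10.0.1.0/24": "203.0.113.10"}                # outbound: internal subnet -> uplink IP

def apply_dnat(src_ip, src_port, dst_ip, dst_port):
    new_dst = DNAT.get((dst_ip, dst_port))
    return (src_ip, src_port, *new_dst) if new_dst else (src_ip, src_port, dst_ip, dst_port)

def apply_snat(src_ip, src_port, dst_ip, dst_port):
    for subnet, translated in SNAT.items():
        if ipaddress.ip_address(src_ip) in ipaddress.ip_network(subnet):
            return (translated, src_port, dst_ip, dst_port)
    return (src_ip, src_port, dst_ip, dst_port)

print(apply_dnat("198.51.100.7", 51000, "203.0.113.10", 443))   # lands on 10.0.1.20:8443
print(apply_snat("10.0.1.55", 40000, "192.0.2.80", 53))         # leaves as 203.0.113.10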

The Routing section is where things get really interesting and the true power of the new NSX flavored Edge is unlocked. It starts off with top level configuration.  Set a default gateway for the router itself, if appropriate, and then enable the dynamic routing options. OSPF and even BGP(!) are supported.  This is fantastic as, with these two protocols, 80% of both internal and external integration cases are covered.  We can also configure logging in this section:

Screenshot 2014-09-19 22.09.34

In the event that the Edge is being deployed into an environment without dynamic routing, static routes can still be created. Once again intuitive: interface, network, next hop, MTU (powerful – per-route MTU, this is fantastic), and of course a description field:

Screenshot 2014-09-19 22.09.49

The OSPF tab is a bit overwhelming for anyone not familiar with the protocol, but will look like home to anyone who is. The fundamentals needed to get the Edge working in an OSPF area are here: protocol and forwarding addresses, definition of the OSPF area, and mapping of the area to an interface:

Screenshot 2014-09-19 22.10.04

Adding an area, we enter an ID, select a type (normal or NSSA – RFC 1587 “not-so-stubby area” for redistributing external routes into OSPF) and an authentication method (MD5, password or none) as well as the authentication value (password or MD5 hash):

Screenshot 2014-09-19 22.10.08

Once the Area is set up, we map it to an interface. There is an option here to ignore the interface MTU, as well as advanced options to control protocol intervals and set both priority and cost:

Screenshot 2014-09-19 22.10.13

OSPF has our internal needs covered, so let’s move on to BGP to cover our external inter-org routing requirements. Once again, if you know BGP this is familiar territory. Up top we enable the protocol and assign our AS number (Autonomous System number – the identifier by which BGP peers identify themselves and associate advertised prefixes with their origin). We also add our Neighbors – BGP peers with whom we are establishing a BGP routing relationship:

Screenshot 2014-09-19 22.10.22

Peer configuration obviously requires knowing a bit about your neighbor. The remote organization’s AS number is of course the starting point, along with its assigned IP address, as well as protocol and forwarding IP addresses. We can also enter timings and weightings and assign a mutual authentication password. Once the foundation has been laid, we can also optionally add BGP filters:

Screenshot 2014-09-19 22.10.26

Adding a filter we set a direction (ingress or egress) and an action (permit/deny) on a specific network expressed in CIDR block format. We can also use IP prefix conditionals (GE – greater than or equal to, LE – less than or equal to) to apply the filter to a range of prefix lengths:

Screenshot 2014-09-19 20.51.13

IS-IS is an internal, link state based routing protocol. An alternative to OSPF, the key difference is that while OSPF was built as a pure layer 3 control plane protocol, IS-IS starts with a layer 2 view of its Intermediate Systems. As such it is the control plane behind IEEE Shortest Path Bridging (802.1aq), which builds on the Provider Bridging (802.1ad) and Provider Backbone Bridging (802.1ah) standards. This is an extremely powerful option to have here. If you consider integrating with carrier stretched layer 2 topologies (like VPLS), the ability to participate in that family of protocols can spell the difference between actually joining the extended L2 domain versus having the virtual network environment relegated to its own L3 domain (and consequently new IP space). It is also a way of eliminating the need for yet another level of overlay abstraction, the SSL VPN or IPSEC TAP VPN, which, while still available as options, create additional overhead. The base UI for configuring IS-IS allows us to configure a system ID and Intermediate System type, create Areas, and map them to an interface:

Screenshot 2014-09-19 22.10.38

Creating IS-IS Areas is easy:

Screenshot 2014-09-19 22.10.46

Interface binding follows the same convention as OSPF:

Screenshot 2014-09-19 22.10.52

We’ve got internal routing with OSPF. We’ve got external routing with BGP. Let’s link ’em! The Route Redistribution tab lets us do just that:

Screenshot 2014-09-19 22.10.59

First we establish the IP prefixes for route redistribution: a name and a CIDR-notation network definition:

Screenshot 2014-09-19 22.11.03

Next we create the redistribution criteria. Select the prefix (a network defined above) and then set the direction. The “learner” protocol is where the route is being distributed to; the “learning from” entry is where the route originates. Origination can be OSPF, BGP, static routes or directly connected networks. The destination can be OSPF or BGP:

Screenshot 2014-09-19 22.11.08
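A quick sketch of the decision the redistribution criteria encode (illustrative Python with hypothetical prefix and rule names, not NSX internals): a route learned from one protocol is injected into another only if a matching permit rule and prefix exist:

import ipaddress

PREFIXES = {"web-nets": "10.10.0.0/16"}    # hypothetical prefix definitions (name -> CIDR)

# learner = protocol the route is injected into; learned_from = where it originated
REDIST_RULES = [
    {"learner": "bgp", "learned_from": ["ospf", "connected"], "prefix": "web-nets", "action": "permit"},
]

def redistribute(route, learned_from, into):
    for rule in REDIST_RULES:
        if rule["learner"] != into or learned_from not in rule["learned_from"]:
            continue
        if ipaddress.ip_network(route).subnet_of(ipaddress.ip_network(PREFIXES[rule["prefix"]])):
            return rule["action"] == "permit"
    return False    # no matching criteria: the route is not redistributed

print(redistribute("10.10.4.0/24", "ospf", "bgp"))    # True
print(redistribute("172.16.0.0/24", "ospf", "bgp"))   # False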

Phew. The routing configuration was intense! In an upcoming entry I plan to talk through various use cases that were once sealed off which NSX can unlock, and the key to many of those is the power of these routing capabilities. For now, though, let’s move on to the Load Balancer configuration. Up top are the basic service controls plus options for logging, “Acceleration” and “Service Insertion”. These last two require some explanation. “Acceleration” refers to the load balancing engine that will be activated in the appliance. Toggling this option switches between the faster Layer 4 engine (which makes decisions based on TCP connection state) and the slower, but far more flexible, Layer 7 engine, which enables decisions at the application layer. Obviously the right choice here is completely dependent on use case. “Service Insertion” allows the Load Balancer to integrate with third party appliance solutions:

Screenshot 2014-09-19 22.11.16

The next configuration group is “Application Profiles” which is where the L7 and L4 rules engines are configured.  The bottom pane allows certificate configuration.  Absolutely vital when working at the application layer where much traffic will be SSL/TLS:

Screenshot 2014-09-19 22.11.25

Fantastic options here for defining an Application Profile. Protocol obviously: TCP for L4, HTTP and HTTPS for L7. HTTP redirect is fully supported and a URL can be entered here. The ability to determine pathing and redirect via URL is critical for an application focused load balancer. Persistence and persistence mode can be set, and a cookie name provided for cookie based persistence. In addition to these options, there is a toggle for inserting an “X-Forwarded-For” HTTP header into forwarded requests. This option is there to support proxied environments: the header carries the actual originating IP, so the servers behind the load balancer can still make decisions based on the true source if desired. Without this header, in a proxied environment, the IP of the proxy (or the load balancer itself) would be seen as the source. Finally, comprehensive configuration for certificate assignment, auth method and cipher can be set here as well:

Screenshot 2014-09-19 22.11.31
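The X-Forwarded-For behavior is easiest to see from the receiving side: without the header, a pool member only ever sees the load balancer’s address as the peer. A minimal Python sketch of how a backend might recover the original client (this is standard HTTP proxy behavior, not NSX-specific, and the addresses are made up):

def client_ip(headers, peer_addr):
    # 'headers' is a dict of request headers; 'peer_addr' is the TCP peer address,
    # which behind a load balancer is the LB interface rather than the real client.
    xff = headers.get("X-Forwarded-For")
    if xff:
        # The header can accumulate a chain of proxies; the left-most entry is the client.
        return xff.split(",")[0].strip()
    return peer_addr

print(client_ip({"X-Forwarded-For": "198.51.100.7, 10.0.1.1"}, "10.0.1.1"))   # 198.51.100.7
print(client_ip({}, "10.0.1.1"))                                              # 10.0.1.1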

With the Application Profiles created, we can move on to the Service Monitoring configuration. Here we can create monitors based on protocol and set the timing intervals that govern the load balancer’s health checking behavior:

Screenshot 2014-09-19 22.11.38

An example of creating a Layer 7 service monitor. The HTTP “Method” used to detect server status can be set, along with the “URL” to be used in the sample request and, if the method is set to POST, the data that should be sent to that URL. In the “Expect” field we enter the literal string that must appear in the status line of the HTTP response; if it is not matched, the monitor does not go on to check the “Receive” content, which is the string expected in the response body. Finally, in the extension area, we can enter additional monitoring parameters as key/value pairs. These are predefined (example: warning=10 sets the load balancer to trigger a warning on the service endpoint if a response is not received within 10 seconds).
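As a rough mental model of what such a monitor does on each interval, here is a hedged Python sketch using only the standard library (the Edge’s actual probe implementation is not exposed, and the host and URL values are made up): issue the configured method against the URL, check the status line for the Expect string, then check the body for the Receive string.

import http.client

def probe(host, port, url="/healthcheck", method="GET", expect="200", receive="OK", timeout=5):
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request(method, url)
        resp = conn.getresponse()
        status_line = f"HTTP/{resp.version / 10} {resp.status} {resp.reason}"
        if expect not in status_line:
            return False                      # 'Expect' not found in the status line
        body = resp.read().decode(errors="replace")
        return receive in body                # 'Receive' string must appear in the body
    except OSError:
        return False                          # connection failure counts as down

# Hypothetical pool member:
# probe("10.0.1.21", 80, url="/healthcheck", expect="200", receive="OK")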

If you haven’t worked with advanced load balancers before, this may be a bit confusing, but if you think through it, it’s actually very straightforward. The point of a load balancer is to provide a single front end to a group of servers in scenarios where the application can “scale out”. So, using a web server as an example, the name and IP address that clients connect to are quite likely the logical server represented by the virtual IP of the load balancer. Behind the load balancer sit any number of actual web servers that handle the traffic. The load balancer, to do its work, needs to be able to do two things: first, decide how to distribute traffic, and second, determine which servers it is representing and whether they are healthy. The first comes down to load balancing method selection. It might be a simple round robin, which treats the known servers as a list, or it could be as complex as a hash on the originating IP, which matches clients to servers based on layer 3 network associations. Server membership and health can similarly be established by a number of methods. The service monitor capability discussed above represents one of the more advanced ones: the load balancer literally has a layer 7 relationship with its member servers and uses a URL request and response check to determine whether the servers are alive:

Screenshot 2014-09-19 22.11.43 Screenshot 2014-09-19 22.11.51

Screenshot 2014-09-19 22.11.55 Screenshot 2014-09-19 22.12.02

Screenshot 2014-09-19 22.12.07 Screenshot 2014-09-19 22.12.13

Screenshot 2014-09-19 22.12.18 Screenshot 2014-09-19 22.12.31

Screenshot 2014-09-19 22.12.37 Screenshot 2014-09-19 22.12.45
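Two of the distribution methods mentioned above, reduced to a sketch (illustrative Python with hypothetical pool members, not the Edge’s implementation): round robin simply cycles through the member list, while a source-IP hash pins a given client to the same member every time.

import itertools
import zlib

MEMBERS = ["10.0.1.21", "10.0.1.22", "10.0.1.23"]    # hypothetical pool members

_rr = itertools.cycle(MEMBERS)
def pick_round_robin():
    # Each new connection goes to the next member in the list.
    return next(_rr)

def pick_source_hash(client_ip):
    # The same client IP always hashes to the same member.
    return MEMBERS[zlib.crc32(client_ip.encode()) % len(MEMBERS)]

print([pick_round_robin() for _ in range(4)])    # cycles .21, .22, .23, .21
print(pick_source_hash("198.51.100.7"))          # stable for this client
print(pick_source_hash("198.51.100.7"))          # same member again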

Phew!  That’s a ton of options and a really broad range of capabilities!  This is a good place to break for now.  Next up we will create a logical router!




Last entry we got the NSX Manager up and running in vCenter after a quick overview of the technical requirements. Next up, it is time to actually implement the SDN in our environment. The first step is to log in to the vCenter web client and select the Networking & Security solution from the Home tab. If you recall the rundown of NSX components from the last entry, our next task is to install the hypervisor level integration. To do this we need to prepare the hosts. This is similar to implementing VXLAN and results in a VIB being installed on each host. Click on the Install option in the menu pane, then select the Host Preparation tab:

Screenshot 2014-09-16 12.06.06

Under Host Preparation, we can see any clusters in our vCenter configured with a vDS.  In our case there can be only one!

Screenshot 2014-09-16 12.06.14

Clicking the “Install” hyperlink in column 2 will trigger the install after a quick confirmation:

Screenshot 2014-09-16 12.06.25

The Manager will start the download of the VIB to the hosts and trigger the scripted install. All of the usual automated workflow orchestration for VIB installation applies. Lots of things can trip this part up, mostly attributable to host or network misconfiguration. Our environment is sparkling clean so we have nothing to worry about!

Screenshot 2014-09-16 12.06.33

Working away at each host in parallel…

Screenshot 2014-09-16 12.06.51

And just like magic we…. Wait… What the heck is this?!  Hmmm… So much for our clean environment!  Looks like it failed. Luckily there is a handy “Resolve” hyperlink.  Let’s click it.

Screenshot 2014-09-16 12.07.11

A bit more thinking and POOF! As if by magic we’re good. So what the heck happened here? Well, in some cases it seems that the install actually requires a host reboot. The workflow triggered by Resolve will perform this reboot, so be aware of that before clicking. It should probably mention this when triggered, but the good news is HA/DRS is there for just such a situation, right? Well, I’m not sure, because I can’t be 100% certain that it staggered the reboots. In any event, maintenance mode is probably a great idea when doing massive configuration changes like migrating to SDN! And it worked, so all is well…

Screenshot 2014-09-16 15.18.08


Notice there is a hyperlinked “Configure” next to our cluster? These are great UI clues in the NSX Manager. Go ahead and click Configure to prepare the VXLAN configuration. There are a few things we need to enter here: select our vDS under Switch, enter the VLAN ID of the transport VLAN (if applicable), set the MTU of the VXLAN uplink vmkNIC (note: 1600), select a vmkNIC IP addressing scheme – we are going to switch this to IP pool in a second – select a vmkNIC teaming policy (I chose failover; remember that EtherChannel must have matching switch config on the physical uplink switch) and lastly enter a VTEP id:

Screenshot 2014-09-16 20.57.57

Creating a new IP Pool for VXLAN use is easy.  Simply provide a name and IP subnet info, as well as a range:

Screenshot 2014-09-16 20.58.40

Here we can see the completed VXLAN config dialogue:

Screenshot 2014-09-16 20.59.00

With the configuration applied, we now see additional details populated for the cluster including VTEP id and failover policy:

Screenshot 2014-09-16 20.59.35


VXLAN is up and running in NSX, so let’s go ahead and finish off the config set. Click on Segment ID to create the segment ID pool, which NSX will use to allocate VXLAN segment IDs when creating vwires (dynamic VXLAN layer 2 domains). We can also configure multicast addressing here. Select a numerical range for the pool starting at 5000. I selected 5000-5999 and left multicast off:

Screenshot 2014-09-16 20.45.08

The last step in this config block is to set up the NSX transport zone. Provide a name and select a mode for the control plane interaction. I selected unicast, which works well in a lab setting where scale isn’t a big deal. In this case the control plane will be entirely managed by the NSX Controller. Alternatively, the control plane activity could be offloaded to the physical network via multicast. The last option is a hybrid, where only local traffic replication is offloaded. The hybrid is probably the best match for production scenarios because of its balance of control efficiency and scalability. The last step is to add the prepared cluster to the transport zone:

Screenshot 2014-09-16 21.00.02

Here we can see the transport zone successfully added to the NSX configuration:

Screenshot 2014-09-16 21.00.09
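As an aside, everything this wizard does is also exposed through the NSX Manager REST API, which becomes handy once automation enters the picture. Below is a hedged Python sketch of creating a logical switch (vwire) in the new transport zone; the endpoint path, XML element names, manager address and credentials are assumptions based on my reading of the NSX-v API guide, so verify them against the documentation for your version before using anything like this:

import requests

NSX_MGR = "https://nsxmanager.lab.local"     # hypothetical NSX Manager address
AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials
SCOPE_ID = "vdnscope-1"                       # transport zone ID as reported by the UI/API

body = """<virtualWireCreateSpec>
  <name>lab-web-tier</name>
  <description>Created via the REST API</description>
  <tenantId>lab</tenantId>
</virtualWireCreateSpec>"""

resp = requests.post(
    f"{NSX_MGR}/api/2.0/vdn/scopes/{SCOPE_ID}/virtualwires",   # assumed endpoint
    data=body,
    headers={"Content-Type": "application/xml"},
    auth=AUTH,
    verify=False,    # lab only: the NSX Manager certificate is self-signed
)
resp.raise_for_status()
print("New logical switch ID:", resp.text)    # the API returns the new virtualwire ID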

Next up is deploying the actual overlay network or “logical switches” in NSX terminology.  Heady stuff!  There is a pretty daunting list of pre-requisites in order for this process to work correctly.  I’ve copied them directly from the implementation guide for reference and I will talk through each one because they require explanation:

  • You must have the Super Administrator or Enterprise Administrator role permission to configure and manage logical switches: This one is a no brainer. Have the right permissions before configuring. Basically you need super user rights for this.
  • Network virtualization components must be installed on the clusters that are to be part of the logical switch: make sure that the hosts have been prepared (the above procedure)
  • You have the minimum required software versions: this is standard stuff.  Make sure that the version compatibility matrix is green between vSphere/vCenter/NSX
  • Physical infrastructure MTU is at least 50 bytes more than the MTU of the virtual machine vNIC: this one is trickier. The physical infrastructure MTU we can figure out. In virtualization, any flavor of vswitch uses the actual physical NICs of the host as uplinks. So a given vSwitch has N virtual ports connected to virtual NICs connected to virtual machines, but also has ports connected to the host’s physical NICs, which map to real physical links. The prerequisite here is to ensure that the Maximum Transmission Unit on that physical side is 50 bytes larger than on the VM vNIC. In our case the “physical” NIC is really the vNIC on the ESXi guest VM, since we are nested. To check that, we go to the host configuration and edit the settings of the VMkernel adapter under Networking (not the physical adapter):

Screenshot 2014-09-16 16.34.33


Our MTU is set to 1500. That doesn’t bode well. A 1500 byte MTU is standard, so almost certainly the “virtual machine vNIC” MTU is also set to 1500. Of course now it’s just a matter of figuring out what the “virtual machine vNIC” refers to! To understand the answer, it is important to understand how an overlay network really works. Consider this diagram:

DayLifeOverlayPacket_1

The easiest way to wrap one’s mind around overlay networking is to walk through a “day in the life of a packet”. Remember that the guest OS has no clue that it’s being virtualized (for the most part, but close enough for this discussion). It simply formulates Ethernet frames and sends them through the NIC driver. An Ethernet conversation, of course, starts with an ARP broadcast to find the destination Ethernet address associated with the IP address you’re attempting to connect to. This ARP query is processed by the vNIC the way a physical NIC would process it, and it is put “on the wire”. Of course in this case “on the wire” means onto the virtual switch hosted by the hypervisor. If the destination lives on the same vswitch (meaning a VM also attached to that vswitch and running on the same host), then the frame is delivered to that VM and the conversation never leaves the hypervisor. If this is not the case, however (which means the destination VM lives on another host – common in a vDS environment even if the guests are on the same logical network), then the frame is sent down toward the physical NIC which is acting as the vSwitch’s uplink. In an overlay scenario the frame is intercepted by the overlay handler before it gets to the physical NIC driver on the host. This is why for VXLAN we have to install a VIB. The handler catches the frame, uses its own logic to determine where it should go, and then sends it there. In the case of VXLAN this means encapsulating it and sending it from a VXLAN Tunnel Endpoint (VTEP) to the correct destination VTEP over layer 3. That is where the encapsulation comes into play: we are taking an entire 1500 byte Ethernet frame and packing it into another one to send over layer 3. And this is where the larger MTU comes into play. Using a 50 byte larger MTU ensures we don’t have to fragment every time an overlay frame is sent. So what the prerequisite is referring to is setting the physical MTU 50 bytes larger than the VXLAN-attached VM vNIC MTU; all of the documentation, however, really recommends setting it to 1600. I feel the documentation here could have been clearer, as “virtual machine vNIC” is pretty ambiguous, but there it is. Also worth noting: “logical switch” in NSX parlance is actually referring to a VXLAN segment. So with all of this in mind, we can go ahead and change the MTU of the VMkernel NIC that is attached to the vDS to 1600.
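The arithmetic behind that 50 byte figure is worth writing down. VXLAN carries the guest’s entire inner frame inside new UDP/IP headers, so a quick back-of-the-envelope check (assuming an untagged IPv4 transport network) looks like this:

# Bytes added on top of the guest's 1500 byte IP MTU by VXLAN encapsulation.
INNER_ETHERNET = 14   # the guest's own Ethernet header travels inside the tunnel
VXLAN_HEADER   = 8    # VXLAN header carrying the 24-bit VNI
OUTER_UDP      = 8    # outer UDP header
OUTER_IPV4     = 20   # outer IPv4 header (an inner VLAN tag or IPv6 outer adds more)

overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
print("Encapsulation overhead:", overhead, "bytes")      # 50
print("Required transport MTU:", 1500 + overhead)        # 1550

Hence the common recommendation of 1600 on the transport path: it covers the 50 bytes (plus a possible inner VLAN tag) with headroom to spare.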

  • Managed IP address is set for each vCenter server in the vCenter Server Runtime Settings (see vCenter Server and Host Management): This is a straightforward vCenter config option found in the vCenter properties:

Screenshot 2014-09-16 17.07.54

  • DHCP is available on VXLAN transport VLANs if you are using DHCP for IP assignment for VMKNics: this one catches me all the time, as I don’t use a DHCP server on the transport VLAN. If you don’t, you need an IP pool or the configuration will break, since the VXLAN VMkernel NICs will get an autoconfiguration address (169.254.x.x). As we will see later, we’ll have an opportunity to associate an IP pool if we don’t want to deploy DHCP on the transport VLAN.
  • A consistent distributed virtual switch type (vendor etc.) and version is being used across a given transport zone. Inconsistent switch types can lead to undefined behavior in your logical switch: this is straightforward – use one consistent switch type and version (for example all vDS, or all Open vSwitch) within a transport zone
  • 5-tuple hash distribution should be enabled for Link Aggregation Control Protocol (LACP): this is the prescribed distribution algorithm to use if you are aggregating vSwitch uplinks using LACP. In our case we are not using LACP, but in cases where it applies this is critical

With the background detail on the pre-requisites in mind, we can move forward with the next step which is Deploying the NSX Controller Node.  Head back over to the Installation section of the Network & Security plugin and select Management.  Here we can click “+” to add our first Controller under NSX Controller Nodes:

Screenshot 2014-09-16 17.18.48


We have a bunch of questions to answer to configure our first controller. NSX Manager obviously refers to the NSX Manager we are pairing with, created in our first entry. Datacenter should be set to the vDC we are supporting. Cluster/Resource Pool refers to the HA/DRS cluster we are NSX-enabling. Datastore is the datastore where the controller VM should be created (note in this case we’re installing to a vSAN datastore – more on that later) and Host is the host on which it should be instantiated. Connected To refers to the network to which the controller VM should attach, while IP Pool is how the node will be addressed. Finally, Password sets the admin password for the controller appliance.

Screenshot 2014-09-16 17.23.04

A quick shot of the IP pool configuration.  Easy stuff:

Screenshot 2014-09-16 17.39.15

With everything configured for the Controller VM setup we can go ahead and click OK to create it.  The workflow will trigger and start operating:

Screenshot 2014-09-16 17.23.17


So how did it go?  Well it didn’t.  The workflow completed and the NSX Controller Nodes list stayed empty.  Recent tasks indicated an extremely generic error of “No hosts is compatible with the virtual machine”.  Hmmm.  Not super helpful:

Screenshot 2014-09-16 17.54.10

To get a deeper look, it’s time to SSH into the NSX Manager.  Hurray!  First we need to enable it, so head to the VAMI UI.  From the Summary tab we can easily spot the SSH Service and a handy “Start” button:

Screenshot 2014-09-16 17.53.11


With SSH running we can head to the CLI and check the log with the command:

show manager log follow

It’s a good idea at this stage to re-run the new Controller workflow to trigger the error again.  This is what I captured as the workflow log:

2014-09-16 23:07:13.070 GMT INFO http-nio-127.0.0.1-7441-exec-2 ControllerServiceImpl:422 - about to create controller: controller-3 IP =192.168.2.16
2014-09-16 23:07:13.079 GMT INFO http-nio-127.0.0.1-7441-exec-2 AuditingServiceImpl:141 - [AuditLog] UserName:'vsphere.local\administrator', ModuleName:'VdnNvpController', Operation:'CREATE', Resource:'null', Time:'Tue Sep 16 23:07:13.077 GMT 2014'
2014-09-16 23:07:13.085 GMT INFO DCNPool-2 VirtualWireInFirewallRuleNotificationHandler:59 - Recieved VDN CREATE notification for context controller-3:Controller
2014-09-16 23:07:13.086 GMT INFO DCNPool-2 VirtualWireDCNHandler:43 - Recieved VDN CREATE notification for context controller-3:Controller
2014-09-16 23:07:13.207 GMT INFO http-nio-127.0.0.1-7441-exec-2 TaskServiceImpl:101 - TF:Created Job with ID jobdata-3535
2014-09-16 23:07:13.221 GMT INFO http-nio-127.0.0.1-7441-exec-2 TaskServiceImpl:399 - TF:Scheduling Job jobdata-3535
2014-09-16 23:07:13.393 GMT INFO http-nio-127.0.0.1-7441-exec-4 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXX7898D
2014-09-16 23:07:13.464 GMT INFO http-nio-127.0.0.1-7441-exec-5 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXX6DEF2
2014-09-16 23:07:13.989 GMT INFO pool-9-thread-1 ImmediateScheduler:34 - TF:Schedule Now Job ID jobdata-3535
2014-09-16 23:07:14.002 GMT INFO taskExecutor-7 JobWorker:246 - Updating the status for jobinstance-13626 to EXECUTING
2014-09-16 23:07:14.059 GMT INFO taskScheduler-33 DeployOvfTask:173 - Deploying VM 'NSX_Controller_bc0ed3c4-5182-4448-af0c-dcb46eec3e9f' using the OVF file.
2014-09-16 23:07:14.083 GMT INFO taskScheduler-33 OvfInstaller:335 - Resource pool id = 'resgroup-84'
2014-09-16 23:07:14.084 GMT INFO taskScheduler-33 OvfInstaller:336 - Datastore id = 'datastore-545'
2014-09-16 23:07:14.084 GMT INFO taskScheduler-33 OvfInstaller:339 - Host id = 'host-543'
2014-09-16 23:07:14.090 GMT INFO taskScheduler-33 OvfInstaller:141 - vApp candidate, Type = 'ResourcePool', Id = 'resgroup-84'
2014-09-16 23:07:14.095 GMT INFO taskScheduler-33 OvfInstaller:141 - vApp candidate, Type = 'ClusterComputeResource', Id = 'domain-c83'
2014-09-16 23:07:14.101 GMT INFO taskScheduler-33 OvfInstaller:141 - vApp candidate, Type = 'Folder', Id = 'group-h23'
2014-09-16 23:07:14.107 GMT INFO taskScheduler-33 OvfInstaller:141 - vApp candidate, Type = 'Datacenter', Id = 'datacenter-21'
2014-09-16 23:07:14.112 GMT INFO taskScheduler-33 OvfInstaller:141 - vApp candidate, Type = 'Folder', Id = 'group-d1'
2014-09-16 23:07:14.114 GMT INFO taskScheduler-33 OvfInstaller:359 - OVF is not being imported under a vApp and a folder has not been specified. Trying to associate with the root VM folder of the data center.
2014-09-16 23:07:14.460 GMT INFO taskScheduler-33 OvfInstaller:174 - Datacenter VM folder name = 'vm' id = 'group-v22'
2014-09-16 23:07:14.465 GMT INFO taskScheduler-33 OvfInstaller:105 - Searching for existing VM. Name = 'NSX_Controller_bc0ed3c4-5182-4448-af0c-dcb46eec3e9f', Search root type = 'VIRTUAL_MACHINE', Search root id = 'resgroup-84'
2014-09-16 23:07:14.742 GMT INFO taskScheduler-33 OvfManagerImpl:120 - Creating OVF import spec.
2014-09-16 23:07:14.812 GMT INFO taskScheduler-33 OvfManagerImpl:122 - Created OVF import spec successfully.
2014-09-16 23:07:14.853 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'api_username'
2014-09-16 23:07:14.854 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'management_ip'
2014-09-16 23:07:14.854 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'keystore'
2014-09-16 23:07:14.855 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'api_private_cert'
2014-09-16 23:07:14.855 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'api_password'
2014-09-16 23:07:14.855 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'gateway_ip'
2014-09-16 23:07:14.856 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'cluster_ip'
2014-09-16 23:07:14.856 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'api_public_cert'
2014-09-16 23:07:14.857 GMT INFO taskScheduler-33 OvfInstaller:498 - Setting value for key 'netmask'
2014-09-16 23:07:14.857 GMT INFO taskScheduler-33 OvfInstaller:420 - Number of CPU cores set in the OVF import spec = '4'
2014-09-16 23:07:14.861 GMT INFO taskScheduler-33 OvfInstaller:425 - Number of CPU cores supported by the host = '1'
2014-09-16 23:07:14.862 GMT INFO taskScheduler-33 OvfInstaller:427 - Changing the number of CPU cores in the OVF import spec to '1'.
2014-09-16 23:07:14.862 GMT INFO taskScheduler-33 ResourcePoolVcOperationsImpl:320 - Importing VM into the resource pool.
2014-09-16 23:07:14.905 GMT INFO taskScheduler-33 ResourcePoolVcOperationsImpl:322 - Waiting for the HttpNfcLease to be ready.
2014-09-16 23:07:14.928 GMT DEBUG VcEventsReaderThread VcEventsReader$VcEventsReaderThread:301 - got prop collector update, but not for us:ManagedObjectReference: type = PropertyFilter, value = session[fa0b277c-c1cb-5f5c-cc78-b5e1e82a1bc4]5243239f-b58a-1539-609d-4d3e7e451764, serverGuid = AB462F33-E3E0-4E86-BD55-984E0C95FBE1
2014-09-16 23:07:19.175 GMT INFO ViInventoryThread ViInventory:5004 - Virtual Center: Updating Inventory. new:0 modified:1 removed:0
2014-09-16 23:07:19.199 GMT INFO ViInventoryThread EndpointSVMUpdater:206 - Solution 6341068275337691137 is not registered
2014-09-16 23:07:19.239 GMT INFO ViInventoryThread ViInventory:1304 - 84/164 objects published.
2014-09-16 23:07:19.246 GMT INFO ViInventoryThread VimObjectBridge:943 - VimObjectBridge: Ending inventory update
2014-09-16 23:07:19.247 GMT INFO ViInventoryThread VimObjectBridge:222 - Processing 1 updates and 0 deletions for this transaction
2014-09-16 23:07:19.249 GMT INFO ViInventoryThread VimObjectBridge:229 - VimObjectBridge: Time taken to process transaction : 19
2014-09-16 23:07:19.249 GMT INFO ViInventoryThread ViInventory:1512 - Resolved, last version:220 num vc objs:90 num vimos:164
2014-09-16 23:07:19.683 GMT INFO http-nio-127.0.0.1-7441-exec-3 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXX19BEB
2014-09-16 23:07:19.792 GMT INFO http-nio-127.0.0.1-7441-exec-1 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXXC1AC3
2014-09-16 23:07:22.341 GMT INFO ViInventoryThread ViInventory:5004 - Virtual Center: Updating Inventory. new:0 modified:1 removed:0
2014-09-16 23:07:22.364 GMT INFO ViInventoryThread EndpointSVMUpdater:206 - Solution 6341068275337691137 is not registered
2014-09-16 23:07:22.400 GMT INFO ViInventoryThread ViInventory:1304 - 84/164 objects published.
2014-09-16 23:07:22.406 GMT INFO ViInventoryThread VimObjectBridge:943 - VimObjectBridge: Ending inventory update
2014-09-16 23:07:22.407 GMT INFO ViInventoryThread VimObjectBridge:222 - Processing 1 updates and 0 deletions for this transaction
2014-09-16 23:07:22.409 GMT INFO ViInventoryThread VimObjectBridge:229 - VimObjectBridge: Time taken to process transaction : 17
2014-09-16 23:07:22.409 GMT INFO ViInventoryThread ViInventory:1512 - Resolved, last version:221 num vc objs:90 num vimos:164
2014-09-16 23:07:22.722 GMT INFO ViInventoryThread ViInventory:5004 - Virtual Center: Updating Inventory. new:0 modified:3 removed:0
2014-09-16 23:07:22.746 GMT INFO ViInventoryThread EndpointSVMUpdater:206 - Solution 6341068275337691137 is not registered
2014-09-16 23:07:22.766 GMT INFO ViInventoryThread ViInventory:1538 - UNResolved, count:1 reason:Did not find child vimo for additional children in cache. By this time all children should have vimos in the cache
2014-09-16 23:07:23.048 GMT INFO ViInventoryThread ViInventory:5004 - Virtual Center: Updating Inventory. new:1 modified:1 removed:0
2014-09-16 23:07:23.072 GMT INFO ViInventoryThread EndpointSVMUpdater:206 - Solution 6341068275337691137 is not registered
2014-09-16 23:07:23.096 GMT INFO ViInventoryThread ViManagedVirtualMachineObject:244 - vnic change for vm-556: null to
2014-09-16 23:07:23.301 GMT INFO ViInventoryThread ViInventory:1304 - 85/165 objects published.
2014-09-16 23:07:23.317 GMT INFO ViInventoryThread VimObjectBridge:943 - VimObjectBridge: Ending inventory update
2014-09-16 23:07:23.318 GMT INFO ViInventoryThread VimObjectBridge:222 - Processing 4 updates and 0 deletions for this transaction
2014-09-16 23:07:23.320 GMT INFO ViInventoryThread VimObjectBridge:229 - VimObjectBridge: Time taken to process transaction : 219
2014-09-16 23:07:23.321 GMT INFO ViInventoryThread ViInventory:1512 - Resolved, last version:223 num vc objs:91 num vimos:165
2014-09-16 23:07:24.222 GMT WARN VirtualMachineDvfilterMonitor-1 VirtualMachineWorkQueue$WorkQueue:255 - Host not found for Vm vm-556, bypassing.
2014-09-16 23:07:24.223 GMT WARN VirtualMachineDvfilterMonitor-1 VirtualMachineWorkQueue$WorkQueue:279 - no host found for vm-556, removing.
2014-09-16 23:07:24.443 GMT INFO DCNPool-4 InventoryUtils:273 - Null hostId for VM vm-556
2014-09-16 23:07:25.944 GMT INFO http-nio-127.0.0.1-7441-exec-2 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXX73216
2014-09-16 23:07:26.014 GMT INFO http-nio-127.0.0.1-7441-exec-4 UserSessionManager:43 - New session: XXXXXXXXXXXXXXXXXXXXXXXXXXXB56C3
2014-09-16 23:07:26.807 GMT ERROR taskExecutor-18 ErrorCounter:56 - <AST>:0:0: unexpected end of subtree
2014-09-16 23:07:26.810 GMT WARN taskExecutor-18 AbstractActionEventListener:61 - User Identity Action Event Listener: Error happened when dispatch action events.

Cool detailed info, but unfortunately not shedding any additional light. As the log shows, the story stays the same. It doesn’t appear that there were any errors leading up to the terminal condition, and according to the log NSX now deals with template configuration mismatches elegantly (it rescaled the template from 4 vCPUs to 1 to match the host limit). Of course logs aren’t always exactly correct, right? As it turns out, the template was attempting to create a 4 vCPU VM on a 1 vCPU host. Luckily, with a nested lab adding CPUs is very easy. A quick reconfig of the ESXi guest VMs and a reboot, and the controller configuration completed without a hitch:

Screenshot 2014-09-17 01.29.34
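
As an aside, if you would rather script that nested CPU bump than click through the web client, the sketch below shows the general idea using pyVmomi.  This is purely illustrative and not what I actually did (I used the UI): the vCenter address, credentials, and VM name prefix are hypothetical placeholders, and the certificate handling is lab-only.

# Hypothetical pyVmomi sketch: give the nested ESXi guest VMs enough vCPUs for the controller template.
# The vCenter address, credentials, and VM name prefix below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()              # lab only: skip certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name.startswith("nested-esxi"):       # the nested ESXi guest VMs
            spec = vim.vm.ConfigSpec(numCPUs=4)     # enough for the 4 vCPU controller template
            task = vm.ReconfigVM_Task(spec=spec)
            print("Reconfiguring %s to 4 vCPUs (%s)" % (vm.name, task.info.key))
    view.DestroyView()
finally:
    Disconnect(si)

Note that the VMs need to be powered off (or have CPU hot add enabled) for the reconfigure to succeed; in my case the nested hosts needed a reboot afterward anyway.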

Huzzah!  Well that’s (more than) enough for this entry!  Next up we’ll take a deeper look at NSX implementation up the stack and edge device capabilities, and talk through some use cases.  Stay tuned!


I’ve covered overlay networks and their importance a few times in these pages over the years, but I have to admit that until this week I was never “walking the walk” at the ComplaintsHQ lab.  To set a baseline, NSX comes in two flavors:

  • The first is NSX multi-hypervisor (NSX-MH), the old Nicira Open vSwitch platform, which can be integrated with both vSphere and other competing hypervisors (KVM, Xen).  The catch is that this really is a vSphere play and not so much a vCenter play.  The Open vSwitch integration replaces the vDS and so must integrate directly with a host’s vSS.  If you already have a vDS infrastructure in place, this requires some significant rearchitecting.
  • The second is NSX-V, the native VMware flavor of NSX, which is quickly evolving to be the de facto network architecture for VMware and is core to its SDN strategy.  As an example of this, in upcoming versions of VCNS (vCloud Networking and Security), the NSX virtual firewall/router edge device is replacing the old vShield Edge.  With NSX-V, the NSX SDN capabilities integrate directly with the vDS.

In my OpenStack entry I touched on the plans I had for introducing OpenStack into the lab.  Unfortunately, the realities of NSX integration complicate things and have delayed those plans.  Before we move forward I think it is worthwhile to call these out:

  • A mixed hypervisor (vSphere + other) OpenStack environment will require NSX-MH if you want to take advantage of advanced OpenStack SDN constructs (Neutron)
  • If you do not go that path, you need to fall back to static Nova network models.  These map pretty closely to vCloud Director “port group assignment” org networks.  So you have to configure a bunch of VLANs up front and map them to port groups which are then utilized by the OpenStack controller at the compute deployment layer (Nova).
  • VXLAN requires the vDS, NSX-MH can’t integrate with the vDS, but Open vSwitch can integrate with VXLAN.  Confused?  Don’t feel bad.  Overlay networking can get confusing fast.  The net of it is that in a vCenter environment, to take advantage of both Neutron and VXLAN, you essentially need parallel networking setups.  NSX-MH will be speaking VXLAN, but doing its own thing and not participating in an existing vDS VXLAN.
  • NSX-V (native VMware) and NSX-MH (multi-hypervisor) cannot co-exist in the same vCenter

For lots of reasons I don’t want to break down my HA/DRS clusters.  I could have potentially played with OpenStack and NSX-MH exclusively in my entirely nested vCenter 2 environment, but the purpose of that one is really SRM, so it would complicate things.  I still may go ahead and create a third nested vCenter environment and play with OpenStack and NSX-MH there, but that will have to wait.  For now I decided to move forward with NSX-V and shelve the OpenStack testing.

So back to the implementation detail… NSX is a fairly complex technology with some dependencies that never quite fit my old white box lab setup.  For example, you’ll need a vDS, which means you’ll need a cluster and multiple NICs in each host.  This means you’ll need either a pretty complex white box build or a really good nested setup.  I never quite had the former as I was really focused on building to a rock-bottom budget, but these days I am running the latter, so the time was right.

NSX has a few core components to be aware of:

  • NSX Manager: The NSX management plane is built by the NSX Manager.  The NSX Manager provides the single point of configuration and the REST API entry points for NSX in a vSphere environment (a hedged API sketch follows this list).
  • NSX Controller: The NSX control plane runs in the NSX Controller.  In a vSphere-optimized environment with VDS, the controller enables multicast-free VXLAN and control plane programming of elements such as the distributed router (VDR).  In a multi-hypervisor environment the controller nodes program the vSwitch forwarding plane.  In all cases the controller is purely a control plane component and no data plane traffic passes through it.  Controller nodes are deployed in a cluster with an odd number of members for high availability and scale, and the failure of a controller node does not impact data plane traffic.
  • NSX Edge:  NSX Edge offers L2, L3, perimeter firewall, load balancing and other services such as SSL VPN, DHCP, etc.
  • Hypervisor Integration: The NSX data plane consists of the NSX vSwitch.  In NSX for vSphere the vSwitch is based on the vSphere Distributed Switch (VDS) (or Open vSwitch for non-ESXi hypervisors) with additional components that enable rich services.  The add-on NSX components are kernel modules (VIBs) which run within the hypervisor kernel, providing services such as distributed routing and distributed firewall, and enabling VXLAN bridging capabilities.
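
Since everything the NSX Manager does is exposed through that REST API, most of what follows in the UI can also be scripted.  As a taste of the pattern (HTTPS, basic auth, XML in and out), here is a minimal Python sketch that asks the manager for its deployed controllers.  Treat it as a sketch only: the endpoint path is from my recollection of the NSX-V 6.x API guide, and the manager address and credentials are placeholders.

# Hypothetical sketch of the NSX Manager REST API pattern (HTTPS + basic auth, XML responses).
# Endpoint path is from memory of the NSX-V 6.x API guide; address and credentials are placeholders.
import requests

NSX_MGR = "https://nsxmanager.lab.local"
AUTH = ("admin", "changeme")

resp = requests.get(NSX_MGR + "/api/2.0/vdn/controller",   # list deployed controller nodes
                    auth=AUTH,
                    verify=False)                           # lab only: self-signed certificate
resp.raise_for_status()
print(resp.text)                                            # XML describing the controller cluster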

As you might imagine from the above, the first step in implementation is to deploy the NSX Manager.  Luckily, as is frequently the case lately, VMware has packaged this as a click-through OVA.  Download the OVA and start the OVF Template deployment wizard from the web client as always:

Screenshot 2014-09-01 18.26.50

The NSX Manager OVF package detail…

Screenshot 2014-09-01 18.26.54

Agree if you’re ready to do this:

Screenshot 2014-09-01 18.27.04

Select a deployment location for the VM:

Screenshot 2014-09-01 18.27.11

Select a storage destination for the VM:

Screenshot 2014-09-01 18.27.17

Connect the NSX Manager to a network (admin network generally since this is a management plane component):

Screenshot 2014-09-01 18.28.11

Provide configuration for the appliance – passwords, hostname and IP info for the appliance:

Screenshot 2014-09-01 18.28.52 Screenshot 2014-09-01 18.32.54

Finish off the configuration and the NSX Manager VM will deploy:

Screenshot 2014-09-01 18.35.44

Very easy https connection to the appliance IP and you will see the VAMI login:

Screenshot 2014-09-01 19.47.34

Simple and clean UI.  You can grab the tech support logs here, view the configuration summary, manage and update the network configuration, upgrade the appliance and, the two most important at this stage, integrate the appliance with vCenter and back it up:

Screenshot 2014-09-01 19.47.39

 

vCenter registration is very straightforward.  Enter the vCenter address and login info, as well as the lookup service.  The configured vCenter registration is shown below for reference (with a hedged API equivalent after the screenshot):

Screenshot 2014-09-16 14.17.51
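
For completeness, the same registration can be driven through the REST API rather than the appliance UI.  The sketch below shows roughly what that looks like; the endpoint path and XML element names are from my recollection of the NSX-V 6.x API guide (so verify them against the guide for your version), and the addresses and credentials are placeholders.

# Hypothetical sketch of registering vCenter with the NSX Manager via its REST API.
# Endpoint and XML element names are from memory of the NSX-V 6.x API guide -- verify
# against the documentation for your version.  Addresses and credentials are placeholders.
import requests

NSX_MGR = "https://nsxmanager.lab.local"
AUTH = ("admin", "changeme")

vc_config = """<vcInfo>
  <ipAddress>vcenter.lab.local</ipAddress>
  <userName>administrator@vsphere.local</userName>
  <password>changeme</password>
</vcInfo>"""

resp = requests.put(NSX_MGR + "/api/2.0/services/vcconfig",
                    data=vc_config,
                    headers={"Content-Type": "application/xml"},
                    auth=AUTH,
                    verify=False)                           # lab only: self-signed certificate
print(resp.status_code)

# Read the configuration back to confirm the registration took
print(requests.get(NSX_MGR + "/api/2.0/services/vcconfig", auth=AUTH, verify=False).text)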

With this part complete the NSX Manager appliance is configured so you should go ahead and back it up just to be safe.  After this we can head into the web client where we will now see the NSX management solution – Networking & Security.  Clicking on that icon will bring us to the next stage of the configuration, but more on that next entry!

Screenshot 2014-09-16 11.16.36

Car Buying Tips from a (sort of) Veteran

Posted: September 12, 2014 in Cars

Friends of ours recently asked if we could lend a hand in demystifying the car buying process.  One of the (only) advantages to having the insane level of car ADD that I’ve displayed is that, what is normally a confusing and intimidating process for most people, becomes a (reasonably) well understood game.  A lot of it, honestly, comes down to “the art of the deal” (borrowing a line from His Hairness) but even for those of us not blessed with the natural gift of brutal negotiation, you can still do some work ahead of time to put yourself in a better bargaining position.  I decided that, since I’ve had a fair bit of car chatter on the blog, and the topic does come up a bit, it might not be a bad idea to once more take a break from the geekery (especially as of late) and “shift gears” (groan) back to cars for an entry.  Without further ado, some basic guidance… Comments welcome on this one!

buying-a-car-headache

1) TRADE-IN STRATEGIES: To make trade-ins easier (if there is one) you can go here:
This process will lead to an estimate that participating dealers have to honor (unless the vehicle was misrepresented).  Cars can also be sold this way.  They have participating dealer centers that will just buy the car even if you’re not buying something new.
TAKEAWAY: pre-negotiating trade-in value is a lot less painful unless you’re a very good negotiator and prepared to walk (or sell privately).  Remember that in PA trade-in value reduces the sales tax burden of the new car.  So a $30k new car that you applied a $10k trade-in to is taxed as a $20k purchase.
2) GETTING THE BEST DEAL: To get an idea of how much to pay you can go here:
A few things to be aware of… There are three components to a car price: dealer invoice, holdback and MSRP.  MSRP is the regular retail price (the “sticker price”).  The goal is to never pay this.  Dealer invoice is what the dealer paid the manufacturer.  In between the two is their gross margin, but cost of sale comes out of that, so net profit isn’t simply MSRP minus invoice.  That said, it’s often possible to get pretty close to the dealer invoice price (assuming you can learn what that number is, which is where a site like TrueCar helps).  The reason dealers can come so close to “cost” is manufacturer holdbacks.  These are basically cash kickbacks to the dealer at some set percentage.  The manufacturer, through holdbacks, is basically helping to finance the cost of sale.
TAKEAWAY: the more you know about the dealer’s cost structure, the better you can negotiate.  Special incentive deals are a kind of concession.  They’re good because they’re automatic, but they do make additional negotiation more difficult.  Not impossible though, especially if you know the costs.
3) LEASING: some things to keep in mind about leasing, assuming this is of interest… The key components of a lease are (a rough sketch of the math follows after point 5 below):
-Money Factor: this is the interest rate and is negotiable based on credit score.  Incentive leases often have a fixed interest rate.
-Cap Cost: the purchase price of the vehicle.  This is where nearly all lessees make their first mistake.  A leased car should absolutely be negotiated the same as a purchase.  As a matter of fact, don’t even bother mentioning how you plan to pay until the price is final.  At that point spring the lease on them.
-Residual Value:  this is the agreed-upon value of the car at the end of the lease term and is almost never negotiable.  It is based on a lot of factors and the manufacturers use it to play some games.  The only real buyer influence over this is mileage: a higher mileage lease will of course carry a lower residual value.  Beyond that, some car companies make the residual artificially low (Audi) to disincentivize leasing.  Others make it artificially high (BMW) because their business model is built around leasing.  The difference between the cap cost and the residual value is what you’ll be paying off monthly, plus tax and interest.  So in essence, you are paying for the depreciation.  Keep in mind that you are paying for the estimated depreciation and not the real depreciation.  As an example, in the case of BMW, you’ll often enjoy a lower payment than the value of the car would suggest, but at the end, if you keep it, you may very well be paying above market for what the car is worth.
4) POST SALE “EXTRAS”:  After the handshake you’ll always be handed off to “the finance guy”.  This is normally a hard sell, but these guys are good at reading people, so it’s pretty easy to blow off the entire thing with a closed-off enough posture.  They are looking to sell extended warranties, protection services, and various types of insurance.  Examples are glass coverage, tire and wheel coverage, ding and dent repair, and comprehensive warranty additions.  These services are rarely a good deal and I would suggest avoiding them.  Exceptions are if you plan to keep the car a very long time, or put lots of miles on it, and are able to validate that the warranty program they are selling is any good.  Some aftermarket warranty enhancements are OK; most are not.  If they’re offering a good program, then it’s a question of getting it for the right price, as dealers seek to make back a lot of lost margin on the car in that room (50% markups are normal for warranties).  Things like tire and wheel, or ding insurance, can be OK if priced cheap and if you find yourself normally spending a ton on these types of repairs.
5) GENERAL APPROACH:  when approaching the dealer, go armed with facts, but don’t present yourself as armed with facts.  This is a good way to measure how honest the salesperson is and get an overall feel for how genuine the dealership is.  Don’t go in committed to buy unless you truly know your bottom line numbers.  Always be prepared to walk away.  Make sure to get a test drive (amazing how many folks skip this), if for no other reason than to measure the dealer’s customer service approach.
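
Since point 3 throws around cap cost, residual, and money factor, here is a rough back-of-the-napkin sketch of how they combine into a payment.  The numbers are made up for illustration, the formulas are the standard money-factor ones, and tax treatment varies by state, so treat this as a model rather than a quote:

# Back-of-the-napkin lease payment math (illustrative numbers only, not a quote).
# Standard formulas: money factor ~ APR / 2400; rent charge uses (cap cost + residual).
msrp         = 45000
cap_cost     = 41500             # negotiate this exactly like a purchase price
residual_pct = 0.58              # set by the manufacturer, effectively non-negotiable
residual     = msrp * residual_pct
term_months  = 36
money_factor = 0.00125           # roughly a 3% APR
tax_rate     = 0.06              # varies by state; some states tax the monthly payment

depreciation = (cap_cost - residual) / term_months
rent_charge  = (cap_cost + residual) * money_factor
payment      = (depreciation + rent_charge) * (1 + tax_rate)

print("Depreciation per month: $%.2f" % depreciation)
print("Rent (finance) charge:  $%.2f" % rent_charge)
print("Payment with tax:       $%.2f" % payment)

Note that the residual shows up in both terms: a higher residual lowers the depreciation piece but slightly raises the rent charge, which is part of how manufacturers use it to steer lease payments.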

 


I was catching up on Twitter today and noticed some of my trusted connections retweeting the cloud based system monitoring as a service offering from Boundary.  There are lots of these tools on the market and some, like ZenOSS, are quite excellent.  It’s a developing space, though, and I always love checking out new approaches to classic problems, so I decided to take a look.  I clicked their bit.ly and was greeted with this:

Screenshot 2014-09-11 23.26.31

I see a nice clean UI (check).  I see “free” (double check) and then I see “10 servers and their apps absolutely free” and start getting interested.  They definitely have my attention, so I decided to go ahead and create a free account (funny how these days vendors have to compete for your attention even with complex software that they’re offering free!).  I had assumed that any service like this is agent-based (it pretty much has to be if you think about it), and I verify that it is, so no showstoppers in the architecture.  Clicking “Create Free Account” brings up a similarly no fuss dialogue:

Screenshot 2014-09-11 23.14.02

I enter my precious info, click Signup Now, and POOF.  Instantly I am transported to…

Screenshot 2014-09-11 23.14.54

My shiny new (and blank) dashboard.  Now that is “no nonsense”!  Notice how it says “hey, you no data, click for assistance”?  Well, click I did.  The assistance tab keeps the theme going and is extremely direct and contextual.  Adding new servers is right on the surface and it provides you with a CURL CLI example for Linux and a link to grab the Windows agent. It also provides a personalized API key to associate the agent with the service (love this):

Screenshot 2014-09-12 00.36.42

 

I decide that monitoring my vCenter would be neat (since it’s busy all the time), so I choose that as my first server.  I opened an RDP and lobbed the URL in.  It’s a path to a file on S3 (big shocker) which instantly downloads a (very small) agent installer:

Screenshot 2014-09-11 23.19.39

First up we have to associate the agent with our cloud monitoring instance.  This happens right in the installer and is super slick.  Just drop in the aforementioned API key and provide a hostname to identify the server in the dashboard:

Screenshot 2014-09-11 23.20.22

The installer will connect to the Boundary REST API and validate the association:

Screenshot 2014-09-11 23.20.31

It proceeds with a quick file copy:

Screenshot 2014-09-11 23.20.48

And voila!  Extremely fast install:

Screenshot 2014-09-11 23.20.54

 

The really nice part?  Seconds after the install completed, the dashboard, still open on my admin console, refreshed and started showing real time data!

Screenshot 2014-09-11 23.21.19

Next I decide that vCenter’s little blue line looks lonely and not nearly as cool as the dense seismograph Boundary uses in promotional materials!  Plus, what’s life for an admin without some Linux action?  To keep vCenter company, I decide to add my vCD 5.5 instance (now extinct, but still running in my lab).  First up, we SSH into the primary cell and enter the handy provided CURL syntax (note: the API key is required to associate the instance):

Screenshot 2014-09-12 00.35.59

The syntax is good, so CURL goes to work (note, my vCD install is not the OVA appliance and is instead a manual install on CentOS because that is how I roll!).  The install script calls YUM, we’re on CentOS, it’s all good.  Not on CentOS? Then you need to look at the script and possibly roll your own:

Quick as lightning (as quick as Windows actually) and she’s all set.  No issues at all installing on vCD:

And once again like magic, vCenter has company pretty much immediately after the install completes.  Below you can see a red line accompanying the blue, and bringing up detail shows the associated hosts:

Screenshot 2014-09-12 00.38.07

 

And… it wasn’t long until I got my first alert (hey, it’s a lab!).  It came directly via email without me having done any configuration at all, so a default set of alert thresholds is configured for you automatically.  Nifty!

Screenshot 2014-09-12 00.50.27

I have to say I am extremely impressed with Boundary so far.  I took a look at the tutorial section (screenshot below) and there is a plethora of advanced customization options hiding below the deceptively simple surface.  Make no mistake, despite the gentle onramp this is an extremely powerful tool.  I plan to do a bunch more testing and dive into the advanced bits, but wanted to start off by sharing the “out of box” experience.  Stay tuned!

Screenshot 2014-09-11 23.34.39