04/29/2010

Nexus Observations

I recently had the opportunity to work on a large data centre network design. The design called for a large number of servers to be connected at 10Gbps. We initially looked at using 6500 series switches, but quickly shifted to the Nexus 7000 series because its per-port cost for 10Gbps interfaces is so much lower.

Having decided on the Nexus 7000 for both the network backbone and 10 Gigabit Ethernet access, we started thinking about how to take advantage of the platform's many unique features in this network. Here are some of my thoughts and observations on several of these features.

 

1. VRF by default

At first I was somewhat resistant to the way Cisco forces the use of multiple local Virtual Routing and Forwarding (VRF) instances on the Nexus. Out of the box there are two VRFs: a "management" VRF for management access, and a "default" VRF for everything else. I quickly realized that this is actually a huge benefit. Segregating network management access into a separate VRF, rather than the traditional separate VLAN, vastly improves the security of the system. It also encourages the use of multiple VRFs in the network design.

My one caution is that it's very easy to get carried away and create a VRF for everything. This is not a good idea. Remember that it is very difficult to route packets between VRFs. In the standard enterprise software, your only options for doing so are policy routing or running a physical cable loop between two interfaces on the switch.

Neither of these is particularly desirable, so I recommend keeping the number of VRFs to a minimum. In provider-type networks you will certainly want a different VRF for each distinct "customer" connected to the switch. Within a single customer network it can also sometimes be useful to spin off separate VRFs to hold different security zones while still connecting them to the same physical switch. These zones might correspond to different application groups, for example.

It is rarely useful to split a single application group into multiple VRFs, though. This seems obvious in the abstract, but in practice it's very easy to get carried away with multiple-VRF mayhem, to the detriment of a simple, rational design.
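To make the routing-table separation concrete, here is a minimal Python sketch that models each VRF as an independent table with longest-prefix-match lookup. It is a toy model, not NX-OS code: apart from "default" and "management", the VRF name, prefixes, next hops and interface names are hypothetical. What it illustrates is that a lookup in one VRF never consults another VRF's routes, which is why moving traffic between VRFs needs something extra such as policy routing or an external cable loop.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# (prefix, next hop or egress interface); all values below are hypothetical.
Route = Tuple[ipaddress.IPv4Network, str]


class VrfRouter:
    """Toy model: every VRF owns a completely separate routing table."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Route]] = {}

    def add_route(self, vrf: str, prefix: str, next_hop: str) -> None:
        self.tables.setdefault(vrf, []).append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, vrf: str, destination: str) -> Optional[str]:
        """Longest-prefix match using only the routes of the given VRF."""
        addr = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.tables.get(vrf, []) if addr in net]
        if not matches:
            return None  # no route here, even if another VRF could reach it
        return max(matches, key=lambda m: m[0].prefixlen)[1]


router = VrfRouter()
router.add_route("management", "0.0.0.0/0", "mgmt0 via 192.0.2.1")
router.add_route("default", "10.0.0.0/8", "Ethernet1/1")
router.add_route("default", "10.20.0.0/16", "Ethernet1/2")
router.add_route("app-b", "172.16.0.0/12", "Ethernet2/1")

print(router.lookup("default", "10.20.5.9"))      # Ethernet1/2 (longest match wins)
print(router.lookup("default", "172.16.1.1"))     # None: that prefix lives in app-b
print(router.lookup("management", "172.16.1.1"))  # mgmt0 via 192.0.2.1
```

Even the management VRF's default route is invisible to the "default" VRF, which is the security benefit described above and also the source of the inter-VRF routing difficulty.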

 

2. Virtual Device Contexts

The Virtual Device Context (VDC) is a concept that will be familiar to anybody who has done a lot of work with the multiple-context mode of Cisco ASA firewalls. The VDC feature allows the switch to be partitioned into multiple logical devices that are managed as separate entities.

This is different from the VRF feature. VRF segregation allows different interfaces or VLANs to belong to separate Layer 3 routing tables, but they are still managed together. There is only one set of VLAN tags, for example, so it is easy to put the VLANs from different VRFs on a single trunk. With VDCs, however, the information is separated at all layers. It is not possible to trunk together the VLANs from VDC number 1 with those from VDC number 2, because the switch is broken up into multiple logical switches right down to Layer 1: each physical port belongs to exactly one VDC.

Initially we wanted to use this feature to create separate logical "access" and "core" switches while still taking advantage of the low per-port cost of 10Gbps Ethernet interfaces. Unfortunately, Cisco has priced the licensing of this feature in a rather crazy way that makes it cheaper to buy additional hardware.
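The difference between VRF and VDC separation can be sketched in a few lines of Python. In this hypothetical model (the VDC names, ports and VLAN numbers are invented for illustration, and none of it is an NX-OS API), each VDC owns its own physical ports and its own VLAN table, while VRFs are just labels attached to VLANs inside one VDC. A trunk can therefore mix VLANs from different VRFs, but never a port or a VLAN that belongs to another VDC.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Vdc:
    """Toy model of one virtual device context (hypothetical, not an NX-OS API)."""
    name: str
    ports: Set[str] = field(default_factory=set)                # physical ports allocated to this VDC
    vlan_to_vrf: Dict[int, str] = field(default_factory=dict)  # this VDC's own VLAN table

    def can_trunk(self, port: str, vlans: Set[int]) -> bool:
        """True if `port` may carry all of `vlans` as a trunk inside this VDC.

        VLANs mapped to different VRFs may share a trunk, because VRFs share
        the VDC's single VLAN table. A port or VLAN from another VDC may not.
        """
        return port in self.ports and vlans <= set(self.vlan_to_vrf)


core = Vdc("core", ports={"Ethernet1/1"}, vlan_to_vrf={10: "default", 20: "app-b"})
access = Vdc("access", ports={"Ethernet2/1"}, vlan_to_vrf={30: "default"})

print(core.can_trunk("Ethernet1/1", {10, 20}))  # True: two VRFs, one VDC
print(core.can_trunk("Ethernet1/1", {10, 30}))  # False: VLAN 30 exists only in "access"
print(core.can_trunk("Ethernet2/1", {10}))      # False: the port belongs to "access"
```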

 

3. Virtual Port Channels

If the cost per port is the feature that gets the Nexus platform into data centres, the Virtual Port Channel (VPC) feature will keep it there. VPC is a multi-chassis Etherchannel technology. We are already familiar with multi-chassis Etherchannel from the VSS feature on the 6500 series switches, but VPC works differently. VSS solves the multi-chassis control problem by interconnecting the control planes of the two switches and using the supervisor module in one chassis to control all of the modules in both chassis.

VPC, on the other hand, keeps a separate control plane in each switch, but elects one of the switches to act as the channel controller for all Etherchannel bundles.

We used VPC in this particular network to essentially do away with Spanning Tree. We still configured a Spanning Tree instance as a backup in case the VPC functionality failed for some reason (we used RSTP rather than legacy 802.1D STP, for obvious reasons). With VPC, however, we were able to make all of the redundant links active members of Etherchannel bundles. This improves both network throughput and failover times. The standard LACP failover time to remove a failed link from a channel bundle is 200ms, which is an order of magnitude better than typical Rapid Spanning Tree failover times.
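As a rough illustration of why an Etherchannel bundle behaves better than a Spanning Tree topology with a blocked link, the following Python sketch hashes each flow onto one of the active member links, so every uplink can carry traffic, and simply rehashes over the survivors when a member is removed. CRC32 of the address pair is an assumption standing in for the switch's own hardware hash, and the link names are hypothetical.

```python
import zlib
from typing import List, Tuple

Flow = Tuple[str, str]  # (source address, destination address)


def pick_member(flow: Flow, active_links: List[str]) -> str:
    """Choose the bundle member link for a flow.

    CRC32 stands in for the switch's real load-balancing hash. The same flow
    always lands on the same link while the set of active links is unchanged,
    which keeps the packets of one flow in order.
    """
    if not active_links:
        raise RuntimeError("port channel down: no active member links")
    key = f"{flow[0]}>{flow[1]}".encode()
    return active_links[zlib.crc32(key) % len(active_links)]


# Two uplinks from an access switch, one to each VPC peer (hypothetical names).
links = ["peer-A Ethernet1/1", "peer-B Ethernet1/1"]
flows = [(f"10.0.0.{i}", "10.1.0.5") for i in range(1, 21)]

before = {flow: pick_member(flow, links) for flow in flows}

# peer-A's link fails: it is removed from the bundle and its flows rehash.
links.remove("peer-A Ethernet1/1")
after = {flow: pick_member(flow, links) for flow in flows}
```

Before the failure, flows can be spread across both VPC peers; afterwards, every flow maps to peer-B's link as soon as the failed member leaves the bundle, with no blocked port waiting to transition to forwarding.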

 

4. Fault Tolerance

The Nexus 7000 platform has several extremely clever fault-tolerant elements in its hardware design. It can use multiple supervisor modules and multiple power supplies, features that we are already familiar with from other large chassis switches. Furthermore, all modules and power supplies are hot-swappable.

In addition to these familiar elements of hardware redundancy, the Nexus 7000 provides hot-swappable, redundant switch fabric modules. This means that even the failure of a fabric module need not be service impacting.

We looked at supervisor failover in some detail, because it's a relatively common concern. The Nexus supervisor modules include several management options that can, and probably should, be used together. You can manage the device over the network through the pre-configured "management" VRF, which is associated with a Gigabit Ethernet port on the front of each supervisor module. There is also a standard RS-232 serial console connection. The interesting surprise is the inclusion of a separate Connectivity Management Processor (CMP) Ethernet interface, which connects to a separate internal terminal server inside the supervisor module. The CMP makes it possible to open an out-of-band TELNET or SSH connection to the supervisor module, even if the supervisor itself has crashed and gone off-line. From there it is possible to reboot the module or attach to its console for maintenance.

The only slightly troubling aspect of the supervisor redundancy model is that the management Ethernet interface on the backup supervisor module is left in a "down" state. This interface is activated only when the supervisors fail over from primary to backup, which then leaves the original management interface down. This complicates monitoring of the management port status: is the backup supervisor module broken, unpatched, or just off-line? It also means that, during a supervisor failover or a software image upgrade, the operator will lose the connection to the device and will need to reconnect manually. I consider this to be a cosmetic issue.
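One way to keep the backup supervisor's port from raising false alarms is to interpret each management port's state together with that supervisor's redundancy role. The Python sketch below assumes that some other process (SNMP polling or a CLI scrape, not shown) has already collected the role and link state for each supervisor; the slot labels and role strings are hypothetical. It still cannot tell an unpatched backup port from a healthy one, which is exactly the ambiguity noted above.

```python
from typing import Dict, Tuple

# slot -> (redundancy role, management link up?); collected elsewhere (hypothetical data).
SupState = Dict[str, Tuple[str, bool]]


def check_management_ports(state: SupState) -> Dict[str, str]:
    """Return a status string per supervisor, treating a down port on the standby as normal."""
    results = {}
    for slot, (role, link_up) in state.items():
        if role == "active":
            results[slot] = "ok" if link_up else "ALARM: active management port down"
        elif role == "standby":
            # The standby supervisor keeps its management port down by design,
            # so "down" is the expected state here, not a fault.
            results[slot] = "warn: standby management port up" if link_up else "ok"
        else:
            # Any other role (for example a failed or powered-off module) needs attention.
            results[slot] = f"ALARM: supervisor in role '{role}'"
    return results


print(check_management_ports({"sup-1": ("active", True), "sup-2": ("standby", False)}))
# {'sup-1': 'ok', 'sup-2': 'ok'}
```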

Each media module in the Nexus 7000 is a separate Linux system with its own routing processes as well as its own Layer 2 and Layer 3 forwarding tables. As a result, a supervisor failure will generally not cause any loss of connectivity for existing flows.

 

5. Storage Switch

The Nexus platform evolved from previous generations of Cisco storage switches. As a result, there is a lot of storage-specific functionality in the Nexus, and much of the documentation dwells heavily on these features. In this particular deployment we used the Nexus only as a Layer 2/3 LAN switching core, so it is difficult for me to comment on its functionality as a storage switch. However, my personal bias is that Fibre Channel over Ethernet (FCoE) will not be a really attractive technology until I can fully merge my data and storage Ethernet networks.

I wouldn't object to having to use separate sets of data and storage trunk links between switches, but I don't like the idea of using a large, expensive LAN core switch as a replacement for a cheap SAN access switch. Unfortunately, this won't change until several of the congestion and packet loss problems of the current generation of Ethernet switching are solved. Cisco's many Data Center Ethernet initiatives are intended to address these problems, but this is still a work in progress.

 

6. Other Observations

A big question for most network managers is how familiar the management interface will be. This really isn't a large problem. I found the NX-OS CLI and management to be only marginally different from IOS. In many cases NX-OS appears to borrow some of the more useful configuration concepts from the newer ASA firewall software versions, which is very positive. There are differences, but I found them to be relatively small. If you know your way around both the IOS and ASA CLIs, the learning curve for NX-OS is not steep.

Another feature that I liked about the Nexus 7000 hardware design is the cable management. Unless you go out of your way to do something different, the cables from every module route straight upward, through a sorting grid, and out the sides. This makes it relatively simple to avoid the terrible problems of the 6500 and 4500 platforms, where the most logical cable routing often made it difficult to swap out fan modules or power supplies.

While I very much liked the cable routing, I was less keen on the flimsy optional plastic doors.  These are of extremely limited value and I don't recommend including them in the purchase of any new hardware.

Finally, one big surprise was the lack of proper NTP server functionality. The Nexus 7000 is a Layer 3 core switch, which is a very logical place to centralize NTP services. The core switch would synchronize to one or more Stratum 1 NTP servers and allow all internal switches and routers, and even servers, to synchronize to it. This model could be particularly useful in situations with multiple VRFs, because servers in one VRF might not be able to connect to systems in a different VRF at all. Making NTP services available on the Nexus switch, which touches all VRFs, would be an extremely useful design.

Unfortunately, the Nexus 7000 is only able to act as an NTP client.  This is particularly strange because it is built around Linux operating system kernels on each module.  NTP server functionality is readily available for Linux, so Cisco really shouldn't have needed to do any additional work to make it available on the Nexus.
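For reference, the hierarchy described above rests on standard NTP arithmetic from RFC 5905: a client synchronized to a stratum n server runs at stratum n + 1, and it estimates its clock offset and the round-trip delay from four timestamps. The minimal Python sketch below models the design the post would have liked (core switch as a stratum 2 server), not something the Nexus 7000 actually does; the timestamps in the example are invented, in milliseconds.

```python
from typing import Tuple

MAX_SYNCED_STRATUM = 15  # RFC 5905: stratum 16 means "unsynchronized"


def client_stratum(server_stratum: int) -> int:
    """Stratum a client reaches when it synchronizes to a server of the given stratum."""
    stratum = server_stratum + 1
    if not 2 <= stratum <= MAX_SYNCED_STRATUM:
        raise ValueError("server stratum must be between 1 and 14")
    return stratum


def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """NTP on-wire calculation.

    t1: client transmit, t2: server receive, t3: server transmit, t4: client receive.
    A positive offset means the client clock is behind the server clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


core = client_stratum(1)       # core switch synced to a Stratum 1 source
server = client_stratum(core)  # servers synced to the core switch
print(core, server)                     # 2 3
print(offset_and_delay(0, 60, 61, 21))  # (50.0, 20)
```

In the example, the client is 50 ms behind the server and the round trip took 20 ms, excluding the 1 ms the server spent processing the request.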

 

7. Conclusions

The Nexus 7000 is an excellent switch, and probably the most cost-effective Cisco device for aggregating large numbers of 10 Gigabit Ethernet ports. The architecture differs in many ways from what we're used to with the existing 6500 and 4500 series chassis switches, but I think it's a distinct improvement.