Virtual Server Mobility Issues in the Data Center Environment
by Vladimir Stajić, CCSI at NIL Data Communications Ltd
Introduction: Server Virtualization
Server virtualization is something that everyone seems to be using in the data center (DC) these days. Simply put: we all want to save money, do more with less, utilize our resources better, go green and whatever else might be the catchphrase of the week. Server virtualization helps us achieve some of these goals by letting us share the hardware resources of a single physical server (host) among multiple virtualized servers (guests). We can utilize the hardware better while keeping the required separation. We also gain the ability to move a guest to another host if its host fails, if it needs more resources, or for other reasons.
But the IT environment was not historically designed with such abstractions in mind. The usual setup is one-to-one; that is, one server to one operating system (OS), or one server to one port on a switch. Furthermore, the usual mantra is that “stable = good; change ≠ stable,” and change is much more common in virtual environments than in the classic data center.
Multiple areas of guest mobility may cause issues, but for this article I will focus on problems related to the Storage Area Network (SAN).
Background: Fibre Channel Addressing
While modern Fibre Channel (FC) implementations utilize the Switched Fabric (FC-SW) environment, the name itself is slightly misleading. The way that traffic in FC-SW gets moved around is actually much closer to routing in the TCP/IP environment than it is to switching.
World Wide Names
World Wide Names (WWNs) are the Fibre Channel equivalent of MAC addresses in Ethernet environments. These eight-byte values, written as colon-separated hexadecimal, are typically hard-coded into physical FC devices such as the Host Bus Adapter (HBA). The following is an example of a WWN:
21:00:00:e0:8b:05:40:29
A particular FC device might have multiple types of WWNs, but the two most common are the Node WWN (nWWN) and the Port WWN (pWWN).
Node World Wide Name: Each FC device has one nWWN. This WWN is used as a reference point for the device as a whole. For example, a storage array will have a single nWWN.
Port World Wide Name: Each device will have as many pWWNs as it has Fibre Channel ports. For example, the storage array just mentioned might have anywhere from two to several hundred ports, each of which has its own pWWN.
Normally, the relationship between the nWWN and pWWNs is incremental, as shown in the following figure.

Most of the time, end devices (nodes) in the FC environment identify each other by their WWNs, and they establish communications accordingly.
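To make the WWN format concrete, here is a minimal Python sketch that validates the colon-separated notation and converts it to and from its eight raw bytes. The function names parse_wwn and format_wwn are my own and do not come from any FC tool or library.

```python
import re

# Eight colon-separated hex byte pairs, e.g. 21:00:00:e0:8b:05:40:29
_WWN_PATTERN = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){7}")


def parse_wwn(text: str) -> bytes:
    """Convert a colon-separated WWN string into its eight raw bytes."""
    if not _WWN_PATTERN.fullmatch(text):
        raise ValueError(f"not a valid WWN: {text!r}")
    return bytes.fromhex(text.replace(":", ""))


def format_wwn(raw: bytes) -> str:
    """Render eight raw bytes in the usual colon-separated notation."""
    if len(raw) != 8:
        raise ValueError("a WWN is exactly eight bytes long")
    return ":".join(f"{b:02x}" for b in raw)


wwn = parse_wwn("21:00:00:e0:8b:05:40:29")
print(len(wwn))         # 8
print(format_wwn(wwn))  # 21:00:00:e0:8b:05:40:29
```

Storing the WWN as bytes rather than as a string makes comparisons independent of upper or lower case in the input.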
Fibre Channel Identifiers
If the WWN is equivalent to the MAC, then the Fibre Channel Identifier (FCID) is equivalent to the IP address. FC switches use FCIDs to route traffic from one node to another.
The FCID is divided into three eight-bit fields: Domain, Area and Port. For purposes of this explanation, we will treat the Area and Port fields as a single entity whose value is randomly assigned.
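The three-field layout can be illustrated with a few lines of Python. This is only a sketch: the Fcid type and the helper names are assumptions of mine, and the sample value 0xeb0401 is simply a 24-bit number chosen for illustration.

```python
from typing import NamedTuple


class Fcid(NamedTuple):
    """The three eight-bit fields of a 24-bit FCID."""
    domain: int
    area: int
    port: int


def split_fcid(fcid: int) -> Fcid:
    """Split a 24-bit FCID into its Domain, Area and Port fields."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("an FCID is a 24-bit value")
    return Fcid((fcid >> 16) & 0xFF, (fcid >> 8) & 0xFF, fcid & 0xFF)


def join_fcid(domain: int, area: int, port: int) -> int:
    """Combine the three eight-bit fields back into a 24-bit FCID."""
    for field in (domain, area, port):
        if not 0 <= field <= 0xFF:
            raise ValueError("each FCID field is eight bits wide")
    return (domain << 16) | (area << 8) | port


fields = split_fcid(0xEB0401)
print(f"{fields.domain:#04x} {fields.area:#04x} {fields.port:#04x}")
# 0xeb 0x04 0x01
```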

The FCID is assigned to a node when that node connects to the FC switch. The process is called Fabric Login (FLOGI); among other things, it results in the node having its WWN associated with the assigned FCID. This association is stored in the Name Server component of an FC switch. All devices that connect to the same switch will have an identical Domain ID part of the FCID. Only the Area and Port values will differ.
The Domain ID must be unique to each switch in the fabric, as it’s used as a routing reference.

These days, most of the switches that are implemented are capable of FCID persistence, which means that the node will retain its FCID even through FC switch reboots, or even if it moves between the ports on the same switch.
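The login and persistence behavior described above can be modeled with a toy switch. This is a sketch under stated assumptions, not how any real switch is implemented: the ToySwitch class is hypothetical, and it hands out Area and Port values sequentially, whereas a real switch picks them by its own rules.

```python
from typing import Dict


class ToySwitch:
    """A toy FC switch: assigns FCIDs at login and remembers them per pWWN."""

    def __init__(self, domain_id: int) -> None:
        self.domain_id = domain_id
        self.name_server: Dict[str, int] = {}  # pWWN -> FCID
        self._next_suffix = 0x0100             # assumed starting Area/Port value

    def flogi(self, pwwn: str) -> int:
        """Return the FCID for this pWWN, assigning one on first login."""
        # FCID persistence: a pWWN seen before gets the same FCID again.
        if pwwn in self.name_server:
            return self.name_server[pwwn]
        fcid = (self.domain_id << 16) | self._next_suffix
        self._next_suffix += 1
        self.name_server[pwwn] = fcid
        return fcid


switch = ToySwitch(domain_id=0xEB)
first = switch.flogi("21:00:00:e0:8b:05:40:29")
again = switch.flogi("21:00:00:e0:8b:05:40:29")
print(hex(first), first == again)  # 0xeb0100 True
```

Every FCID this switch hands out starts with its own Domain ID, so a second ToySwitch with a different Domain ID would necessarily assign a different FCID to the same pWWN.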
N_Port ID Virtualization (NPIV)
Virtual Server Environment Challenges
Virtual server environments have a problem that stems from a simple fact: a switch expects a single FLOGI through a port on which a node is connected. Since we now have a host and one or more guests that connect through that host, the switch would get confused when multiple FCID requests are received through the same port.

In a classic environment, the switch would accept the first request for an FCID but would reject any subsequent request. While this approach would still allow for mapping of SAN-attached storage to guests, it would have to be done on the hypervisor level, which would introduce another level of configuration tracking and reduce the flexibility of the guest, as far as mobility between the hosts is concerned.
NPIV Solution
NPIV is a T11 standard. It allows multiple requests for an FCID through the same F_port (the port through which a node connects) on an FC switch.
The first interaction is a standard FLOGI process. Subsequent logins are handled as Discover F_Port Service Parameters (FDISC) requests, which in this context serve as FLOGI requests for any and all guests.
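The difference between a classic F_port and an NPIV-enabled one can be sketched as a small state model. The FPort class and the LoginRejected exception are hypothetical names of mine, the sequential FCID assignment is an assumption, and real switches track far more state than this.

```python
from typing import Dict


class LoginRejected(Exception):
    """Raised when the toy F_port refuses a login request."""


class FPort:
    """A toy F_port that accepts one FLOGI, plus FDISCs when NPIV is on."""

    def __init__(self, domain_id: int, npiv_enabled: bool) -> None:
        self.domain_id = domain_id
        self.npiv_enabled = npiv_enabled
        self.logins: Dict[str, int] = {}  # pWWN -> FCID

    def _assign(self, pwwn: str) -> int:
        # Sequential Area/Port values are an assumption of this sketch.
        fcid = (self.domain_id << 16) | (len(self.logins) + 1)
        self.logins[pwwn] = fcid
        return fcid

    def flogi(self, pwwn: str) -> int:
        """First login on the port (the host)."""
        if self.logins:
            raise LoginRejected("port already has a fabric login")
        return self._assign(pwwn)

    def fdisc(self, pwwn: str) -> int:
        """Additional login through the same port (a guest)."""
        if not self.npiv_enabled:
            raise LoginRejected("NPIV is not enabled on this port")
        if not self.logins:
            raise LoginRejected("FDISC requires a prior FLOGI")
        return self._assign(pwwn)


port = FPort(domain_id=0xEB, npiv_enabled=True)
port.flogi("21:00:00:e0:20:00:00:e0")      # the host
for guest in ("21:00:00:e0:20:00:10:b1", "21:00:00:e0:20:00:04:13"):
    port.fdisc(guest)                      # the guests
print(len(port.logins))  # 3
```

With npiv_enabled=False the same fdisc calls raise LoginRejected, which mirrors the classic behavior of accepting only the first request on a port.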

The end result is that not only the host, but also the guests receive their FCIDs. The entries in the Name Server table show all the nodes originating on the same switch port.
| INTERFACE | FCID | PORT NAME |
| --- | --- | --- |
| fc1/5 | 0xeb0200 | 21:00:00:e0:20:00:00:e0 |
| fc1/5 | 0xeb0401 | 21:00:00:e0:20:00:10:b1 |
| fc1/5 | 0xeb52f2 | 21:00:00:e0:20:00:04:13 |
| fc1/5 | 0xeb0ac3 | 21:00:00:e0:20:00:53:00 |
Since each guest logs in with its own WWN, it is also possible to put the guests into separate zones, if that is required in the topology. The zoning configuration will follow the guest seamlessly as it moves between the hosts.
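Why zoning follows the guest can be seen in a small sketch of WWN-based zone membership. The zone names and the storage pWWN below are made up for illustration, and can_communicate is a hypothetical helper, not a switch command.

```python
from typing import Dict, Set

# Zone name -> set of member pWWNs. Names and the storage pWWN are made up.
zones: Dict[str, Set[str]] = {
    "zone_guest_a": {"21:00:00:e0:20:00:10:b1", "50:00:00:00:00:00:00:01"},
    "zone_guest_b": {"21:00:00:e0:20:00:04:13", "50:00:00:00:00:00:00:01"},
}


def can_communicate(zones: Dict[str, Set[str]], a: str, b: str) -> bool:
    """Two pWWNs may talk if at least one zone contains both of them."""
    return any(a in members and b in members for members in zones.values())


storage = "50:00:00:00:00:00:00:01"
print(can_communicate(zones, "21:00:00:e0:20:00:10:b1", storage))  # True
print(can_communicate(zones, "21:00:00:e0:20:00:10:b1",
                      "21:00:00:e0:20:00:04:13"))                  # False
```

Nothing in the check refers to a host, a switch port or an FCID, so the result is the same whichever host the guest currently runs on.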
N-Port Virtualization (NPV)
Virtual Server Environment Challenges
The NPIV functionality solves the problems related to multiple guests connecting through the same F_port, as well as part of the mobility problem that relates to zoning and WWNs. However, it doesn’t address what happens when a guest moves between hosts that are connected to different FC switches. In that case, the FCID becomes the important parameter.
While explaining the FCID principles, I mentioned that one of the FCID fields is directly tied to the Domain ID parameter of the FC switch to which a particular node is connected. This implies that, if a node moves from one FC switch to another, it cannot under any circumstances retain its FCID, because the Domain ID value will invariably change.
Under most circumstances this is not a big issue; however, some FC applications may use the FCID value as a configuration variable. In that case, if the guest moves to another FC switch, the FCID value will change, and the application may lose connectivity.
NPV Solution
As opposed to NPIV, NPV is not an industry standard; it’s Cisco functionality. However, it is compatible with any FC switch that supports NPIV.
NPV more or less turns a compatible FC switch into a proxy device. The NPV-enabled (edge) switch does not process any FLOGIs on its own, but rather passes them along to an upstream (core) switch. As far as the core switch is concerned, the edge switch seems like a server – a host that has multiple guests connecting. The core switch needs to support the NPIV functionality.

Since the NPV switch now acts as a proxy/forwarding device, it no longer provides FCIDs to nodes that would be associated with its own Domain ID. In fact, it does not even have its own Domain ID.
This makes the following scenario possible: in a large virtualized DC, multiple hosts are connected to different edge switches, and those edge switches are all connected to the same aggregate core switch. If guest servers migrate between host servers that are connected to different edge switches, all the FC parameters remain unchanged, including the zoning configuration and the FCID assignment.
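The scenario can be sketched with a toy core switch and two NPV edge switches. The class names are hypothetical, and the sketch assumes that the core keeps FCIDs persistent per pWWN and assigns Area and Port values sequentially; actual behavior depends on the platform and its configuration.

```python
from typing import Dict


class CoreSwitch:
    """Toy NPIV-capable core: owns the Domain ID and the Name Server."""

    def __init__(self, domain_id: int) -> None:
        self.domain_id = domain_id
        self.name_server: Dict[str, int] = {}  # pWWN -> FCID
        self._next_suffix = 0x0001             # assumed sequential assignment

    def login(self, pwwn: str) -> int:
        # Assumed persistence: a known pWWN keeps its FCID.
        if pwwn not in self.name_server:
            self.name_server[pwwn] = (self.domain_id << 16) | self._next_suffix
            self._next_suffix += 1
        return self.name_server[pwwn]


class NpvEdgeSwitch:
    """Toy NPV edge: has no Domain ID and only forwards logins upstream."""

    def __init__(self, name: str, core: CoreSwitch) -> None:
        self.name = name
        self.core = core

    def login(self, pwwn: str) -> int:
        return self.core.login(pwwn)


core = CoreSwitch(domain_id=0xEB)
edge_a = NpvEdgeSwitch("edge-a", core)
edge_b = NpvEdgeSwitch("edge-b", core)

guest = "21:00:00:e0:20:00:10:b1"
before = edge_a.login(guest)  # guest runs on a host behind edge-a
after = edge_b.login(guest)   # guest migrates to a host behind edge-b
print(before == after)        # True
```

Because only the core switch holds a Domain ID, the Domain field of the guest's FCID cannot change when the guest moves between edge switches.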

Summary
Virtual environments are a fact these days. It is also a fact that they can exercise their full potential only if they really are as virtual as possible, and not limited by restrictions imposed on them by legacy physical infrastructures. NPIV and NPV functionalities help with getting around some of those restrictions. At the same time, they allow us to keep our configurations as static as possible, while providing any mobility that may be required. Stable = good, remember?
