Saturday, May 2, 2009

VMware VI3 iSCSI with multiple non-HBA NICs

While researching iSCSI on VI3, I came across some interesting information about the ESX iSCSI software initiator. It applies to many installations and highlights a potential bottleneck.

The short version is that if you’re using the iSCSI software initiator to connect to a single iSCSI target, the multiple uplinks in the ESX network team for the VMKernel iSCSI port are not used for load balancing.

This is easy to prove: connect to the service console, run esxtop and press n to view the traffic for individual network adapters. Assuming your storage is in use, only one of the physical uplinks for the vSwitch handling iSCSI should be showing iSCSI traffic. On ESXi, you can use resxtop through the RCLI instead.

Why this happens

My understanding is that iSCSI connections made by the current ESX software initiator have a 1:1 relationship between NIC and iSCSI target. An iSCSI target in this sense is a connection to the IP-based SAN storage, not an individual LUN. This limitation applies when the SAN presents a single IP address for connectivity.

The VI3 software initiator doesn’t support multipathing, so within ESX the only remaining option is load balancing across the physical uplinks in a team. Unfortunately, that leaves load balancing up to the load balancing setting in the vSwitch policy exceptions. I don’t believe any of the three choices fits most scenarios where connectivity to the iSCSI target is through a single MAC/IP:

  • Route based on the originating virtual switch port ID: selects an uplink by virtual port ID, and there is only one VMKernel iSCSI port
  • Route based on source MAC hash: selects an uplink by source MAC, and there is only one
  • Route based on IP hash: selects an uplink by the layer 3 source-destination IP pair, and there is only one (VMKernel -> iSCSI virtual address). I don’t think this is a generally recommended load balancing approach anyway
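To illustrate why none of the three policies helps, here is a minimal sketch that models each policy as "take the policy's key, modulo the number of uplinks". This is an assumption for illustration only, not ESX's actual hashing code. The uplink names, port IDs, MAC and IP addresses are all hypothetical. Because every iSCSI I/O from the host carries the same VMKernel port ID, MAC and IP pair, each policy always picks the same uplink, whereas several VM ports do spread across the team.

```python
import ipaddress

# Hypothetical two-uplink vSwitch carrying the VMKernel iSCSI port.
UPLINKS = ["vmnic1", "vmnic2"]


# Assumption: each policy is modelled as "key modulo uplink count".
# This shows the principle only; it is not ESX's real hash implementation.
def pick(key: int) -> str:
    return UPLINKS[key % len(UPLINKS)]


def by_port_id(port_id: int) -> str:
    return pick(port_id)


def by_src_mac(mac: str) -> str:
    return pick(int(mac.replace(":", ""), 16))


def by_ip_hash(src: str, dst: str) -> str:
    return pick(int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst)))


# Hypothetical identifiers for the single VMKernel iSCSI port and its target.
VMK_PORT_ID = 16777219
VMK_MAC = "00:50:56:7a:11:22"
VMK_IP, TARGET_IP = "10.0.0.11", "10.0.0.50"
N_IOS = 1000  # simulated iSCSI I/Os from the host

# Every I/O shares the same port ID, MAC and IP pair, so each policy
# sees one key and maps all iSCSI traffic to a single uplink.
vmk_uplinks = {
    "port_id": {by_port_id(VMK_PORT_ID) for _ in range(N_IOS)},
    "src_mac": {by_src_mac(VMK_MAC) for _ in range(N_IOS)},
    "ip_hash": {by_ip_hash(VMK_IP, TARGET_IP) for _ in range(N_IOS)},
}

# Contrast: four VM ports (hypothetical IDs 100-103) do spread across uplinks.
vm_uplinks = {by_port_id(pid) for pid in range(100, 104)}

for policy, used in vmk_uplinks.items():
    print(f"VMKernel iSCSI, {policy}: {sorted(used)}")
print(f"Four VM ports, port_id: {sorted(vm_uplinks)}")
```

Each VMKernel line lists exactly one uplink, while the four VM ports land on both, which matches what esxtop shows for a single iSCSI target.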

Link aggregation

The VI3 SAN Deploy guide does state that one connection is established to each target. This seems to indicate one connection per LUN target, but the paragraph starts with software iSCSI and switches halfway through to discuss iSCSI HBAs.

I’m still unsure whether software iSCSI establishes multiple TCP sessions, one per target (I don’t believe this is the case). The blog referenced below also discusses 802.3ad link aggregation and states that the ESX 3.x software initiator does not support multiple TCP sessions.

However, if the iSCSI software initiator did establish multiple TCP sessions to a single target IP address, this would open the possibility of link aggregation at the physical switch. When using 802.3ad LACP in this IP-to-IP scenario, the switches would have to distribute connections based on a hash of the TCP source/destination ports, rather than just the IP or MAC addresses.
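The difference between a layer 3 and a layer 4 distribution hash can be sketched as follows. This is a simplified model assuming an XOR-then-modulo hash over a hypothetical two-link aggregate; real switches use vendor-specific hash functions. The addresses and ephemeral source ports are hypothetical, and 3260 is the well-known iSCSI TCP port. With an IP-only hash, every session between the same two IPs lands on the same link; adding the TCP ports lets separate sessions spread out. With only one session, as with the ESX 3.x initiator, even a layer 4 hash still uses a single link.

```python
import ipaddress

N_LINKS = 2  # hypothetical two-port aggregate on the physical switch
INITIATOR, TARGET = "10.0.0.11", "10.0.0.50"  # hypothetical addresses
ISCSI_PORT = 3260  # well-known iSCSI target TCP port


def ip_int(addr: str) -> int:
    return int(ipaddress.ip_address(addr))


def l3_link(src: str, dst: str, sport: int, dport: int, n: int = N_LINKS) -> int:
    # Layer 3 hash: only the IP pair is used; TCP ports are ignored.
    return (ip_int(src) ^ ip_int(dst)) % n


def l4_link(src: str, dst: str, sport: int, dport: int, n: int = N_LINKS) -> int:
    # Layer 4 hash: the IP pair plus TCP source/destination ports.
    return (ip_int(src) ^ ip_int(dst) ^ sport ^ dport) % n


# Hypothetical: four TCP sessions to one target IP, each from its own
# ephemeral source port (only possible if the initiator opens several sessions).
sessions = [(INITIATOR, TARGET, sport, ISCSI_PORT) for sport in range(50000, 50004)]

l3 = [l3_link(*s) for s in sessions]
l4 = [l4_link(*s) for s in sessions]
print("L3 hash links:", l3)  # [1, 1, 1, 1]
print("L4 hash links:", l4)  # [1, 0, 1, 0]
```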

The following is an excerpt from the SAN deploy guide:

Software iSCSI initiators establish only one connection to each target.

Therefore, storage systems with a single target that contains multiple LUNs have all LUN traffic routed through that one connection. In a system that has two targets, with one LUN each, two connections are established between the ESX host and the two available volumes. For example, when aggregating storage traffic from multiple connections on an ESX host equipped with multiple iSCSI HBAs, traffic for one target can be set to a specific HBA, while traffic for another target uses a different HBA. For more information, see the “Multipathing” section of the iSCSI SAN Configuration Guide. Currently, VMware ESX provides active/passive multipath capability. NIC teaming paths do not appear as multiple paths to storage in ESX host configuration displays, however. NIC teaming is handled entirely by the network layer and must be configured and monitored separately from ESX SCSI storage multipath configuration.


Excerpts from the following blog indicate that vSphere changes the software iSCSI initiator to support multiple iSCSI sessions. This allows multipathing or link aggregation to spread separate iSCSI TCP sessions across more than one NIC (depending on how many iSCSI sessions are established).

The current experience discussed above (all traffic across one NIC per ESX host):

VMware can’t be accused of being unclear about this. Directly in the iSCSI SAN Configuration Guide: ESX Server‐based iSCSI initiators establish only one connection to each target. This means storage systems with a single target containing multiple LUNs have all LUN traffic on that one connection, but in general, in my experience, this is relatively unknown.

This usually means that customers find that for a single iSCSI target (and however many LUNs that may be behind that target – 1 or more), they can’t drive more than 120-160MBps. This shouldn’t make anyone conclude that iSCSI is not a good choice or that 160MBps is a show-stopper. For perspective I was with a VERY big customer recently (more than 4000 VMs on Thursday and Friday two weeks ago) and their comment was that for their case (admittedly light I/O use from each VM) this was working well. Requirements differ for every customer.

The changes in vSphere:

Now, this behavior will be changing in the next major VMware release. Among other improvements, the iSCSI initiator will be able to use multiple iSCSI sessions (hence multiple TCP connections). Looking at our diagram, this corresponds with “multiple purple pipes” for a single target. It won’t support MC/S or “multiple orange pipes per each purple pipe” – but in general this is not a big deal (large scale use of MC/S has shown a marginal higher efficiency than MPIO at very high end 10GbE configurations).

Multiple iSCSI sessions will mean multiple “on-ramps” for MPIO (and multiple “conversations” for Link Aggregation). The next version also brings core multipathing improvements in the vStorage initiative (improving all block storage): NMP round robin, ALUA support, and EMC PowerPath for VMware which integrates into the MPIO framework and further improves multipathing. In the spirit of this post, EMC is working to make PowerPath for VMware as heterogeneous as we can.

Together – multiple iSCSI sessions per iSCSI target and improved multipathing means aggregate throughput for a single iSCSI target above that 160MBps mark in the next VMware release, as people are playing with now. Obviously we’ll do a follow up post.
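To put the single-session limit in perspective, here is a rough back-of-the-envelope sketch of the theoretical line-rate ceiling. It assumes 1 GbE uplinks, one TCP session per uplink at best, and ignores Ethernet/IP/TCP/iSCSI overhead, so real throughput will be lower. The function names are hypothetical. A single session is capped by one link no matter how many uplinks are teamed; multiple sessions can raise the aggregate ceiling up to the number of uplinks.

```python
LINK_GBPS = 1.0  # assumption: 1 GbE uplinks


def link_ceiling_mb_s(gbps: float) -> float:
    # 1 Gbit/s = 1000 Mbit/s; divide by 8 bits per byte -> 125 MB/s per direction.
    return gbps * 1000 / 8


def aggregate_ceiling(sessions: int, uplinks: int, gbps: float = LINK_GBPS) -> float:
    # Each TCP session rides a single uplink, so at most
    # min(sessions, uplinks) links can carry traffic at once.
    return min(sessions, uplinks) * link_ceiling_mb_s(gbps)


print(aggregate_ceiling(1, 4))  # 125.0  (ESX 3.x: one session, four teamed NICs)
print(aggregate_ceiling(4, 4))  # 500.0  (multiple sessions spread across four NICs)
```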

Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.


