
Wide Area SAN

A few months ago one of our customers decided to evolve its single-datacenter concept into a two-datacenters-in-two-different-locations concept. The second datacenter is located within a 20 km range of the first DC, and all systems should be evenly spread across the two locations, forming an active-active solution.

Since the customer already uses server (MS Hyper-V and VMware) and storage (DataCore) virtualization, this shouldn't be a problem. They ordered 4x4Gb FibreChannel connections for the SAN and a bunch of 1/10GbE lines for the LAN. To reduce the number of physical lines needed, the service provider uses Huawei OSN 1800 multiplexer devices that bundle several lower-bandwidth connections into a single 10Gb stream. The OSN multiplexers are active components installed in each datacenter that transparently multiplex and demultiplex the streams, so the devices never show up anywhere in the data path. Between the multiplexers the provider uses WDM (Wavelength Division Multiplexing) and DWDM (Dense Wavelength Division Multiplexing) to aggregate the traffic sent over the lines.

After the initial setup of the multiplexers and the lines between them, we attached our Brocade SAN and Cisco LAN equipment to them to start testing. The LAN was no problem, as the Nexus systems the customer uses do quite well in such an environment. Things changed when it came to the SAN equipment. One thing to mention first: the problems we encountered were not caused by the Brocade hardware itself, so I would still always recommend Brocade equipment for FC SANs.

The first thing to keep an eye on is that service providers do check their lines after initial deployment. The problem is, they never include more than their own equipment in these tests. In our case, they tested the line from multiplexer to multiplexer, ignoring the fact that behind each multiplexer there is another FC cable, another SFP module and, of course, the SAN switch. All of these components can cause problems and have to be checked against the multiplexer and the SFPs built into that device.

The second thing to mention is the multiplexer used. In our case the OSN 1800 is qualified with Brocade SAN switches running FOS 6.x and above, so no problem here. Although the multiplexer is an active component and your SAN switch is directly attached to it, the multiplexer isn't an endpoint, so the distance the two SAN switches have to take care of is the whole 20 km, not just the 5 m from the switch to the OSN. Distances of 20 km (more precisely, every distance above 10 km) in Brocade environments need the Extended Fabrics license, because you have to configure additional buffer credits for the ISL ports and this is only possible with that license. After installing the license, change your e-ports from L0 mode to LS mode and enter the desired distance; the switch will calculate the required buffers itself. It is best practice to double or even triple the desired distance to get more buffer credits, as the calculation sometimes undershoots, and having too few buffer credits will slow down your connection. The reason is that every frame consumes one buffer credit. The credit is returned to the pool as soon as the frame reaches its destination. If no credits are left, the originating switch will not send any more frames until it gets free credits back. In a long-distance setup each frame traversing the WAN needs considerable time to reach the second switch, so the credits can run out and the data stream starves.
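To get a feel for the numbers, you can estimate the credit requirement by hand. This is a rough sketch based on the common rule of thumb of roughly one credit per two kilometers per Gbit/s for full-size frames; the port number and the positional `portcfglongdistance` syntax in the comment are examples, and the exact syntax varies by FOS release, so check the command reference for yours.

```shell
# Back-of-the-envelope BB-credit estimate for a long-distance ISL.
# Rule of thumb (assumption): credits ~= distance_km * speed_Gbps / 2,
# valid for full-size (2112-byte) frames. On the switch, LS mode does
# this calculation for you once you enter the desired distance, e.g.
# (example port, positional syntax, verify against your FOS release):
#   portcfglongdistance 0 LS 1 40
DISTANCE_KM=40   # physical 20 km, doubled as a safety margin
SPEED_GBPS=4
echo "estimated BB credits: $(( DISTANCE_KM * SPEED_GBPS / 2 ))"
# prints: estimated BB credits: 80
```

Smaller frames make this estimate optimistic: each frame still burns one credit, so a stream of short frames needs far more credits for the same distance.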

Regarding the SFPs you have to use, things are different. For the SFPs, only the distance between the switch and the multiplexer matters, so you don't have to use 1550 nm long-wave SFPs that can transmit over distances of up to 25 km; SFPs that merely cover the switch-to-multiplexer distance are enough. In our case the service provider recommended single-mode SFPs to keep the setup uniform, but theoretically you could also use multimode. 1550 nm SFPs are only needed if you have dark fiber lines directly connecting your switches.

Okay, that's it regarding the hardware itself. So make sure your SAN switch is compatible, correctly licensed and equipped with the correct SFP modules.

With all that in mind, we started running tests on both fabrics (two switches in each fabric, using redundant paths for the ISLs). Fabric1 performed quite well while fabric2 did not. Fab2 suffered from really slow performance, connection aborts, loss of sync and so on.

We checked everything, called the service provider to retest the connections, and pushed them to include our hardware as far as possible. That was a good starting point, as they took the SFPs from our SAN switch and put them into their test devices. This way it was possible to test whether our SFPs worked well with the SFPs in their multiplexers. No errors here. Then we used the same cables from SFP to multiplexer as in production, just to see if the problem was related to the cables. Well, the cables had quite high attenuation (transmission loss), so they had to be cleaned. This is something I learned from the experts: even brand-new cables are probably dirty and have higher attenuation than desired. This can cause the laser light to be scattered or weakened, making the connection unstable. The provider cleaned all cables and, just to be sure, the SFPs as well. Things got a bit better, but performance was still slow.

As the service provider cannot test switch-to-switch communication, we used the Brocade tool spinfab. Spinfab is a native tool on every Brocade SAN switch that checks communication between the switches by sending special frames over e-ports. The receiving switch identifies these "test" frames and routes them back to the sending switch. This way the sending switch can see if there is frame loss or any other error. Spinfab can also be used to test bandwidth, as it sends as many frames as the ISL will accept. More notes on bandwidth checks with spinfab a bit later.

Spinfab is easy to use, as it has only a few parameters. First you can define which e-ports spinfab should test; you do this with the parameter -ports x,y,z... The second parameter is the frame size used. The standard frame size is ~2000 bytes, the maximum size is 2112 bytes, the minimum size is ~100 bytes. The smaller the frame size, the more credits are consumed on long-distance ISLs and the more performance suffers. If you omit this parameter, spinfab uses the default size of ~2000 bytes. The last parameter is -nmegs. It tells spinfab how many frames to send before the test finishes. The default value is 10, but this doesn't mean 10 frames but rather 10 MILLION frames. Even on fast ISLs this causes the test to run for several minutes. You can lower it to 2-5 million if you want a faster check.

As preparation for the spinfab test, please reset all error counters on each e-port you want to test. As spinfab only ADDS errors to the counters and never resets them, it is possible that the test stops shortly after starting just because it thinks there are already too many errors. Simply issue a "portstatsclear portnumber" before you fire up the test and everything is fine.
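Putting the two steps together, a minimal run could look like this on the sending switch. The port numbers are examples, and spinfab's option set differs between FOS releases, so treat this as a sketch and check the help output of your version.

```shell
# Clear the error counters of both ISL e-ports first --
# spinfab only adds to them and never resets them.
portstatsclear 0
portstatsclear 1

# Test both e-ports with 2 million frames instead of the
# default 10 million, for a quicker first check.
spinfab -ports 0,1 -nmegs 2
```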

The check itself is rather unimpressive. It prints a disclaimer that you shouldn't stop the test in the middle, as this could cause your switch to become unresponsive. I canceled several of these tests and most of the time the switch kept operating normally, but one time the switch slowed down extremely, so take care here.

If the check finds no errors, it prints a "test successful" message with little additional information. If the test finds errors, they are printed one after another on the CLI. If you see several thousand errors (like me when I first ran this tool), wait until the test aborts and check which error counters rose with every step. Don't pay too much attention to the last reason given for why the test aborted. Most of the time spinfab will tell you that the sender stopped transmitting or something like that. Check the counters at the top and look at the high numbers shown there. In our case we had plenty of enc_crc errors, and those caused the test to abort.
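If you prefer to watch the counters yourself rather than scroll through spinfab's output, the standard FOS command `porterrshow` prints the cumulative error counters of all ports in one table:

```shell
# Print the cumulative error counters of all ports (FOS CLI).
# Run it once before and once after the test; counters that grew
# by thousands (e.g. crc err, enc out) point to the real problem,
# not the final abort message that spinfab prints.
porterrshow
```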

Before I tell you the reason for the errors, one note on the bandwidth tests. When you fire up spinfab, you can go to the second switch, open a CLI to it and start a "portperfshow portnumber" on the receiving e-port. This tells you how much data is received. Please be careful here: in our environment, with multiplexers and WDM in between, the performance was ALWAYS, no matter what you did, static at 8.8 MB/s in one direction and 10.0 MB/s in the other. If you connect two switches in the same location without WDM in between, performance will be much higher; in my tests I could easily see 800 MB/s on an 8Gbit switch. So don't rely on the performance shown by portperfshow when using spinfab in a long-distance environment. Always use a second method to test performance. E.g. you can use a standard server and present it a LUN from a storage system "on the other side". This causes the data to travel through the ISL, and you can see in Windows which performance you actually get.
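For the receiving side, the throughput check itself is a one-liner (port 0 is an example):

```shell
# On the receiving switch: print the throughput of the e-port that
# terminates the ISL (FOS CLI). Keep in mind that the numbers shown
# during a spinfab run over WDM can be misleading, as described above.
portperfshow 0
```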

Back to the CRC errors. CRC errors normally mean some defective part, either SFP or cable or switch or whatever. This is not necessarily true. CRC errors can also be caused by too much or too little light being sent or received by the SFPs. In our special case, the CRC errors occurred because the SFPs in the switch transmitted with too much power and the SFPs in the multiplexer had problems with that. So we had to insert attenuation modules that reduce the light. Optical power is measured in dBm, and a good receive level here is between -7 dBm and -19 dBm. After reducing the power, all CRC errors were gone and performance was perfect.
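The optical TX/RX levels can be read directly on the switch, so you can verify the effect of an attenuator without the provider's measurement gear. `sfpshow` is the standard FOS command for this; port 0 is an example:

```shell
# Print the diagnostic data of the SFP in port 0 (FOS CLI), including
# the measured TX and RX power. Compare the values against the receive
# range of the SFP on the far end of that cable (here: in the OSN 1800)
# before and after inserting an attenuator.
sfpshow 0
```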

Hope this information helps everyone set up a "Wide Area SAN".
