Fujitsu StorageCluster Walkthrough Part 1

Published: Tuesday, 27 March 2018 08:30

Fujitsu's StorageCluster feature is a very interesting piece of technology. That is not only because it offers synchronous mirroring with transparent failover natively, without the need for additional software like DataCore's SANsymphony, IBM's SVC or FalconStor's NSS, but also because it is very versatile. You can mirror between any two ETERNUS DX systems as long as both are generation S3 or higher and run firmware V10L20 or higher. For example, you can mirror between a DX100 S4 on the first site and a DX600 S3 on the second site. This is possible because all ETERNUS DX systems share the same firmware; only the hardware it runs on is more or less powerful. That is real investment protection. The second very interesting aspect is the lack of capacity-based licensing. Synchronous mirroring (which Fujitsu calls StorageCluster) is licensed per system, so once you have bought the license it doesn't matter how much storage you put into the system. This matters more and more in times of ever-growing capacity.
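To make the two compatibility rules concrete, here is a minimal Python sketch that checks whether two arrays could be paired. It assumes model names end in a generation suffix such as "S3" and that firmware strings follow the "V10L20" pattern mentioned above; DxSystem, sc_capable and can_pair are hypothetical helpers for illustration, not part of any Fujitsu tool, and the firmware values in the example are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Minimum requirements as described above: generation S3, firmware V10L20.
MIN_GENERATION = 3
MIN_FIRMWARE = (10, 20)


@dataclass
class DxSystem:
    model: str     # e.g. "DX100 S4"
    firmware: str  # e.g. "V10L20" (assumed format: V<major>L<level>)


def generation(model: str) -> int:
    """Extract the generation number from a model name such as 'DX600 S3'."""
    match = re.search(r"\bS(\d+)\b", model)
    if match is None:
        raise ValueError(f"no generation suffix in {model!r}")
    return int(match.group(1))


def firmware_level(firmware: str) -> Tuple[int, int]:
    """Turn 'V10L20' into (10, 20) so versions compare correctly as tuples."""
    match = re.fullmatch(r"V(\d+)L(\d+)", firmware.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised firmware string {firmware!r}")
    return int(match.group(1)), int(match.group(2))


def sc_capable(system: DxSystem) -> bool:
    return (generation(system.model) >= MIN_GENERATION
            and firmware_level(system.firmware) >= MIN_FIRMWARE)


def can_pair(a: DxSystem, b: DxSystem) -> bool:
    # Model size may differ between sites; only generation and firmware matter.
    return sc_capable(a) and sc_capable(b)


site1 = DxSystem("DX100 S4", "V10L20")
site2 = DxSystem("DX600 S3", "V10L20")
print(can_pair(site1, site2))  # True
```

Comparing (major, level) tuples instead of raw strings avoids surprises such as "V10L9" sorting after "V10L20".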

Although StorageCluster is a nice feature, especially in the DACH and EMEA regions (we love synchronous mirroring), there is very little information about it on the internet. If you are thinking about implementing this feature, don't count on community posts, blog articles or even documentation from Fujitsu itself. Don't ask me why, but as far as I know there is not even official documentation for this feature on the web. You can find some information about SC in a few whitepapers and presentations, but you probably won't find a dedicated walkthrough. Perhaps the reason is that the product is not that popular and Fujitsu is definitely not as common as HPE or Dell/EMC, but I'm pretty sure it is not because the products are bad. Absolutely not: FTS's storage products are very stable, perhaps not that feature-rich, but absolutely reliable, easy to set up and maintain (most of them :-) ) and very flexible.


One of my colleagues attended a 2-day training on SC implementation, and with the information he brought back I'm currently doing the first FTS SC installation of my life.

Time to write down the steps I took, to help others get this little piece of technology working.

First, some technical background. SC is not a true active/active storage system like SANsymphony. A single LUN is always active on only one of the two cluster nodes. The other node only receives updates over the "mirror links" and waits for the active side to stop working. The setup as a whole can still be run active/active, because you can mirror another LUN the opposite way, so that storage node 2 is the active one and node 1 is passive for that LUN. This way you can use the resources of both nodes in the storage environment. The ETERNUS SC works like a Microsoft failover cluster: a role can only be active on one node, and in this scenario a role is a LUN.
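The "one role, one owner" model can be sketched in a few lines of Python. This is a toy model that only illustrates the ownership rules described above, not how the array tracks LUN state internally; the class, node and LUN names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageCluster:
    # Each mirrored LUN is active on exactly one node at a time.
    owner: Dict[str, str] = field(default_factory=dict)

    def add_lun(self, lun: str, active_node: str) -> None:
        self.owner[lun] = active_node

    def active_luns(self, node: str) -> List[str]:
        return sorted(lun for lun, n in self.owner.items() if n == node)

    def fail_node(self, failed: str, survivor: str) -> None:
        # Like a failover-cluster role, every LUN owned by the failed node
        # becomes active on the survivor, which already holds the mirror copy.
        for lun, node in self.owner.items():
            if node == failed:
                self.owner[lun] = survivor


sc = StorageCluster()
sc.add_lun("LUN01", "node1")  # node1 active, node2 passive
sc.add_lun("LUN02", "node2")  # mirrored the opposite way
print(sc.active_luns("node1"))  # ['LUN01']
sc.fail_node("node1", "node2")
print(sc.active_luns("node2"))  # ['LUN01', 'LUN02']
```

Spreading ownership like LUN01/LUN02 above is what lets both arrays do useful work during normal operation.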


If one of the nodes fails, the secondary node takes over the complete SAN identity of the first node. That means the WWNs/IQNs of the active controllers' front-end port(s) on the first node are transferred to reserved ports on the secondary system. Once the transfer is complete, the second node brings up these ports with the new identity and the application servers can continue to access the storage. From the application server's point of view there is no path failover in the traditional sense, because the WWN of the controller port does not change.
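To illustrate why the host sees no traditional path failover, here is a toy Python model of the identity takeover. It only mimics the bookkeeping described above (the WWPN moves from the failed port to a reserved port); the port names and the WWPN are made-up placeholders, and the fabric login the real array has to perform is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrontEndPort:
    name: str                   # hypothetical label, e.g. "node1-CM0-P0"
    wwpn: Optional[str] = None  # identity presented to the fabric
    online: bool = False


def take_over_identity(failed: FrontEndPort, reserved: FrontEndPort) -> None:
    """Move the SAN identity of a failed front-end port to its reserved partner."""
    if reserved.online:
        raise RuntimeError(f"{reserved.name} is already in use")
    identity = failed.wwpn
    failed.online = False       # the failed port no longer presents the WWPN
    failed.wwpn = None
    reserved.wwpn = identity    # the reserved port adopts the very same WWPN
    reserved.online = True      # and comes online with it


primary = FrontEndPort("node1-CM0-P0", "50:00:00:00:00:00:00:01", online=True)
standby = FrontEndPort("node2-CM0-P2")
seen_by_host = primary.wwpn
take_over_identity(primary, standby)
print(standby.wwpn == seen_by_host)  # True: same target identity for the host
```

This is also why each front-end port needs a dedicated, otherwise unused partner port on the other array: the reserved port must be free to adopt the identity.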
This is the right time to talk about the limitations of SC. Not everything is good; SC also has some disadvantages you have to keep in mind. First, the aforementioned "switch of identity" takes some time. It is NOT as transparent as SANsymphony. Moving the identity to the other node, informing the application server about the move and forcing MPIO on the server to switch all take time, up to 10 seconds. Your application has to tolerate this I/O disruption. Normally the hypervisor or the OS tolerates I/O disruptions of up to 60 seconds, but some applications and filesystems won't. Older Linux systems may remount their filesystems read-only, and Oracle databases (the most I/O-sensitive application I've ever seen) may eventually drop. Even SQL Server is not very tolerant, but it should normally only write warnings and continue to work. Test your application before running it on SC. The 10-second switch-over mentioned above is the absolute maximum; normally failover is faster, so you won't have any problems beyond a short hang of the accessing systems.
The second thing to mention is the difference in switch-over time between FC and iSCSI. SC supports both, but only FC can keep the switch-over time below the thresholds of applications and filesystems. In iSCSI environments switch-over can take anywhere from 30 to 120 seconds, which is far too long for nearly all applications. That is the worst case, but I would still only recommend SC in FC environments. By the way, this is not only a problem of FTS' SC but a general problem of iSCSI-based failover clusters. The cause lies in the iSCSI protocol and the retry timers of TCP/IP, whose "safety" mechanisms make a faster switch-over impossible.
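A quick way to reason about both points is to compare the worst-case switch-over with the I/O timeout of every layer in the stack, since the shortest timeout decides whether something breaks. The Python sketch below does that; the switch-over figures are the ones quoted above, while the layer names and the 30-second database timeout are hypothetical placeholders you would replace with values from your own environment.

```python
from typing import Dict, List

# Worst-case switch-over times quoted above, in seconds.
WORST_CASE_SWITCHOVER_S = {"fc": 10.0, "iscsi": 120.0}


def layers_at_risk(protocol: str, io_timeouts_s: Dict[str, float]) -> List[str]:
    """Return every layer whose I/O timeout does not exceed the worst-case stall."""
    stall = WORST_CASE_SWITCHOVER_S[protocol.lower()]
    return sorted(layer for layer, timeout in io_timeouts_s.items()
                  if timeout <= stall)


# Example stack: the 60 s OS/hypervisor figure comes from the text above,
# the 30 s database value is a purely hypothetical placeholder.
stack = {"hypervisor": 60.0, "database": 30.0}
print(layers_at_risk("FC", stack))     # []
print(layers_at_risk("iSCSI", stack))  # ['database', 'hypervisor']
```

A layer whose timeout equals the stall is flagged as well, because a timeout that only just matches the worst case leaves no margin.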

Because a controller port can only have a single "identity", you have to provide a corresponding port on node two for each front-end port on node one. So the minimum number of ports each ETERNUS DX system needs for SC is six, i.e. three per controller: one for front-end traffic, one for replication and one as the reserved counterpart for the front-end port of the other node. Depending on the ETERNUS model, you can add further FC or iSCSI ports to the controllers. Smaller systems like the DX100 or DX200 provide up to 4 FC ports per controller (8 for the whole system), while larger systems like the DX500 or DX600 support up to 32 ports per system. So choosing the right model is also a matter of how many ports you need.
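The port arithmetic is easy to get wrong once you go beyond the minimum, so here is a small Python helper. It assumes both nodes are configured symmetrically (each uses the same number of front-end ports) and one replication port per controller by default; the function names are illustrative only.

```python
# Assumes a symmetric setup: both nodes use the same number of front-end
# ports per controller, so each node reserves one port per partner front-end port.
def ports_per_controller(frontend: int, replication: int = 1) -> int:
    reserved = frontend  # one standby port per partner front-end port
    return frontend + replication + reserved


def ports_per_system(frontend: int, replication: int = 1, controllers: int = 2) -> int:
    return controllers * ports_per_controller(frontend, replication)


print(ports_per_controller(1), ports_per_system(1))  # 3 6 (the minimum above)
print(ports_per_controller(2))  # 5, already more than the 4 FC ports of a DX100/DX200 controller
```

Every additional front-end port costs two physical ports, which is why the port count quickly pushes you towards the larger models.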

Synchronous mirroring also places some requirements on the storage network infrastructure. First there is bandwidth: 50 Mbit/s is the absolute minimum SC needs to operate. In normal scenarios this is ridiculously low, so better plan for 2-4 Gbit/s or more. It really depends on the performance of your storage system and its write ratio. In times of all-flash storage, even 8 Gbit/s or more could be a bottleneck. Keep in mind that only writes are transferred over the REC (mirror) ports, but because SC is active/passive, at least half of your servers will also have to access the storage over the ISL. Please plan accordingly.
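A rough sizing sketch helps to turn "it depends on the write ratio" into numbers. The Python below assumes decimal units, a 30 % headroom for protocol overhead and bursts, and by default that half of the front-end traffic comes from hosts whose LUNs are active on the other site; all of these are assumptions, so replace them with your own measurements.

```python
MIN_MIRROR_GBIT = 0.05  # 50 Mbit/s absolute minimum quoted above


def mirror_link_gbit(total_mb_s: float, write_ratio: float, headroom: float = 1.3) -> float:
    """Bandwidth for the mirror (REC) links: every host write is sent to the partner once."""
    write_mb_s = total_mb_s * write_ratio
    needed = write_mb_s * 8 / 1000 * headroom  # MB/s -> Gbit/s, plus headroom
    return max(needed, MIN_MIRROR_GBIT)


def isl_gbit(total_mb_s: float, remote_share: float = 0.5, headroom: float = 1.3) -> float:
    """Front-end traffic (reads and writes) of hosts whose LUNs are active on the other site."""
    return total_mb_s * remote_share * 8 / 1000 * headroom


# Hypothetical workload: 1,000 MB/s in total, 30 % writes.
print(round(mirror_link_gbit(1000, 0.3), 2))  # 3.12
print(round(isl_gbit(1000), 2))               # 5.2
```

If mirror links and ISL share the same inter-site fiber, the two results add up, which in this example already exceeds 8 Gbit/s.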
The second requirement is latency. SC supports up to 10 ms round-trip time (RTT), but lower is always better. Don't spend hundreds of thousands of euros on your all-flash storage only to lose all that performance on the links between your highly redundant sites. 3 ms RTT or lower is best practice here.
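Because every mirrored write waits for the partner's acknowledgement, the RTT is added directly to write latency. The Python sketch below uses the common rule of thumb of roughly 5 µs per kilometre of fiber one way (light travels at about 200,000 km/s in glass); it ignores switches, DWDM equipment and array processing time, so treat the resulting distances as an upper bound. The function names are illustrative.

```python
FIBER_US_PER_KM = 5.0  # approximate one-way propagation delay in fiber


def propagation_rtt_ms(distance_km: float) -> float:
    return 2 * distance_km * FIBER_US_PER_KM / 1000


def mirrored_write_ms(local_write_ms: float, distance_km: float,
                      equipment_rtt_ms: float = 0.0, round_trips: int = 1) -> float:
    # A synchronous write is only acknowledged to the host after the partner
    # has confirmed it, so every write pays at least one link round trip.
    # Plain FC writes without write acceleration can need two.
    return local_write_ms + round_trips * (propagation_rtt_ms(distance_km) + equipment_rtt_ms)


def max_distance_km(rtt_budget_ms: float) -> float:
    """Distance covered by the RTT budget from propagation delay alone."""
    return rtt_budget_ms * 1000 / (2 * FIBER_US_PER_KM)


print(max_distance_km(3.0), max_distance_km(10.0))  # 300.0 1000.0
```

So the 3 ms best practice corresponds to at most roughly 300 km of fiber and the 10 ms limit to roughly 1,000 km, before any equipment latency is taken into account.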

That's it for the hardware requirements; let's look at the software requirements and licenses. SC is a per-device license, so you have to buy the cluster pack for each node, regardless of model, controller size, generation or capacity. The pack includes the SC license as well as a license for ETERNUS SF StorageCruiser Standard, the central management application required to configure SC.
Another piece of software you need is the SF Agent, a small Windows or Linux application that turns a VM or physical server into a storage cluster witness or Storage Cluster Controller. SC is a cluster technology, and to avoid split-brain scenarios you need a quorum instance; the SF Agent plays that role. So that's it. Quite simple, isn't it? If you are thinking about implementing this technology, plan a budget of roughly €20,000-30,000 for both nodes and a 3-year support agreement.
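Why a third party is needed becomes clear once you write the decision down. The Python below is a generic 2-of-3 majority rule (node 1, node 2 and the witness each count as one vote), given only to illustrate the split-brain problem; it is an assumption for illustration, not Fujitsu's documented failover logic, and the node names are placeholders.

```python
from typing import Optional, Set


def serving_node(active: str, standby: str,
                 witness_sees: Set[str], mirror_link_up: bool) -> Optional[str]:
    """Pick the node allowed to serve a mirrored LUN (2-of-3 majority with a witness)."""
    # The current owner keeps the LUN while it still has a majority:
    # either the witness or its partner (via the mirror link) can reach it.
    if active in witness_sees or mirror_link_up:
        return active
    # The owner is gone or isolated; the standby may take over only with the
    # witness on its side. An isolated former owner must stop serving I/O.
    if standby in witness_sees:
        return standby
    # No majority anywhere: serving from either side would risk split-brain.
    return None


print(serving_node("node1", "node2", {"node1", "node2"}, False))  # node1 (links cut, no takeover)
print(serving_node("node1", "node2", {"node2"}, False))           # node2 (node1 failed)
```

Without the witness vote, the first case would be indistinguishable from a failed node, and both arrays might serve the same LUN.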

 That's all for part 1 in my ETERNUS StorageCluster walkthrough. Part 2 will show you how to implement the solution.