Welcome back to part 2 of my ETERNUS StorageCluster walkthrough. Part 1 covered some basic topics. Part 2 is the technical part, where I will show you how to implement the solution.
For better understanding I will do that as a kind of step-by-step guide, but not in very great detail. SC isn't extremely easy and you should have a solid understanding of the base technologies in the ETERNUS storage systems. So please don't try to implement SC as a complete newbie.
Step 1: software installation
The software installation is quite simple. If you already have ETERNUS storage systems you probably have SF Cruiser installed already. The installation itself is always the same, no matter if you have the Express, Basic or Standard Edition of SF Cruiser. Simply take a Windows Server host (2008R2 or higher, I would recommend using Server 2012 R2 or 2016), join it to an AD (or leave it standalone, this only affects which accounts you can use to log on to the application), put in the installation media (you can get the latest version from http://www.fujitsu.com/global/support/products/computing/storage/download/esf-mediapack-trial.html) and start the installation. Make sure the groups that are required for authentication exist on the system.
After setup is done, connect to SF Cruiser (https://name-of-your-sf-server:9855), log in with administrative credentials (see group membership) and start adding your ETERNUS systems.
Step 2: adding storage systems to SF Cruiser
The second step is to add both storage systems to SF Cruiser. The first thing to do is to prepare the storage. Log in to the storage system, enable SNMP and allow the SF Cruiser server to be the manager (this allows the Windows system to connect to the storage system with the provided SNMP community). Additionally, create a new user with a strong password and give it the "software" role.
After you've done the preparation steps you can add the system to SF Cruiser by providing IP, SNMP community, username and password. The rest is done by SF Cruiser automatically.
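If you register several systems, it can help to sanity-check the four values before typing them into the dialog. The following is only a minimal sketch: the class and field names are made up by me, and the two policy rules (no default "public" community, at least 12 characters for the password) are my own assumptions, not requirements of SF Cruiser.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRegistration:
    """The four values the registration asks for (field names are hypothetical)."""
    ip: str
    snmp_community: str
    username: str
    password: str


def registration_problems(reg: StorageRegistration) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    try:
        ipaddress.ip_address(reg.ip)
    except ValueError:
        problems.append(f"invalid IP address: {reg.ip!r}")
    if not reg.snmp_community:
        problems.append("SNMP community is empty")
    elif reg.snmp_community.lower() == "public":
        # Assumption: we do not want the well-known default community.
        problems.append("SNMP community is the default 'public'")
    if not reg.username:
        problems.append("username is empty")
    if len(reg.password) < 12:
        # Assumption: "strong password" is interpreted as 12+ characters here.
        problems.append("password is shorter than 12 characters")
    return problems
```

Calling registration_problems() for each system gives you a short list of things to fix before you start clicking.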
When the system is registered with SF Cruiser, you can install the StorageCluster license. Activate the license (TAN) at Fujitsu's license portal (https://ts.fujitsu.com/LicenseRegistration/Login.aspx) and you will receive a license key. The SC TAN will create three licenses: one for SC itself, one for SF Cruiser Standard and one for Advanced Copy Manager Remote Copy (this technology is essentially used to "mirror" the bits between the nodes). Apply all of them to the added node and you're done.
Step 3: prepare the storage witness/SC controller server
As already mentioned, the storage witness is needed to avoid split-brain scenarios. It is also required to enable automatic failover. Without a controller server you are restricted to manual failover.
Requirements for this node are quite low. Any supported version of Windows can be used to install the SF agent on it. Currently ONLY Windows is supported. Take care to only use a system that doesn't consume storage from the ETERNUS SC itself. You know what a chicken-and-egg problem is? A physical backup server, for example, is fine. Install the agent (get it from http://www.fujitsu.com/global/support/products/computing/storage/download/esf-mediapack-trial-download-16.html), add it to SF Cruiser and you're almost done. Later you will have to manually edit two files (C:\ETERNUS_SF\ESC\Agent\etc\Correlation.ini and C:\ETERNUS_SF\ESC\Agent\etc\TFOConfig.ini) to get the quorum function to work, but for the moment you're done.
Step 4: Fibre Channel Zoning
The next step is to create aliases and zones for the frontend ports (the ports that offer storage to the application servers) and the mirror ports. If you have a DX200 S4 with 4 FC ports in each controller I would recommend using the ports this way for a basic setup:
CM0 CA0 P0 -> Frontend Fabric 1
CM0 CA0 P1 -> Mirror Fabric 1
CM0 CA1 P0 -> Reserved port for failover
CM0 CA1 P1 -> future use
CM1 CA0 P0 -> Frontend Fabric 2
CM1 CA0 P1 -> Mirror Fabric 2
CM1 CA1 P0 -> Reserved port for failover
CM1 CA1 P1 -> future use
Create zones for application server <-> frontend ports, application server <-> reserved ports, mirror ports node1 <-> mirror ports node2.
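To make the three zone types a bit more tangible, here is a rough sketch that generates a zone list from the port layout. Everything in it is an assumption on my side: the alias and zone naming scheme, the choice of one zone per HBA and storage port, and that the reserved ports of CM0 sit in fabric 1 and those of CM1 in fabric 2. It does not talk to any switch, it only prints a plan.

```python
from itertools import product

# Port use and fabric per storage node, following the suggested DX200 S4 layout.
# Assumption: reserved ports of CM0 are in fabric 1, those of CM1 in fabric 2.
PORT_ROLES = {
    "CM0_CA0_P0": ("frontend", 1),
    "CM0_CA0_P1": ("mirror", 1),
    "CM0_CA1_P0": ("reserved", 1),
    "CM1_CA0_P0": ("frontend", 2),
    "CM1_CA0_P1": ("mirror", 2),
    "CM1_CA1_P0": ("reserved", 2),
}


def zone_plan(nodes, host_hbas):
    """Return {fabric: [(zone_name, [member aliases])]}.

    nodes     -- tuple with the alias prefixes of both storage systems
    host_hbas -- {host name: {fabric: HBA alias}}
    """
    plan = {1: [], 2: []}
    for port, (use, fabric) in PORT_ROLES.items():
        if use in ("frontend", "reserved"):
            # application server <-> frontend/reserved port, on both nodes
            for (host, hbas), node in product(host_hbas.items(), nodes):
                plan[fabric].append(
                    (f"{host}__{node}_{port}", [hbas[fabric], f"{node}_{port}"])
                )
        elif use == "mirror":
            # mirror port node1 <-> mirror port node2, one zone per link
            first, second = nodes
            plan[fabric].append(
                (f"MIR__{first}__{second}_{port}",
                 [f"{first}_{port}", f"{second}_{port}"])
            )
    return plan


if __name__ == "__main__":
    hosts = {"esx01": {1: "esx01_hba0", 2: "esx01_hba1"}}
    for fabric, zones in zone_plan(("dx1", "dx2"), hosts).items():
        for name, members in zones:
            print(f"fabric {fabric}: {name} -> {', '.join(members)}")
```

With one application server and two nodes this gives five zones per fabric: two frontend zones, two reserved zones and one mirror zone.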
Step 5: Port configuration
Make sure all ports are correctly set to fabric mode (FC-AL is the default for all ports and you have to switch them to fabric mode if you use switches). All FE and reserved ports must have the role CA, whereas all MIR ports need the role RA (which stands for Remote Adapter). The easiest way to reconfigure these ports is using the GUI of the ETERNUS system itself. You can also do it with SF Cruiser but it's slower.
The ports should now look like this:
Step 6: Create Host Port Group
After enabling the zones on the FC switch your storage systems "see" the application server. Before you can present storage to the server you have to create a port group for the server containing the WWNs of the FC HBAs in the server. For the moment, creating the host port group is enough; don't present any storage to the host yet.
Step 7: Create REC path
Creating a REC path allows both storage systems to transfer mirror data. This is done via SF Cruiser. Select the first storage system, click on Storage Cluster -> REC path and then choose "Remote Copy Conf." on the right side.
In the wizard's first step choose the remote storage system and click Next. Change all settings to match the picture below.
In the next step, choose CM0 CA0 P01 from the first system and the same port from the second system. Click on the "Add" button to add this connection. Do the same for CM1 CA0 P01 on both nodes. Your setup should now look like this:
If the status of the mirror links is not "Normal" check for correct port mode (must be RA), FC zoning (one zone for each mirror link), fabric mode (fabric, not FC-AL) and same speed.
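The four checks can be written down as a tiny checklist function. This is just a sketch to show the logic: the data class and its fields are hypothetical and you would have to fill them by hand from the storage GUI and your switch config, nothing here queries a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPort:
    """One end of a mirror link (all fields are filled in manually)."""
    name: str
    role: str        # "CA" or "RA"
    mode: str        # "Fabric" or "FC-AL"
    speed_gbit: int
    zone: str        # zone this port is a member of


def mirror_link_problems(local, remote):
    """Return the reasons why a mirror link is probably not 'Normal'."""
    problems = []
    for port in (local, remote):
        if port.role != "RA":
            problems.append(f"{port.name}: role is {port.role}, must be RA")
        if port.mode != "Fabric":
            problems.append(f"{port.name}: mode is {port.mode}, must be Fabric")
    if local.zone != remote.zone:
        problems.append("ports are not members of the same zone")
    if local.speed_gbit != remote.speed_gbit:
        problems.append(
            f"speed differs: {local.speed_gbit} vs {remote.speed_gbit} Gbit/s"
        )
    return problems
```

An empty list means none of the usual suspects apply and you have to look elsewhere (cabling, SFPs, switch port state).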
In the next step (Select REC Buffer) leave all settings unchanged and click Next. Confirm all settings and click Close to finish the wizard. Now you have a working mirror path between your two storage systems. You don't have to repeat the configuration on the second node. The config will be published to both nodes by SF Manager.
Step 8: Creating TFO Group
Next step is to create a TFO group. Within a TFO group you will define which CA ports of both systems correspond to each other and can be used in case of failover. Additionally you will define which storage system is the active one and which is the passive one.
First, click on "Set" in the right action window. Whichever system you chose in SF Manager will automatically be set as "Local Disk Array" and the second one will be defined as "Remote Disk Array". You can now choose which one of them is the active one by setting "Primary Disk Array" to either local or remote. For now simply leave the default settings.
You have to give the TFO group a name. I prefer to use a meaningful name, something like "Site1_to_Site2", which tells you at first sight in which direction the traffic flows. Set failover mode to "Auto" (this requires the SC controller to be configured, which we will do later), failback mode to "Manual" and split mode to "Read/Write". The last setting will force the LUN on the primary site to remain in read/write mode if a split between the storage nodes happens.
In the next step, choose the corresponding CA ports and add them. You should have two pairs at the end of the wizard. Finish the wizard and check that the TFO group was created successfully.
Hint: during my first setup I always got an error when creating the TFO group, and the error message made little sense to me. The error was: "esccs02600 Primary TFO Group settings failed. Primary Storage=XXXXX, Primary IP Address=XXXXXXXX, TFO Group Name=XXXXXXXXX, Message=SEVERE:ssmgr3468:The specified port (000,100) disable of security setting". I had no idea what caused that error, and the documentation didn't help me. One of the FTS technical guys helped me out. You have to change another setting in the port configuration of the ports used in a TFO group. In SF Manager, go to Storage -> Storage Node you want to configure -> Connectivity -> Port -> FC -> select the CA ports and choose "Modify FC port".
Change "Host Affinity" from "Disable" to "Enable". Redo the TFO group creation and now it should work.
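To avoid running into that error again, I find it useful to think of the wizard input as "pairs of CA ports with Host Affinity enabled on both sides". The sketch below models exactly that. The class and function names are invented for illustration, and pairing ports by identical name is my assumption for the basic layout used in this walkthrough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CAPort:
    name: str            # e.g. "CM0 CA0 P0"
    host_affinity: bool  # True if "Host Affinity" is set to "Enable"


def tfo_port_pairs(primary, secondary):
    """Pair CA ports by name and collect problems that would block the TFO group.

    Returns (pairs, problems): pairs is a list of (primary, secondary) port names.
    """
    secondary_by_name = {port.name: port for port in secondary}
    pairs, problems = [], []
    for port in primary:
        partner = secondary_by_name.get(port.name)
        if partner is None:
            problems.append(f"{port.name}: no corresponding port on the secondary system")
            continue
        for side, candidate in (("primary", port), ("secondary", partner)):
            if not candidate.host_affinity:
                problems.append(f"{candidate.name} ({side}): Host Affinity is disabled")
        pairs.append((port.name, partner.name))
    return pairs, problems
```

For the layout in this guide you would expect two pairs (CM0 CA0 P0 and CM1 CA0 P0 on both systems) and an empty problem list.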
After the TFO group is created, check the status of the group.
"Phase" = "Initial" indicates that no volumes are under control of the Storage Cluster up to now.
"Status" = "Halt" indicates that there is a fault in the TFO group. In our case the Storage Cluster Controller (Monitor Server) is not configured.
"Active/Standby" = "Active" means that the primary DX system is active and its FC link is switched on at its CA ports.
Step 9: Creating and mapping volumes
To finally get SC working you have to create RAID sets on both storage systems and create some volumes on top of those RAID sets. The names of the RAID sets don't matter, but the configuration (type and number of disks, RAID level etc.) should be the same on both systems. That's important because we are talking about SYNCHRONOUS mirroring, and you don't want to slow down the primary site just because the secondary site can't keep up with the changes. You can create as many RAID sets as you want.
The next step is to create volumes on top of these RAID sets. It is very important that the volumes have exactly the same size on both systems. If they differ you can't set them up in a mirror configuration. So create, for example, two volumes, one with 2TB and the other with 3TB, and do the same on the second system.
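Because "same size" really means identical, comparing exact byte counts is safer than comparing rounded TB values from the GUI. A minimal sketch, assuming you have noted the volume sizes of both systems in two dictionaries and use the same volume names on both sides (the names and the helper are hypothetical):

```python
TB = 1024 ** 4  # binary terabyte, used only to build the example values


def size_mismatches(primary, secondary):
    """Compare {volume name: size in bytes} of both systems.

    Returns a list of messages; an empty list means every volume has an
    identically sized partner on the other system.
    """
    problems = []
    for name in sorted(primary.keys() | secondary.keys()):
        left, right = primary.get(name), secondary.get(name)
        if left is None or right is None:
            problems.append(f"{name}: exists on one system only")
        elif left != right:
            problems.append(
                f"{name}: {left} bytes on primary, {right} bytes on secondary"
            )
    return problems


if __name__ == "__main__":
    site1 = {"vol_a": 2 * TB, "vol_b": 3 * TB}
    site2 = {"vol_a": 2 * TB, "vol_b": 3 * TB - 512}  # one block too small
    for line in size_mismatches(site1, site2):
        print(line)
```

Even a difference of a single block shows up here, which is exactly the kind of mismatch that is easy to miss on screen.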
If you haven't done so already, create host groups and LUN groups (with the newly created volumes). If you have already created the TFO group, the secondary system has switched off its CA ports and can't find the WWNs of your application server automatically. If this is the case, create the host groups via SF Manager on the active node, copy the WWNs to Notepad and manually create the host groups on the second node. Make sure you create the ports in the same order as on the primary system to avoid confusing name patterns.
Finally, create a host affinity between the host group and the LUN group. Use the same LUN IDs for the same volume on both nodes. Do this on both nodes. SF Manager will recognize the mappings and will add the volumes to the TFO group. You can check it in SF Manager -> Storage -> Select one of the two SC nodes -> Storage Cluster -> Overview -> TFO Group Detail (the link in the overview). There is a "Volumes" tab in the middle of the window. Click on this tab and check if your volumes are listed.
Check your application server for the new volumes. If you haven't configured MPIO yet you will see 4 disks in this special case (two disks over two paths). Add ETERNUS DXL to the list of MPIO providers and restart your Windows OS to have now two disks with 11TB each secured by MPIO.
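The "same LUN ID for the same volume" rule is easy to verify if you write down the mappings of both nodes first. The following sketch assumes the volumes carry the same name on both nodes, which is my own convention and not something SF Manager requires; the function name is hypothetical as well.

```python
def lun_mapping_problems(primary, secondary):
    """Compare {LUN ID: volume name} mappings of both nodes."""
    problems = []
    for lun_id in sorted(primary.keys() | secondary.keys()):
        left, right = primary.get(lun_id), secondary.get(lun_id)
        if left is None or right is None:
            missing = "primary" if left is None else "secondary"
            problems.append(f"LUN {lun_id}: not mapped on the {missing} node")
        elif left != right:
            problems.append(
                f"LUN {lun_id}: {left} on primary but {right} on secondary"
            )
    return problems
```

If the list is empty, both nodes present every volume under the same LUN ID.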
Step 10: Configuring the controller server
The last step is to configure the SC controller server. As you can see in the picture above, the Halt Factor is still the disconnected monitoring server. So it's time to get this topic done.
In step 3 we did almost all the steps required for the witness to work. Connect to the system with RDP.
Change to directory "C:\ETERNUS_SF\ESC\Agent\etc", open the file "Correlation.ini" with an editor tool like "notepad", add the line "StorageClusterController = ON" (case sensitive, without the "") and save the modification.
Next step is to open the file "TFOConfig.ini" in the same directory and add the IP addresses of both storage systems. Use a new line for each system. The line must look like this: "IP=xxx.xxx.xxx.xxx". Do not use "" and do not add spaces. Save the file and restart the agent service (ETERNUS SF Storage Cruiser Agent).
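If you have to set up more than one controller server, the two edits can be scripted. Treat the following as a sketch under assumptions: it simply appends the lines at the end of the files (I don't know whether the files have sections where the position matters), it relies on the default text encoding, and the function names are mine. Make a backup of both files first, and restart the agent service afterwards as described above.

```python
from __future__ import annotations

import ipaddress
from pathlib import Path

CONTROLLER_LINE = "StorageClusterController = ON"  # case sensitive


def enable_controller(correlation_ini: Path) -> bool:
    """Append the controller line if it is missing. Returns True if the file changed."""
    lines = correlation_ini.read_text().splitlines()
    if CONTROLLER_LINE in lines:
        return False
    lines.append(CONTROLLER_LINE)
    correlation_ini.write_text("\n".join(lines) + "\n")
    return True


def add_storage_ips(tfo_config_ini: Path, ips: list[str]) -> list[str]:
    """Append one 'IP=x.x.x.x' line per storage system. Returns the lines added."""
    lines = tfo_config_ini.read_text().splitlines()
    added = []
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        entry = f"IP={ip}"        # no quotes, no spaces
        if entry not in lines:
            lines.append(entry)
            added.append(entry)
    if added:
        tfo_config_ini.write_text("\n".join(lines) + "\n")
    return added


if __name__ == "__main__":
    etc = Path(r"C:\ETERNUS_SF\ESC\Agent\etc")
    enable_controller(etc / "Correlation.ini")
    # The two addresses are placeholders for your storage systems.
    add_storage_ips(etc / "TFOConfig.ini", ["192.168.10.20", "192.168.10.21"])
```

Both functions are safe to run twice, they only add a line if it isn't there yet.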
To check if the modification is working, open a PowerShell prompt, change to C:\ETERNUS_SF\ESC\Agent\bin and start agtpatrol.bat. You should now see information about your TFO group settings. That means everything is running fine.
The second check is within SF Manager. Go to one of your cluster storage systems -> Storage Cluster -> Detail of TFO Group. The status of "Halt Factor" should now have changed to "Normal".
That's all, ladies and gentlemen, you've successfully set up an FTS Storage Cluster.
Part 3 will discuss how to test failover and failback and some general hints on SC. Stay tuned.