SSY-V: unable to connect to the GUI

After upgrading one of our customers' DataCore SSY-V installations from Windows Server 2012 to 2012 R2 (running SSY-V 10 PSP1), we have been running into strange GUI logon problems every now and then. The storage virtualization stack keeps running, so access to the disks and mirroring still work, but management is impossible.

A few weeks ago, five days after the OS upgrade, our ICINGA monitoring sent error messages saying that several of the checks we use to monitor the DataCore environment had failed. The only error message from the checks was "waiting for a controller to become available". We use ICINGA in combination with DataCore's PowerShell commands to collect performance stats and various other information. An agent on the DCS starts these PowerShell scripts, logs on to one of the DCS and sends the output back to the ICINGA node. Quite simple but very useful.
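To illustrate how such a check can turn script output into a monitoring state, here is a minimal sketch in Python. It is not our actual agent: the function name `classify` and the way output is passed in are assumptions made for this example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios/ICINGA plugin convention, and the error string is the one quoted above.

```python
from typing import Tuple

# Standard Nagios/ICINGA plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# The message our failing checks returned (lower-cased for matching).
CONTROLLER_WAIT = "waiting for a controller to become available"


def classify(output: str, timed_out: bool = False) -> Tuple[int, str]:
    """Map the raw output of a (hypothetical) DataCore check script to a plugin state."""
    text = output.strip()
    if timed_out:
        return UNKNOWN, "UNKNOWN - check script timed out"
    if CONTROLLER_WAIT in text.lower():
        # The storage stack may still be serving I/O; only management is affected.
        return CRITICAL, "CRITICAL - management service not reachable: " + text
    if not text:
        return UNKNOWN, "UNKNOWN - check script returned no output"
    return OK, "OK - " + text


if __name__ == "__main__":
    state, message = classify("Waiting for a controller to become available")
    print(state, message)
```

A real plugin would print the message and exit with the state code, so that ICINGA picks up both.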

We then tried to connect to the DataCore environment from a central management system to see whether there really was a problem. The SSY-V GUI started but was unable to connect: the connection timed out. We tried several times, without success.

 

Because the customer needed some new vDisks, we urgently needed access to the GUI and opened a call with DataCore support. They tried everything: checking network access, permissions and the time difference between the DCS hosts, pinging with a specific packet size, and restarting the WMI services. Nothing helped. Then we did a livestop on every node and restarted the management service. Still no success.

A few support round trips later the call was escalated to the next level and the developers had a look at it. They made some more proposals, but nothing worked. Everything seemed to be correctly configured and running. In the last WebEx session a new support engineer retried some of the tests we had already done, including the livestop of both DCS. This time he livestopped both DCS at the same time and restarted only one of them. Then he tried to access that DCS from the GUI on the same server. That didn't work. He livestopped the DCS once more, switched to the second DCS, started that one and opened the GUI. This time the GUI was able to connect to the local server. One step closer to a solution... Starting the first DCS, which was still livestopped, brought that one back online too. Perfect, full access to both DCS again.

We asked him why the order of restarting the services could matter, and he explained that in an SSY-V cluster one node is the "master" and all others are slaves. If the master fails, the next slave to start promotes itself to the new master. If a slave finds the known master on the network, it will never try to promote itself to master, even if the old master has problems. This was exactly our problem. The old master was up but had trouble working as a master, so the second node never tried to take over the master role. That's why we had no working master at all. By changing the start order from old master first to old slave first, the slave was unable to find a master on the network and therefore promoted itself to the new master. The old master then starts up, sees a new master and turns itself into a slave.
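The behavior described above can be played through with a small toy model. This is only a sketch of the explanation we got from support, not DataCore's actual implementation: the `Node` class, the `can_act_as_master` flag and all function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    can_act_as_master: bool = True  # hypothetical health flag, not a DataCore setting
    running: bool = False
    role: Optional[str] = None      # "master", "slave", or None while stopped


def stop(node: Node) -> None:
    node.running = False
    node.role = None


def start(node: Node, cluster: List[Node]) -> None:
    """A starting node becomes a slave if it sees a master, otherwise it takes the role."""
    master_visible = any(
        n is not node and n.running and n.role == "master" for n in cluster
    )
    node.running = True
    node.role = "slave" if master_visible else "master"


def gui_can_connect(cluster: List[Node]) -> bool:
    """Management only works if the running master is actually able to do its job."""
    return any(
        n.running and n.role == "master" and n.can_act_as_master for n in cluster
    )


def restart_in_order(cluster: List[Node], order: List[Node]) -> bool:
    """Stop every node, start them in the given order, report whether the GUI would connect."""
    for n in cluster:
        stop(n)
    for n in order:
        start(n, cluster)
    return gui_can_connect(cluster)


if __name__ == "__main__":
    dcs1 = Node("DCS1", can_act_as_master=False)  # old master with problems
    dcs2 = Node("DCS2")
    cluster = [dcs1, dcs2]
    print(restart_in_order(cluster, [dcs1, dcs2]))  # old master first
    print(restart_in_order(cluster, [dcs2, dcs1]))  # old slave first
```

In this model, starting the troubled node first leaves the cluster with a master that cannot do its job and a slave that never challenges it. Starting the healthy node first gives it the master role, and the other node joins as a slave.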

A few days ago the problem recurred, and we were able to get the GUI working again by following the correct restart order. The problem is now known to DataCore support, and they are trying to find the cause of this behavior and provide a fix. The restarts can only be a workaround, but if you have similar problems, ask DataCore to give their OK for a livestop and restart, and try every restart order you can think of. Chances are one of them will work.
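If you have more than two nodes, writing down the possible start orders beforehand helps. The following sketch simply enumerates them and moves every order that begins with the suspected old master to the end of the list. The function name and node names are assumptions for this example; nothing here talks to DataCore.

```python
from itertools import permutations
from typing import List


def restart_orders(nodes: List[str], suspected_master: str) -> List[List[str]]:
    """All start orders, with those beginning at the suspected master moved to the end."""
    orders = [list(p) for p in permutations(nodes)]
    # sorted() is stable: False sorts before True, ties keep their original order.
    return sorted(orders, key=lambda order: order[0] == suspected_master)


if __name__ == "__main__":
    for order in restart_orders(["DCS1", "DCS2"], suspected_master="DCS1"):
        print(" -> ".join(order))
```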

 

2017 v-strange.de