DataCore: speed up resyncs

Published: Monday, 17 December 2012 08:59

Sometimes it's necessary to resync a whole bunch of vDisks/NMVs within DataCore's SANsymphony(-V) software, e.g. because the service crashed or the virtualization wasn't stopped properly. This causes DataCore to do a full recovery on every volume whose state is in doubt.

This can lead to running resync operations on many volumes at the same time.

By default, the recovery priority of every volume is set to 0. This causes every volume to be in resync at the same time.

You might say: "Cool, this will max out the resync speed and the volumes will be resynced in the shortest possible time". Unfortunately, this is not correct. It's true that all volumes run with the same priority and thus at the same speed, but this will not max out your mirror links.

In my particular case, we had a crash of one DataCore server and 131 volumes required a full resync. The amount of data was ~70TB. With all volumes running at the same priority, the resync speed peaked at ~350MB/s with an average of ~200MB/s. This is quite low for the backend storage and 8Gbit mirror links.

So what to do? The answer is quite easy: use recovery priorities to prioritize single volumes. The most important thing here is SINGLE volumes. There is no point in setting 5 or 10 volumes from the same storage pool to a high recovery priority. This won't speed up the resync process.

It's important to choose only 1 or 2 volumes from the same pool at a time (you can do this for every pool if you have more than one). The resync process will then focus on these volumes and raise their recovery speed. Keep in mind that all other volumes are still recovering and use some mirror link and storage resources (you can't stop or pause the resync for those volumes, except by splitting the mirror).

Changing the priorities of two volumes at the same time had a tremendous effect on recovery speed. Suddenly there were peaks at 700MB/s with an average of 600MB/s. This is nearly 3 times the average speed seen with the default settings.

Once the prioritized volumes had been recovered, the speed dropped again to ~300MB/s. After raising the priority of the next two volumes, the speed rose again to 600MB/s on average.

This way I recovered ~40TB in the first 24h.
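To put the figures above into perspective, here is a small back-of-the-envelope calculation. It is only a rough sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a constant throughput, which a real resync will not have. The function names are made up for this example.

```python
# Rough resync time estimates from the figures quoted in this post.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and a constant rate.

MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def hours_to_resync(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to copy data_tb terabytes at rate_mb_s megabytes/second."""
    return data_tb * MB_PER_TB / rate_mb_s / SECONDS_PER_HOUR


def average_rate_mb_s(data_tb: float, hours: float) -> float:
    """Average rate in MB/s when data_tb terabytes are copied in the given hours."""
    return data_tb * MB_PER_TB / (hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # ~70TB at the default-priority average of ~200MB/s
    print(f"70TB at 200MB/s: {hours_to_resync(70, 200):.1f} h")
    # ~70TB if ~600MB/s could be sustained the whole time
    print(f"70TB at 600MB/s: {hours_to_resync(70, 600):.1f} h")
    # ~40TB recovered in the first 24h
    print(f"40TB in 24h: {average_rate_mb_s(40, 24):.0f} MB/s on average")
```

So ~70TB at ~200MB/s would have taken roughly four days, and 40TB in 24h works out to an average somewhere between the default ~200MB/s and the ~600MB/s seen with prioritized volumes, which fits the speed drops between the manual interventions.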

This procedure is a bit annoying: you have to make your changes, wait for the resync to complete, take the next volumes... and so on. You can only stretch the time between manual interventions by choosing priority 3 for two volumes, priority 2 for the next two and priority 1 for two more. This way, the resync process first focuses on the prio 3 volumes, then on prio 2, later on prio 1 and finally on the remaining prio 0 disks.
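The staggered scheme can be planned on paper before touching the console. The following is a minimal sketch of such a plan. It does not talk to DataCore at all, and the pool names, volume names and the function `plan_priorities` are purely hypothetical. It just assigns priority 3, 2 and 1 to two volumes each per pool and leaves the rest at 0.

```python
from typing import Dict, List, Sequence


def plan_priorities(
    volumes_by_pool: Dict[str, List[str]],
    per_tier: int = 2,
    tiers: Sequence[int] = (3, 2, 1),
) -> Dict[str, int]:
    """Map each volume to a recovery priority.

    Within every pool, the first `per_tier` volumes get tiers[0], the next
    `per_tier` get tiers[1], and so on. Everything beyond the last tier
    stays at the default priority 0.
    """
    plan: Dict[str, int] = {}
    for volumes in volumes_by_pool.values():
        for index, volume in enumerate(volumes):
            tier_index = index // per_tier
            plan[volume] = tiers[tier_index] if tier_index < len(tiers) else 0
    return plan


# Hypothetical pools and volume names, listed in the order they should recover.
example_pools = {
    "pool-a": ["vd01", "vd02", "vd03", "vd04", "vd05", "vd06", "vd07", "vd08"],
    "pool-b": ["vd09", "vd10", "vd11"],
}
example_plan = plan_priorities(example_pools)

if __name__ == "__main__":
    for name, priority in example_plan.items():
        print(f"{name}: recovery priority {priority}")
```

Putting the most important volumes first in each list means they end up in the prio 3 tier. Once the prio 1 volumes are done, drop the recovered volumes from the lists and run the plan again for the next batch.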

Nevertheless, cutting the resync time nearly in half is worth the effort.