Things to consider when using SSY-V AutoTiering feature

Published: Sunday, 20 October 2013 13:54

With AutoTiering becoming more and more mainstream on storage systems, it is worth understanding how this feature works and what influence it can have on your environment - positive as well as negative.

What exactly does AutoTiering do? Each storage vendor has its own definition, but the bottom line is much the same: blocks are moved between the available classes of storage within the storage system, based on usage statistics, to optimize performance.
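The basic idea can be illustrated with a minimal sketch - to be clear, this is a hypothetical simplification of mine, not any vendor's actual algorithm: count how often each block is accessed, then place the hottest blocks on the fastest tier.

```python
# Hypothetical sketch of usage-based tier placement (NOT any vendor's
# real algorithm): rank blocks by access count, put the hottest ones
# on the fast tier and everything else on the slow tier.
from collections import Counter

def plan_placement(access_log, fast_tier_capacity):
    """Return (fast_tier_blocks, slow_tier_blocks) from a list of accessed block IDs."""
    heat = Counter(access_log)                       # accesses per block
    ranked = [block for block, _ in heat.most_common()]
    return set(ranked[:fast_tier_capacity]), set(ranked[fast_tier_capacity:])

# Blocks 1 and 2 are accessed often, block 3 rarely:
fast, slow = plan_placement([1, 1, 2, 1, 2, 3], fast_tier_capacity=2)
# fast == {1, 2}, slow == {3}
```

Real implementations work on much coarser statistics and migrate data gradually in the background, but the decision input is the same: observed access frequency.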

In a monolithic storage system where two controllers address the same storage, there is normally nothing to consider because this is a relatively "static" environment: the way application servers access the storage system is always more or less the same.

In a virtualized storage system based on DataCore's SANsymphony this is a bit different. Simply put, you always have two storage systems (DCS servers), each with its own private storage, its own intelligence and its own cache. In terms of AutoTiering, each DCS keeps its own table of which data is "hot" and has to be moved to faster storage, and which data is "cold" and will be moved to slower storage. That means that although the data is exactly the same (if you mirror your data between the two DCS), its layout can be completely different. Why can this happen? It becomes quite obvious if you dig a bit deeper into SSY-V.

Each application server that accesses data in a mirrored SSY-V environment has a so-called "preferred server" set in its properties within the SSY-V software. This preferred server will always be used when accessing the storage. This makes sense especially in a geographically dispersed environment with two datacenters, each hosting one of your two DCS and some application servers. With the preferred server setting you can force the application servers in DC1 to primarily access DCS1 in the same datacenter. If DCS1 fails, all servers in DC1 will switch to DCS2, but as this is not the optimal path, the switch is only temporary: as soon as DCS1 is back online, all servers will switch back.
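The path selection logic described above boils down to something like the following sketch (my own hypothetical illustration, not DataCore code - the names DCS1/DCS2 are just the example nodes from this article):

```python
# Hypothetical illustration of preferred-server path selection: an
# application server always uses its preferred DCS while that node is
# reachable, and falls back to the mirror partner only while it is not.
def select_path(preferred, partner, online):
    """Pick the DCS to use; `online` is the set of currently reachable DCS names."""
    if preferred in online:
        return preferred        # normal operation: stay local
    if partner in online:
        return partner          # temporary failover over the ISL
    raise RuntimeError("no DCS reachable")

select_path("DCS1", "DCS2", {"DCS1", "DCS2"})   # normal operation -> "DCS1"
select_path("DCS1", "DCS2", {"DCS2"})           # DCS1 down -> "DCS2"
```

The key point is that the fallback is stateless: the moment the preferred node is back in the `online` set, traffic returns to it.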

This is true for reads and writes; the only exception is that in a mirrored environment writes also have to be sent to DCS2. Keep that in mind.

Okay, let's assume you have a server in DC1 that mainly reads data (some kind of business warehouse server, for example) and issues only a few writes. DCS1 will receive all reads and writes and will build its AutoTiering heat map. This heat map will result in blocks being migrated between the available storage classes to get the best performance.
DCS2, which holds the same data for this server, only receives the writes, so from DCS2's point of view this particular server does just a few writes. Based on its internal logic, DCS2 will decide to move this server's data down the tier classes because the data appears cold.
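This asymmetry is easy to demonstrate. The following sketch (again a hypothetical simplification, with made-up block names and access counts) shows how the two mirror nodes end up with completely different heat maps for identical data, because only the writes reach the mirror partner:

```python
# Hypothetical illustration of diverging heat maps on mirror nodes:
# DCS1 (the preferred server) sees reads AND writes, while DCS2 (the
# mirror target) only ever sees the writes.
from collections import Counter

reads  = ["blk-A"] * 90                     # read-heavy workload on one block
writes = ["blk-A"] * 5 + ["blk-B"] * 5      # mirrored to both nodes

dcs1_heat = Counter(reads + writes)         # full picture: blk-A looks hot
dcs2_heat = Counter(writes)                 # writes only: blk-A looks cold

# DCS1 counts 95 accesses for blk-A; DCS2 counts just 5 for the very
# same block - so DCS2 may tier the identical data down as "cold".
```

So even though both nodes store bit-identical data, their tiering decisions are driven by entirely different statistics.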

As long as DCS1 is available, the application server gets optimized performance. But what if DCS1 fails? The server has to access the mirrored data on DCS2, where its data has been destaged because of nearly no access. Can you imagine what that means?

Besides the fact that the server now has to access its data over an ISL, there will also be no write cache available: with DCS1 down, the write cache is disabled. So we get a performance impact from ISL traffic, no write cache, and suboptimal data placement caused by AutoTiering.

You could argue that this will only happen in some very special situations like site failures or maintenance windows, where performance isn't your major concern. That is true - but now think about having any kind of server virtualization in place...

Server virtualization (and this time it doesn't matter which vendor you choose) gives you some cool features, like moving VMs around freely without any impact on service availability.

Let's take our example from above and convert the BW server into a virtualization host running some VMs. In DC2 there is another virtualization host running some other VMs too. Both hosts put workloads on their local DCS, and both DCS build heat maps and move data around to optimize. As there is no synchronization of the heat maps, the two DCS act totally differently in terms of AutoTiering.

If you now move a VM (manually or automatically) from DC1 to DC2, the storage on the new site isn't optimized for that VM, so it's quite possible that you will suffer some kind of performance degradation.

To anticipate the result: at the moment it isn't 100% clear whether you will really see a negative influence on performance in such a situation. First, DataCore's cache will catch up with the demands relatively quickly and buffer the load. Second, AutoTiering will catch up as well, so after a short period of time hot blocks will be moved to fast storage and the problem is solved.
Third, the VM will only be slower if there is really a high workload on that machine.

I talked to Alex Best from DataCore about this problem, and he said engineering is aware of it and is currently thinking about how to solve it. But as this isn't a very common problem and they have never received calls regarding performance in such scenarios, it currently has only a low priority.

So to be on the safe side you have several options:

As far as I can say at the moment, this is really a theoretical problem, as I have never observed it in production use. If your experience differs, please add a note to this article. I will also add a poll regarding this problem in the near future.