Handling file locks in vSphere

Recently I blogged about a quite convenient way to release read-only locks on vmdk files that prevent snapshots from being committed. Unfortunately this is a quite common problem with 3rd-party backup tools that leverage the vStorage API. Using Storage vMotion can help release these locks, but sometimes even SVMotion won't work.

One more side note: one might ask how there can be a mismatch between the VM still running in snapshot mode and the vCenter server not knowing anything about it, or better said, the snapshot manager not showing any existing snapshots for that VM. That's because the backup programs send a "remove snapshot" command to the vCenter server. The VC passes the snapshot command to the ESX host, which is responsible for the actual snapshot handling, and at the same time deletes the information about the existing snapshot from the VC database without waiting for the ESX host to confirm the snapshot removal. This is by design, don't ask me why. So the new "Consolidation" feature in vSphere 5.x is nothing more than a simple database query: the list of snapshots known to the VC is compared against the settings of each VM, and when the VC finds a mismatch it activates the "Consolidation needed" warning. Curious....
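
If you want to spot affected VMs directly on a host, a minimal sketch on the ESXi shell might look like the following. It assumes vSphere 5.x or later and that your build prints the consolidationNeeded flag in the VM summary output (the field name comes from the vSphere API, so treat the grep pattern as an assumption, not a guarantee):

    # List every registered VM ID and its consolidation state.
    # Assumption: get.summary output contains the consolidationNeeded flag.
    for vmid in $(vim-cmd vmsvc/getallvms | awk 'NR>1 {print $1}'); do
        printf "VM %s: " "$vmid"
        vim-cmd vmsvc/get.summary "$vmid" | grep -i consolidationNeeded
    done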

Because I have troubleshot several of these problems in the past, I decided to give a short step-by-step guide on how to identify which system holds the lock and how to release it. I can't cover all situations where this problem occurs, but in ~95% of all cases one of the two solutions provided here did the trick.

  1. The first and probably most important step is to reboot the physical backup server you use to back up your VMs. This is especially true if the server uses SAN transport mode to access the vmdks: these servers tend to hold locks when backup jobs fail or crash, and a reboot is sometimes enough to reset the lock. If you use a virtual backup proxy, as tools like Veeam or vRanger support, rebooting that VM won't help.
  2. Check if the consolidation process now works. If it still fails proceed to the next step.
  3. Identify which file is really locked and prevents the snapshot removal process. To accomplish this, shut down the VM: if the VM is running you will see the hosting ESX server as one of the file lock holders, which can be confusing.
  4. Log on via SSH to the ESX server that has the VM registered, change into the VM's directory on the datastore, and run the following command:

    for a in $(ls -1);do echo $a; vmkfstools -D $a; done


    Search for the entries of the base vmdk of the VM (the one that is called *-flat.vmdk). In almost every case this is the locked file (a scripted version of this scan, combined with the later lsof steps, is sketched right after this list).
    You will see an output like:

    [...]
    virtualdisk-flat.vmdk
    Lock [type 10c00001 offset 37965824 v 5443, hb offset 3821568
    gen 28763, mode 2, owner 00000000-00000000-0000-000000000000 mtime 5628465
    num 1 gblnum 0 gblgen 0 gblbrk 0]
    RO Owner[0] HB Offset 3821568 546b498e-875a9360-10e8-b499baa6cd9e
    Addr <4, 31, 2>, gen 4, links 1, type reg, flags 0, uid 0, gid 0, mode 600
    len 42949672960, nb 5120 tbz 3277, cow 0, newSinceEpoch 0, zla 3, bs 8388608
    [...]

    The owner UUID on the "RO Owner" line (here 546b498e-875a9360-10e8-b499baa6cd9e) ends in one of the MAC addresses of the host that holds the lock, in this example b4:99:ba:a6:cd:9e.
  5. Search for the MAC address in the vCenter server by crawling through all the physical network adapters in your cluster. Once you have identified the server, proceed to the next step.
  6. Log on via SSH to the server identified in step 5.
  7. Check if any ESX system process uses the disk mentioned before:

    ps | grep -i virtualdisk

    You normally won't see any results here, which means no system process holds the lock.
  8. The next step is to check whether any service uses the disk by typing:

    lsof | grep -i virtualdisk


    This will probably print some results in the form:

    11696694    vmx                   FILE                       84   /vmfs/volumes/52e6b53d-8ac6ea98-9caf-001517a5bf78/servername/virtualdisk-flat.vmdk

    The first column is the so-called cartel ID; you will need it in the next step. The second column tells you that the disk is in use by a vmx process, which means one of your VMs has the disk attached and thus holds the lock. The other columns are unimportant here.
  9. Now check which VM holds the lock. This can be done with the command:

    esxcli vm process list | grep -B3 11696694


    The output will be something like this:

    servername
       World ID: 11696695
       Process ID: 0
       VMX Cartel ID: 11696694
  10. Open the vSphere client, connect to the ESX host where you executed the last commands, and search for the VM that the last command showed in the first line (in this case servername). Go to the VM settings and check the attached hard disks; you will see the disks that are locked. Remove them by deleting the hard disks from the VM, but WITHOUT deleting the disk files from the datastore.
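
For reference, here is a minimal sketch (plain ESXi shell) that strings steps 4 and 7-9 together. The directory, datastore, and disk names below are assumed placeholders, not values from a real setup; adjust them to your environment and treat this as a starting point, not a finished tool:

    # Placeholders - adjust to your VM (assumed names for this example)
    VMDIR=/vmfs/volumes/datastore1/servername
    DISK=virtualdisk-flat.vmdk

    # Step 4: dump the lock info for every file in the VM folder. The owner UUID
    # on the "RO Owner" line ends in the MAC address of the host holding the lock.
    cd "$VMDIR"
    for f in *; do
        echo "== $f"
        vmkfstools -D "$f" | grep -E 'mode|Owner'
    done

    # Steps 7-9 (run these on the host you identified via the MAC address):
    # map the open flat vmdk to its VMX cartel ID, then to the VM name.
    ps | grep -i "$DISK"
    CARTEL=$(lsof | grep -i "$DISK" | awk '{print $1}' | head -n1)
    [ -n "$CARTEL" ] && esxcli vm process list | grep -B3 "VMX Cartel ID: $CARTEL"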

This problem is mainly caused by virtual backup proxies that use hot-add mode for data transport. If they crash or run into other problems so that the disk detachment can't be completed, the base VM's disks remain attached to the proxy and are thus locked for other processes like snapshot removal tasks. That's also why a simple reboot of a virtual backup proxy won't help in this situation: the locked disks remain attached to the proxy VM after the reboot and stay locked for other processes.
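
A quick way to check for such leftovers is to look at the proxy's .vmx file on the datastore and see whether any disk entry points into another VM's folder. A minimal sketch, with an assumed example path that you need to adjust to your proxy VM:

    # Assumed path to the backup proxy's .vmx file - adjust to your environment.
    PROXY_VMX=/vmfs/volumes/datastore1/backupproxy/backupproxy.vmx
    # Hot-added disks show up as scsiX:Y.fileName entries pointing to a foreign
    # VM folder instead of the proxy's own directory.
    grep -i 'scsi.*filename' "$PROXY_VMX"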

To give a final piece of advice: if you encounter locked files, first reboot the physical backup server. Next, check for the lock holder as described in this article. Then try a Storage vMotion. The last step is to call VMware support, as you probably have a rare problem where the standard procedures won't help. Good luck.

 
