Immediate Object Repair in VSAN

As you may know, Virtual SAN is an object-based storage system, meaning that the components that make up a virtual machine (VM namespace, snapshots, VMDKs, swap files) are presented to the storage system as objects rather than as files, as they would be in typical file-system-based storage.

This has several advantages, but one of the primary ones is that an admin can configure different storage policies (for performance, availability, or capacity, for instance) on a per-object basis. To illustrate why this is important, imagine, as a fairly primitive example, a SQL Server VM that typically has an operating system drive, one or more database drives, and one or more log drives. In traditional storage architectures, servicing these different requirements for a single VM would require multiple LUNs and datastores.

VMware Software-Defined Storage is setting a new paradigm for how we solve this issue, both on traditional storage arrays with Virtual Volumes and in hyper-converged architectures with Virtual SAN.

Virtual SAN is a fully distributed system that distributes a virtual machine’s objects across the cluster according to policy. Let’s focus on the following diagram, where we have the default policy of FTT=1 set. We can see two full replicas of the VMDK object and a witness component.

Now suppose we were to sustain an absent failure, which could include things such as:

  1. Host failure or outage
  2. Host maintenance
  3. Network outage
  4. NIC failure
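
The FTT=1 layout and the failure cases above can be illustrated with a small Python sketch. It is a simplified model, not how Virtual SAN is implemented: the host names and function names are hypothetical, every component is assumed to carry one vote, and the generalisation beyond FTT=1 (FTT+1 replicas plus FTT witnesses on 2×FTT+1 hosts) is an assumption made for illustration. An object is treated as accessible when at least one full replica survives and more than half of its components are still reachable.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Component:
    kind: str  # "replica" or "witness"
    host: str  # hypothetical host name


def mirrored_layout(ftt: int, hosts: List[str]) -> List[Component]:
    """Place ftt+1 replicas and ftt witnesses, one component per host.

    Simplified assumption: real witness counts can differ from this model.
    """
    needed = 2 * ftt + 1
    if len(hosts) < needed:
        raise ValueError(f"FTT={ftt} needs at least {needed} hosts")
    replicas = [Component("replica", h) for h in hosts[: ftt + 1]]
    witnesses = [Component("witness", h) for h in hosts[ftt + 1 : needed]]
    return replicas + witnesses


def is_accessible(layout: List[Component], absent_hosts: Set[str]) -> bool:
    """True if a full replica survives and more than 50% of components remain."""
    alive = [c for c in layout if c.host not in absent_hosts]
    has_replica = any(c.kind == "replica" for c in alive)
    has_quorum = 2 * len(alive) > len(layout)
    return has_replica and has_quorum


if __name__ == "__main__":
    layout = mirrored_layout(1, ["esx01", "esx02", "esx03"])
    for component in layout:
        print(component.kind, "on", component.host)
    print("one host absent:", is_accessible(layout, {"esx01"}))
    print("two hosts absent:", is_accessible(layout, {"esx01", "esx03"}))
```

With FTT=1, losing any single host leaves the object accessible but no longer fully redundant, which is exactly the state the repair delay discussed next applies to.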

In these examples, as you may be aware, there is a delay of 60 minutes before we commence rebuilding objects that were affected by the outage or failure. This delay is known as clomrepairdelay and is configurable; however, you would need to configure the same setting on every host, and each host requires a reboot. The recommendation is to maintain the default setting. The following diagram shows the repair delay in the case of a host failure as an example.

In certain circumstances, however, there may be times when you know that your maintenance or failure is most definitely going to last beyond the default 60 minutes.

Enter the VSAN Health Check Plugin. By installing the Health Check Plugin on your vCenter server and enabling the agent on your ESXi servers, you now have the ability to rebuild these affected objects at any time you choose.

Let’s examine the following screenshot. Here I have a host in Maintenance Mode, and I can see that during this time my object health is affected. You can see that I have 128 objects waiting for the repair delay to time out before they are rebuilt and full object redundancy is restored.

You will also notice the button on the right-hand side, about halfway down the screen, which says Repair Objects Immediately. Selecting that button initiates the rebuild of these objects right away. You can also monitor the rebuild of this data in the resync dashboard.

Once the rebuild has completed, you should see that object health has returned to its fully redundant state, and any virtual machines that had reduced redundancy during that time are now fully compliant with their policy, or with the policy configured for their objects.

There is a lot more to the Virtual SAN Health Check Plugin. It is a free vCenter plugin and should be the first thing you install when deploying a Virtual SAN 6.0 cluster. It is downloadable under the Drivers & Tools tab on the Virtual SAN download page.

If you are not a Virtual SAN 6 customer, you can still try out the plug-in by downloading your free 60-day evaluation copy –
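
Returning to the repair delay discussed above: the timing can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions. The function names and the example timestamps are hypothetical, and the only values taken from the article are the default 60-minute delay and the fact that Repair Objects Immediately starts the rebuild before the timer expires.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default repair delay described in the article (clomrepairdelay).
DEFAULT_REPAIR_DELAY = timedelta(minutes=60)


def repair_start(
    absent_since: datetime,
    repair_delay: timedelta = DEFAULT_REPAIR_DELAY,
    repair_now_at: Optional[datetime] = None,
) -> datetime:
    """Return when the rebuild begins in this simplified model.

    repair_now_at models an admin pressing "Repair Objects Immediately";
    it only matters if it falls before the delay timer would expire.
    """
    timer_expiry = absent_since + repair_delay
    if repair_now_at is not None and absent_since <= repair_now_at < timer_expiry:
        return repair_now_at
    return timer_expiry


def should_repair_immediately(
    expected_outage: timedelta,
    repair_delay: timedelta = DEFAULT_REPAIR_DELAY,
) -> bool:
    """Hypothetical rule of thumb: rebuild now if the outage outlasts the delay."""
    return expected_outage > repair_delay


if __name__ == "__main__":
    went_absent = datetime(2015, 1, 1, 9, 0)  # hypothetical timestamp
    print("default rebuild start:", repair_start(went_absent))
    pressed = went_absent + timedelta(minutes=5)
    print("with immediate repair:", repair_start(went_absent, repair_now_at=pressed))
    print("4h maintenance, repair now?", should_repair_immediately(timedelta(hours=4)))
```

The point of the model is the trade-off: waiting avoids a needless rebuild when a host returns quickly, while an immediate repair shortens the window of reduced redundancy when you already know the outage will exceed the delay.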