Setting VSAN.ClomRepairDelay Cluster-Wide with PowerCLI

The Virtual SAN advanced setting VSAN.ClomRepairDelay specifies how long VSAN waits before rebuilding a disk object after a host either fails (an absent failure) or enters Maintenance Mode. By default, the repair delay is 60 minutes; this means that in the event of a host failure, VSAN waits 60 minutes before rebuilding any disk objects located on that particular host.

Currently, the advanced setting VSAN.ClomRepairDelay has to be set on a per-host basis. This can be painful when there are a lot of hosts in the cluster.

There is a KB article with instructions on how to do this on a per-host basis. I thought it would be useful to show customers how to leverage PowerCLI to change this setting for all hosts in the cluster at once. This gets more valuable the more servers you have in the cluster.

First you’ll need to connect to the vCenter Server using Connect-VIServer and enter your credentials.
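As a minimal sketch of that connection step, assuming PowerCLI is already loaded in the session and using vcenter.example.local as a placeholder for your own vCenter Server address:

```powershell
# 'vcenter.example.local' is a hypothetical address; substitute your own vCenter Server.
# Get-Credential prompts interactively for the user name and password.
Connect-VIServer -Server 'vcenter.example.local' -Credential (Get-Credential)
```

If -Credential is omitted, Connect-VIServer falls back to prompting or to the current Windows session, depending on how your environment is configured.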

Use the following one-liner to check the settings first.

PS C:\> Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomRepairDelay"

Name                 Value Type   Description
----                 ----- ----   -----------
VSAN.ClomRepairDelay 60    VMHost

Then pipe the same query into Set-AdvancedSetting to change the value to whatever you like (30 minutes in this case):

PS C:\> Get-AdvancedSetting -Entity (Get-VMHost) -Name "VSAN.ClomRepairDelay" | Set-AdvancedSetting -Value 30

Perform operation?
Modifying advanced setting 'VSAN.ClomRepairDelay'.
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"): a
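Note that Get-VMHost on its own returns every host the connected vCenter manages, not just the hosts of one cluster. If only a single cluster should be changed, something along these lines may help. This is a sketch: the cluster name VSAN-Cluster is a placeholder, and -Confirm:$false suppresses the confirmation prompt shown above.

```powershell
# 'VSAN-Cluster' is a hypothetical cluster name; replace it with your own.
$vmHosts = Get-Cluster -Name 'VSAN-Cluster' | Get-VMHost

# Change the repair delay to 30 minutes on those hosts without prompting.
Get-AdvancedSetting -Entity $vmHosts -Name 'VSAN.ClomRepairDelay' |
    Set-AdvancedSetting -Value 30 -Confirm:$false

# Read the values back to confirm the change on each host.
Get-AdvancedSetting -Entity $vmHosts -Name 'VSAN.ClomRepairDelay' |
    Select-Object Entity, Name, Value
```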

Keep in mind that for the setting to actually take effect, you will be required to restart the clomd service on every host in the cluster by issuing /etc/init.d/clomd restart.

I wrote a quick script to do this. It requires plink.exe as it needs to SSH to every host to restart the service:

[Screenshot of the script]

The script is provided with no support, guarantees or warranty. You should test it in a non-production environment if you want to use it. Grab the script here: setclomdelay.ps1

NOTE: There are future improvements I could make to the script, but it works.
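The general approach to restarting the service on each host can be sketched as follows. This is not the setclomdelay.ps1 script itself, only an illustration under a few assumptions: plink.exe is on the PATH, SSH is already enabled on each host, every host accepts the same root password, the host names resolve from your workstation, and each host's SSH key is already cached by PuTTY (with -batch, plink aborts rather than prompting to accept an unknown key).

```powershell
# Assumption: all hosts share the same root password.
$cred = Get-Credential -UserName 'root' -Message 'ESXi root password'
$password = $cred.GetNetworkCredential().Password

foreach ($vmHost in Get-VMHost) {
    Write-Host "Restarting clomd on $($vmHost.Name)"
    # -batch disables interactive prompts; -pw passes the password on the
    # command line, which may be fine in a lab but exposes it to other
    # local processes.
    & plink.exe -ssh -batch -l root -pw $password $vmHost.Name '/etc/init.d/clomd restart'
}
```

Running the loop against one host first is a sensible way to check that SSH access and host key caching are in place before touching the whole cluster.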

Now, it’s important to understand the motivation for this. If you truly have a use case for lowering the default delay to 45 minutes, 30 minutes or some other value, then fine, although I personally don’t recommend it. Where I have seen this used is in POC or lab environments where you want to validate the behaviour. In a production environment, no.

If the goal is to rebuild data faster without waiting for the timeout, then we have you covered.

The VSAN Health Services Plugin has a function that allows you to do exactly this. It’s called Immediate Object Repair. I blogged about it here: http://vsanteam.info/immediate-object-repair-in-vsan/ and there is also a video on YouTube which shows the process in action: https://www.youtube.com/watch?v=uV2MIsqZzzk&list=PLjwkgfjHppDtONKrts8wrmZpdf35VCD7y&index=4

So if your goal is to rebuild data faster when you realise you don’t want to wait for 60 minutes, don’t change ClomRepairDelay; just use the Health Services Plugin.