Capturing VSAN data with vRealize Log Insight

Few doubt the power of decent logs. Many a time have I stared at a screen at the business end of a tail -f, eyes bubbling as I try to follow lines of text scrolling past me. Using grep and other search tricks becomes part of your daily repertoire.

The trouble often is looking for the proverbial needle in a haystack. With the right amount of information, focus, coffee and a tonne of SSH sessions you can often find what you’re looking for.

Enter vRealize Log Insight. Log Insight, for the uninitiated, is a log aggregation and analysis tool by VMware. It combines intelligent analytics and machine learning-based grouping with an easy, unified management interface and interactive querying of all infrastructure logs. It comes as a single virtual appliance and is incredibly easy to install and configure. This short YouTube video shows the process.

Log Insight provides the ability to point all your host logs to one place in a very simple one-step process. It’s the hidden gem in the VMware arsenal, in my opinion.


Specifically for Virtual SAN there are a number of different logs you may want to look through. You can understand how frustrating it could become trawling through each log file.

Thankfully Log Insight has the ability to aggregate all the data in these logs and allow us to perform interactive queries on them all at once. Once the logs are being ingested it’s time to look at the information we can utilise.

Log Insight already comes with a built-in VSAN dashboard, shown below, as well as many other pre-canned dashboards.


Interactive Analysis – BYO Query

The Interactive Analysis view gives a customer the ability to query logs for particular keywords. Some examples I have tried are below:

clomd enforcement errors – Likely VM provision failure due to policy enforcement

clom_enableprorebalance – Disk utilisation has increased beyond 80%, or it was enabled manually in 6.0

APD state: APD – All paths down state, causing object availability issues – VSAN VMkernel network problems

latency > 10ms – Latency increasing beyond 10ms

io timeout – IO not being completed in a timely fashion

#Limited only to your imagination.
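To make the idea of keyword queries concrete outside the Log Insight UI, here is a minimal Python sketch that checks lines of text for a few of the phrases listed above. It is only an illustration of the concept: the function name `match_keywords` is hypothetical, the sample lines are made up and are not real VSAN log output, and a plain case-insensitive substring match is an assumption, not how Log Insight itself evaluates a query.

```python
# Phrases taken from the list above, with the interpretation given there.
KEYWORDS = {
    "clomd enforcement errors": "Likely VM provision failure due to policy enforcement",
    "clom_enableprorebalance": "Disk utilisation beyond 80% or enabled manually",
    "apd state: apd": "All paths down, object availability issues",
    "io timeout": "IO not being completed in a timely fashion",
}


def match_keywords(lines):
    """Return (phrase, meaning, line) for every phrase found in every line.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch only.
    """
    hits = []
    for line in lines:
        lowered = line.lower()
        for phrase, meaning in KEYWORDS.items():
            if phrase in lowered:
                hits.append((phrase, meaning, line))
    return hits


# Made-up sample lines for illustration; not real VSAN log output.
SAMPLE_LINES = [
    "sample-host-1 example: IO timeout while writing (made-up line)",
    "sample-host-2 example: APD state: APD entered (made-up line)",
    "sample-host-3 example: nothing interesting here (made-up line)",
]

for phrase, meaning, line in match_keywords(SAMPLE_LINES):
    print(f"{phrase!r} ({meaning}): {line}")
```

The point of Log Insight is that it does this across every host at once, so the sketch is useful mainly for testing which phrases are worth saving as queries.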

The power of Interactive Analysis is really extended when you see how to manipulate the raw data. Queries can be filtered by time scale, regex, event type and fields. They can be exported as raw events in txt as well as JSON. You can create alerts on certain events and even build dashboards from your interactive queries. You can also track the variance, sum, average, max, min and other details of the query rather than simply the count, which is quite powerful for visualising data.
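As a rough illustration of what tracking something other than the count means, the Python sketch below computes the same kinds of aggregates (count, sum, average, max, min, variance) over a list of numeric field values and prints them as JSON. The function name `summarise` and the input values are hypothetical, and using the sample variance is an assumption; I have not checked which variance definition Log Insight uses.

```python
import json
import statistics


def summarise(values):
    """Compute simple aggregates over numeric field values."""
    values = list(values)
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "sum": sum(values),
        "average": statistics.mean(values),
        "max": max(values),
        "min": min(values),
        # Sample variance needs at least two values (assumption: sample,
        # not population, variance).
        "variance": statistics.variance(values) if len(values) > 1 else 0.0,
    }


# Hypothetical latency values in milliseconds, for illustration only.
HYPOTHETICAL_LATENCIES_MS = [10.0, 20.0, 30.0]

summary = summarise(HYPOTHETICAL_LATENCIES_MS)
print(json.dumps(summary, indent=2))
```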


Here’s a simple example of building a query. First we search for all latency events, then from the results we create a named field for the latency value (1235). Now I can use that field as a filter for future queries. So now I want to build a query that shows only events where latency was between 15ms and 29ms.
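The same two steps, extracting a latency field and then filtering on it, can be sketched in Python. This is a sketch under stated assumptions rather than what Log Insight does internally: the regular expression assumes the value appears as a number followed by "ms" somewhere after the word "latency", the bounds are treated as inclusive, the function names are hypothetical, and the sample lines are made up.

```python
import re

# Assumed shape of the text: "latency", any non-digits, a number, then "ms".
LATENCY_PATTERN = re.compile(r"latency[^\d]*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def extract_latency_ms(line):
    """Return the latency value in ms as a float, or None if not present."""
    match = LATENCY_PATTERN.search(line)
    return float(match.group(1)) if match else None


def filter_by_latency(lines, low_ms=15.0, high_ms=29.0):
    """Keep lines whose extracted latency lies in [low_ms, high_ms]."""
    selected = []
    for line in lines:
        value = extract_latency_ms(line)
        if value is not None and low_ms <= value <= high_ms:
            selected.append((value, line))
    return selected


# Made-up sample lines for illustration; not real VSAN log output.
SAMPLE_LINES = [
    "sample event: latency 12ms (made-up line)",
    "sample event: latency 18ms (made-up line)",
    "sample event: latency 29ms (made-up line)",
    "sample event: latency 31ms (made-up line)",
    "sample event: no field here (made-up line)",
]

for value, line in filter_by_latency(SAMPLE_LINES):
    print(value, line)
```

In Log Insight the extracted field is saved once and reused, which is what makes the later range filter a one-click operation.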



Now notice the little dashboard icon in the top right-hand corner of the previous screenshot (next to the yellow star). I can take this query and build a dashboard from it. In my experiments across my VSAN cluster I have used the queries above and saved them to a newly created dashboard. Note that these can be shared or private dashboards depending on who you want to have access. Here is what it might look like:




Log Insight is a powerful tool for aggregating logs in your datacenter, and it is simple to set up and configure in a few minutes. vSphere and VSAN log files provide a wealth of informational, warning and error messages that can traditionally be very hard to find and represent in any meaningful way. By using Log Insight with your vSphere infrastructure, and in particular VSAN clusters, you’ll be able to harness all this information for good and cut troubleshooting time from hours or days to minutes or even seconds.

There is plenty of information on the Log Insight product page, and my colleague Steve Flanders (@smflanders) has some great tips on creating queries which may help you.