In an earlier post I wrote about the tweaks I made to Thomas Weaver’s Nagios script for monitoring our HP P2000 SAN. I am pleased to say that Thomas has incorporated these tweaks into his version of the script which is available for download on his site here.
As I have documented our new VMware cluster and HP SAN further, I have begun to realise that we need to monitor the SAN at a slightly more granular level than this script allowed. First off, here is my first attempt at a network diagram of the whole VMware cluster:
As you may or may not be able to see from this diagram, the virtual machines we are running have both SAN controllers defined as parents in the topology. However, as we do not have Datastore HA licensed in our VMware cluster, each VM is dependent on the status of the SAN volume that holds its files.
What I wanted was to monitor the status of an individual volume or vdisk on the SAN. This lets me set up service dependencies in Nagios between the VM host and the volume service check, while keeping the VM hosts’ parents defined as the SAN controller hosts. I felt this would be the most accurate representation of the actual environment in our Nagios system.
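To make that layout concrete, here is a minimal Python sketch that renders the Nagios object definitions for one VM: a host whose `parents` are the two SAN controller hosts, plus one `servicedependency` per VM service pointing at the volume check. Every host name, service description and the `generic-host` template are hypothetical placeholders, not taken from our config. Nagios dependencies link services to services (or hosts to hosts), so the sketch makes each of the VM’s services depend on the volume service.

```python
def nagios_object(kind, directives):
    """Render a Nagios object definition from ordered (directive, value) pairs."""
    width = max(len(key) for key, _ in directives) + 2
    body = "\n".join(f"    {key.ljust(width)}{value}" for key, value in directives)
    return f"define {kind} {{\n{body}\n}}"


def vm_objects(vm_name, address, san_host, controllers, volume_service, vm_services):
    """Build a VM host (parented to the SAN controllers) and its service dependencies.

    All names passed in are hypothetical; san_host is whichever Nagios host
    carries the per-volume service check.
    """
    host = nagios_object("host", [
        ("use", "generic-host"),             # assumed host template
        ("host_name", vm_name),
        ("address", address),
        ("parents", ",".join(controllers)),  # both SAN controllers as parents
    ])
    deps = [
        nagios_object("servicedependency", [
            ("host_name", san_host),
            ("service_description", volume_service),
            ("dependent_host_name", vm_name),
            ("dependent_service_description", svc),
            # Skip checks/notifications for the VM service while the volume is CRITICAL or UNKNOWN.
            ("execution_failure_criteria", "c,u"),
            ("notification_failure_criteria", "c,u"),
        ])
        for svc in vm_services
    ]
    return "\n\n".join([host, *deps])


if __name__ == "__main__":
    print(vm_objects("vm-web01", "192.0.2.10", "p2000", ["p2000-ctrl-a", "p2000-ctrl-b"],
                     "Volume VOL01", ["HTTP", "Disk Space"]))
```

Generating the definitions this way keeps every VM’s dependencies consistent with the volume that actually holds its files.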
So, to do this, I needed to edit Thomas’ script further. You can see my edited version over here on Pastebin, and hopefully Thomas will include these changes in his version too. The edit introduces two new commands to the script, “named-volume” and “named-vdisk”. If you use either of these commands, you must also pass the -n option, containing the name of the volume or vdisk as defined on your P2000. The script then accesses the web API and retrieves the XML status for that volume or vdisk. If its health is anything other than “OK”, the script returns the reason and recommendation in its output to Nagios.
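The health test described above can be sketched in Python as follows. This is not the PHP script itself: the sample XML, the `PROPERTY` names (`volume-name`, `health`, `health-reason`, `health-recommendation`) and the choice of CRITICAL for any non-OK health are assumptions for illustration. The name property is a parameter because a vdisk may use a different one.

```python
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical API response; element and property names are assumptions.
SAMPLE_XML = """<RESPONSE>
  <OBJECT basetype="volumes" name="volume">
    <PROPERTY name="volume-name">VOL01</PROPERTY>
    <PROPERTY name="health">Degraded</PROPERTY>
    <PROPERTY name="health-reason">The vdisk is degraded.</PROPERTY>
    <PROPERTY name="health-recommendation">Replace the failed disk.</PROPERTY>
  </OBJECT>
</RESPONSE>"""


def check_named_object(xml_text, name_prop, name):
    """Return (exit_code, message) for the object whose name_prop equals name."""
    root = ET.fromstring(xml_text)
    for obj in root.iter("OBJECT"):
        # Collect this object's properties into a name -> text mapping.
        props = {p.get("name"): (p.text or "") for p in obj.iter("PROPERTY")}
        if props.get(name_prop) != name:
            continue
        health = props.get("health", "")
        if health == "OK":
            return OK, f"OK - {name} health is OK"
        # Any other health value: report the reason and recommendation (assumed CRITICAL).
        reason = props.get("health-reason", "")
        recommendation = props.get("health-recommendation", "")
        return CRITICAL, f"CRITICAL - {name} health is {health}: {reason} Recommendation: {recommendation}"
    return UNKNOWN, f"UNKNOWN - {name} not found in API response"


if __name__ == "__main__":
    code, message = check_named_object(SAMPLE_XML, "volume-name", "VOL01")
    print(code, message)
```

Returning UNKNOWN when the name is missing means a typo in the -n value shows up in Nagios rather than silently passing.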
We are now monitoring the following things on our P2000 SAN through Nagios:
- HTTP Service is available – using check_http
- Each individual volume status – using the adjusted version of check_p2000_api.php
- Each individual vdisk status – again using the adjusted version of check_p2000_api.php
- Overall system status – using the adjusted check_p2000_api.php script, though this part of the script is unchanged
- Ping check
For status monitoring of our SAN I feel this is more than adequate. In time I will probably look at performance monitoring too, at which point I can explore that side of the fantastic little script Thomas has provided.
Next to be fully monitored are the iSCSI switches, followed by the ESXi hosts, but more on those in another entry!