We have a number of our physical Linux servers set up to use Linux MD RAID to provide either RAID 1 or 5 fault tolerance on our disks. This is all great so long as it is working as expected! I came into work to find that after a reboot from a kernel update one of our servers could not bring up its swap drive. The swap partition was a RAID 1 array made up from two mirrored disks.
I began to look at mdadm to find out what was wrong. Running:
# cat /proc/mdstat
revealed that one of the drives had failed, putting both arrays into degraded mode, and to make matters worse the only remaining good disk had now developed errors in the partition used for swap! Thankfully the second array, on the partition which contained the system files, was still online, albeit in a degraded state.
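For reference, here is a minimal sketch of how the degraded state can be picked out of /proc/mdstat automatically. It assumes the usual layout, where each array's status line ends with a member map such as [UU] (healthy) or [U_] (one member missing). The function name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# Print the name of every MD array whose member map shows a missing
# device, e.g. "[U_]" rather than "[UU]".
# Reads /proc/mdstat by default; a different file can be passed as $1.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }    # remember the array this block describes
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads the live /proc/mdstat and prints one array name per line, or nothing if every array is complete.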
So the first thing to do was to get a new disk into the array and synchronise the data onto it. After that I needed to remove the other original disk and replace that too. Once all that was done and the data re-synchronised onto both new disks, I wanted to look at how we could increase our monitoring of disks so that we don’t get into this situation again!
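As a rough sketch of what swapping one array member looks like with mdadm, the snippet below only prints the commands rather than running them. The device names are hypothetical and the function name print_member_swap is made up for illustration; they are not the ones from this server.

```shell
#!/bin/sh
# Dry run: print the mdadm steps for swapping one member of an array.
# Nothing is executed here; check the output before running it as root.
print_member_swap() {
    array=$1    # the MD array, e.g. /dev/md0
    old=$2      # the member being retired
    new=$3      # the matching partition on the replacement disk
    echo "mdadm --manage $array --fail $old"
    echo "mdadm --manage $array --remove $old"
    echo "mdadm --manage $array --add $new"
}

# Hypothetical device names, for illustration only.
print_member_swap /dev/md0 /dev/sdb1 /dev/sdc1
```

The resync progress shows up in /proc/mdstat, and the second original disk should only be swapped once the first rebuild has finished.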
Continue reading Ubuntu, RAID and SMART
*** UPDATE ***
All the code for this plugin is now hosted over on GitHub: https://github.com/jonwitts/nagios-speedtest.
*** UPDATE ***
Version 1.2 is now available. You can now specify a Speedtest Mini Server to check against. If you are updating to this version you will need to change your Nagios check commands to include the new “l” argument, which defines whether you are checking against an internal (Mini) or external Speedtest server. Download links both here and on Nagios Exchange are updated to the new version.
*** EDIT ***
Version 1.1 of the script is now released with some improvements suggested by Sigurdur Bjarnason via email. The download link below is updated to point to the new version and the version on Nagios plugins is updated too.
The main change is that you now need to define the location of the speedtest binary in the script before it will run, and you must now also pass the Server ID of the Speedtest server you want to check against in the command. See the usage of the script for more details.
This week we were trying to download some files from work and the download speed was slow to say the least. I then made an SSH connection to my PC at home, downloaded them from there and copied the files back to work with WinSCP; all of this was quicker than downloading directly at work!
This led us to wonder if there was a Linux utility for testing Internet upload and download speed on the CLI. A quick Google search later led us to: https://github.com/sivel/speedtest-cli. The developers of the small Python utility describe it quite simply as: “Command line interface for testing internet bandwidth using speedtest.net”
I installed this on both my home PC and our Nagios server at work and began to play around with what we could do with it. I have long thought that it would be nice to be able to monitor and graph the upload and download speed of our connection so that we can spot trends as to when we are getting Internet slow-downs. To date I had not found a Nagios plugin which would do what I wanted, but this little CLI tool could quite easily be manipulated to my own ends!
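To give an idea of the approach, here is a minimal sketch (not the plugin itself) of turning the tool's output into a Nagios result. It assumes the output of speedtest-cli --simple, which includes a line of the form "Download: 93.45 Mbit/s", and uses the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name check_download and the thresholds are made up for illustration.

```shell
#!/bin/sh
# Read "speedtest-cli --simple" output on stdin and report the download
# speed as a Nagios status line with performance data.
# Usage: speedtest-cli --simple | check_download WARN CRIT   (both in Mbit/s)
check_download() {
    awk -v warn="$1" -v crit="$2" '
        $1 == "Download:" { dl = $2; found = 1 }
        END {
            if (!found) { print "UNKNOWN - no download figure found"; exit 3 }
            if (dl + 0 < crit + 0)      { state = "CRITICAL"; code = 2 }
            else if (dl + 0 < warn + 0) { state = "WARNING";  code = 1 }
            else                        { state = "OK";       code = 0 }
            # Text after the pipe is performance data, which Nagios can graph.
            printf "%s - Download = %s Mbit/s | download=%s\n", state, dl, dl
            exit code
        }'
}
```

The exit status of the function is what Nagios would use to decide the service state, and the download= value after the pipe is what a graphing add-on would plot.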
Continue reading Nagios Speedtest plugin
In an earlier post I wrote about the tweaks I made to Thomas Weaver’s Nagios script for monitoring our HP P2000 SAN. I am pleased to say that Thomas has incorporated these tweaks into his version of the script which is available for download on his site here.
As I have documented our new VMware Cluster and HP SAN further I have begun to realise that we needed to monitor things on the SAN in a slightly more granular way than this script was allowing us. First off, here is my first attempt at a network diagram of the whole VMware cluster:
Continue reading Nagios HP MSA P2000 Status and Performance Monitor – Part 2
We have just finished installing our new HP P2000 SAN ready for the implementation of our VMware set up next week, and I thought we should set up some monitoring for it in Nagios other than basic pings! HP supplied the SNMP MIBs with the SAN, but rather than write new SNMP queries for everything we wanted to monitor, I thought I would search the Nagios Exchange first to see if anyone else had already created such a plugin.
Continue reading Nagios HP MSA P2000 Status and Performance Monitor
I was recently asked to review “Instant Nagios Starter” by the people over at Packt Publishing.
This book is the first book I have read in Packt’s “Instant” series, which is publicised as “Learn in an instant. Short; Fast; Focused”. The book is certainly short; it is only 46 pages long and by the time you get to the start of Chapter 1 you are already 17% of the way through the title! The book is available in e-book format only (epub, mobi and pdf versions are all available). The price too is very reasonable; coming in at under £5 including VAT!
Continue reading Instant Nagios Starter Review
Today I have been playing around with my Nagios install and making a real effort to get all services on the Windows servers monitored effectively. One of these servers I have been looking at is running WSUS and a couple of other web based services on different ports.
I started to read up on the help file of the check_http plug-in, which up until now I had assumed (wrongly!) would just check for a web response on port 80 for the host you ran the check against.
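As a small sketch of checking more than just port 80, the function below runs check_http once per port using its -H (host) and -p (port) options. The plugin path is the usual Debian/Ubuntu location and is an assumption, as are the host name and port numbers in the usage comment; check_ports is a made-up name.

```shell
#!/bin/sh
# Assumed plugin location (Debian/Ubuntu packages); override if yours differs.
CHECK_HTTP=${CHECK_HTTP:-/usr/lib/nagios/plugins/check_http}

# Run check_http against each listed port on one host and return the
# highest exit code seen (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
check_ports() {
    host=$1
    shift
    worst=0
    for port in "$@"; do
        rc=0
        "$CHECK_HTTP" -H "$host" -p "$port" || rc=$?
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Hypothetical usage: check_ports wsus.example.local 80 8530
```

In a real Nagios configuration each port would more likely be its own service check, so that a failure on one port is reported separately from the others.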
Continue reading Nagios check_http plugin