Check that an FTP account is fully working
This script uses lftp, a sophisticated FTP/HTTP client, to check not only that a given FTP account is accessible, but also that it can list files and directories, get and put files, and delete files. The script is fast, easy to configure, flexible and easy to extend.
Sometimes, things like SELinux, a failed network mount point or wrong permissions cause an FTP account to not work properly. With this check, you will be able to detect it immediately.
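As a rough illustration of the idea (not the original script), the sketch below drives lftp from Python and runs the list, put, get and delete steps in a single session. The function names, the probe file name and the Nagios-style return codes are assumptions made for this example; it also assumes that host, user and password contain no spaces or shell-like special characters.

```python
import subprocess


def build_lftp_script(host, user, password, local_file,
                      probe="nagios_ftp_probe.txt"):
    """Return one lftp command string that lists, uploads, downloads and deletes."""
    return "; ".join([
        "set cmd:fail-exit yes",             # stop at the first failing command
        "set xfer:clobber yes",              # allow overwriting a stale download
        f"open -u {user},{password} {host}",
        "ls",                                # list files and directories
        f"put {local_file} -o {probe}",      # upload the probe file
        f"get {probe} -o /tmp/{probe}",      # download it again
        f"rm {probe}",                       # delete it on the server
        "bye",
    ])


def check_ftp(host, user, password, local_file, timeout=30):
    """Run the lftp session and map the result to a (code, message) pair."""
    script = build_lftp_script(host, user, password, local_file)
    try:
        result = subprocess.run(["lftp", "-c", script], capture_output=True,
                                text=True, timeout=timeout)
    except FileNotFoundError:
        return 3, "UNKNOWN: lftp is not installed"
    except subprocess.TimeoutExpired:
        return 2, f"CRITICAL: FTP check timed out after {timeout}s"
    if result.returncode != 0:
        return 2, f"CRITICAL: {result.stderr.strip()}"
    return 0, f"OK: list, put, get and delete work on {host}"
```

Passing the password on the command line makes it visible in the process list, so a real check would probably read the credentials from a protected file instead.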
Check that any network filesystem partition is correctly mounted
A colleague of mine, Thomas Blanchin, has improved my glusterfs mounted nagios check so that it works with any network file system, not just glusterfs. It generates the proper output and can be set up without much trouble.
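One way such a generic check could work is to look the mount point up in the kernel mount table. The sketch below is not the plugin itself: the function names, the set of file system types and the return codes are assumptions chosen for illustration.

```python
# Assumed list of file system types treated as "network" for this example.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "fuse.glusterfs", "fuse.sshfs"}


def find_mount(mounts_text, mount_point):
    """Return the file system type mounted on mount_point, or None."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device, mount point, type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # the last matching entry is the visible one
    return fstype


def check_network_mount(mount_point, mounts_text=None):
    """Return a Nagios-style (code, message) pair for mount_point."""
    if mounts_text is None:
        with open("/proc/mounts") as handle:
            mounts_text = handle.read()
    fstype = find_mount(mounts_text, mount_point)
    if fstype is None:
        return 2, f"CRITICAL: {mount_point} is not mounted"
    if fstype not in NETWORK_FS_TYPES:
        return 1, f"WARNING: {mount_point} is mounted as {fstype}"
    return 0, f"OK: {mount_point} is mounted ({fstype})"
```

Note that /proc/mounts escapes spaces in paths (as \040), so mount points containing spaces would need unescaping before the comparison.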
Puppetcamp Europe 2010
According to this information, Reductive Labs, the team behind Puppet, has announced that Puppetcamp Europe 2010 will take place on 27 and 28 May 2010 in Ghent, Belgium. For those who do not know it, Puppet is an open source tool for automating system configuration and administration.
Releasing cached memory in Linux
Under normal circumstances, modern Linux systems try to cache in memory the disk data that is accessed often. Sometimes there is so much memory in the system that the kernel keeps filling it up by caching every piece of data we access.
Other times, because of the swappiness factor, active data finds its way into the swap instead of the main memory. I have seen this behavior in a few systems hosting databases, especially those running MySQL, and it is a serious performance hazard.
To fix systems like this, we need to adjust the swappiness, drop the caches, and then run swapoff followed by swapon on the system swap.
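The sequence can be sketched as follows. This is a minimal illustration rather than a tested procedure: the swappiness value of 10 and the function names are assumptions, the commands need root, and swapoff only succeeds if there is enough free RAM to hold what is currently swapped out. For that reason the sketch defaults to a dry run that only prints the commands.

```python
import subprocess


def release_memory_steps(swappiness=10):
    """Return the commands, in order, as argument lists."""
    return [
        ["sysctl", "-w", f"vm.swappiness={swappiness}"],  # prefer RAM over swap
        ["sync"],                                  # flush dirty pages first
        ["sysctl", "-w", "vm.drop_caches=3"],      # drop page cache, dentries, inodes
        ["swapoff", "-a"],                         # pull swapped data back into RAM
        ["swapon", "-a"],                          # re-enable the swap areas
    ]


def release_memory(swappiness=10, dry_run=True):
    """Print each step and, unless dry_run is set, execute it as well."""
    steps = release_memory_steps(swappiness)
    for command in steps:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # abort on the first failure
    return steps
```

A value set with sysctl -w lasts only until the next reboot; to keep it, the setting would also have to go into /etc/sysctl.conf.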
Check that a glusterfs partition is mounted
When using glusterfs in a production system, it is mandatory to monitor that the partition is mounted and performing well, especially in heavily loaded environments.
I have created a nagios plugin in bash that monitors a glusterfs mounted partition and detects whether the partition gets unmounted, responds slowly or gets disconnected from the server (causing reading processes to die in an uninterruptible sleep state, which will force you to restart the system in order to get rid of them).
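The original plugin is written in bash; the Python sketch below only illustrates one way to detect a slow or hung mount without the check itself getting stuck. It stats the path in a child process and gives up after a timeout. The function name, thresholds and return codes are assumptions for this example.

```python
import subprocess
import sys
import time

# The child process does the potentially blocking call, not the check itself.
PROBE = "import os, sys; os.statvfs(sys.argv[1])"


def probe_mount(path, warn_after=1.0, timeout=5.0):
    """Return a Nagios-style (code, message) pair describing responsiveness."""
    start = time.monotonic()
    child = subprocess.Popen([sys.executable, "-c", PROBE, path],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    try:
        returncode = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a process in uninterruptible sleep ignores signals,
        # so we deliberately do not wait for it after the kill.
        child.kill()
        return 2, f"CRITICAL: {path} did not respond within {timeout}s"
    elapsed = time.monotonic() - start
    if returncode != 0:
        return 2, f"CRITICAL: cannot stat {path}"
    if elapsed > warn_after:
        return 1, f"WARNING: {path} took {elapsed:.2f}s to respond"
    return 0, f"OK: {path} responded in {elapsed:.2f}s"
```

The elapsed time includes the start-up cost of the child interpreter, so the warning threshold should not be set too tightly.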
Check the percentage of CPU consumed by processes with the same name during a certain interval
Many nagios scripts use ps to compute the percentage of CPU consumed by a process. Although at first glance this might seem a good approach, if you read the documentation carefully, you will notice this:
CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.
This means that ps is useless if you need to know whether a certain process is consuming a lot of CPU during a given interval. For instance, imagine that you want to detect whether a given process has hung and is consuming lots of CPU; using ps, you will be unable to detect it.
To work around this and provide a proper monitoring solution for this type of problem, I have written a script in python that calls top. Unlike ps, top reports the percentage of CPU used during a given interval rather than over the whole lifetime of the process.
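A minimal sketch of the approach, not the script itself: run top in batch mode for two iterations and sum the %CPU column of the last one for every process with the given name. The function names are assumptions for this example, and it assumes the default procps column layout with the header names %CPU and COMMAND.

```python
import os
import subprocess


def cpu_by_name(top_output, name):
    """Sum %CPU for processes called name in the last iteration of top -b output."""
    total = 0.0
    cpu_col = cmd_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            cpu_col = None  # a blank line ends the current process table
            continue
        if "%CPU" in fields and "COMMAND" in fields:
            # Header of a new iteration: locate the columns and start over.
            cpu_col = fields.index("%CPU")
            cmd_col = fields.index("COMMAND")
            total = 0.0
            continue
        if cpu_col is None or len(fields) <= cmd_col:
            continue
        if fields[cmd_col] == name:
            total += float(fields[cpu_col].replace(",", "."))
    return total


def sample_cpu(name, interval=5):
    """Measure the CPU used by all processes called name over interval seconds."""
    env = dict(os.environ, LC_ALL="C")  # force '.' as the decimal separator
    output = subprocess.run(
        ["top", "-b", "-n", "2", "-d", str(interval)],
        capture_output=True, text=True, check=True, env=env,
    ).stdout
    return cpu_by_name(output, name)
```

Only the second iteration is used because it covers exactly the interval between the two samples. The total can exceed 100 on multi-core machines, since top reports each process relative to a single CPU by default.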
New style setup
After I talked to several people, they convinced me to switch to a new style. I have to admit this will make my life easier and hopefully make everything cleaner in my web space. Time will show the results!