Date created: Friday, January 20, 2012 11:54:00 PM. Last modified: Tuesday, September 19, 2017 12:03:08 PM

RRDTool Traffic Drop Alert

This script uses rrdtool to pull two consecutive values from an RRD file. If the more recent value is lower than the older value by more than a given threshold, an alert email is sent.
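The comparison described above can be sketched as a standalone Bash function. This is a minimal sketch, not the script's own code. The name `is_drop` is hypothetical, and it does the floating point multiply in awk instead of bc and printf, so the samples are not rounded before the check.

```shell
#!/bin/bash
# is_drop NEWER OLDER THRESHOLD  (hypothetical helper)
# Succeeds (exit status 0) when NEWER is below THRESHOLD * OLDER.
# THRESHOLD=0.2 therefore flags a drop of more than 80 percent.
is_drop() {
  # "+ 0" forces a numeric comparison even for values like 1.25e+05.
  awk -v newer="$1" -v older="$2" -v t="$3" \
    'BEGIN { exit !(newer + 0 < older * t) }'
}

# Example: 150 is below 0.2 * 1000 = 200, so this reports a drop.
if is_drop 150 1000 0.2; then
  echo "drop detected"
fi
```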

Various RRDtool-based monitoring platforms have "threshold" style alerting, such as the "Thold" plugin for Cacti. However, if a situation requires alerting on sudden drops in link utilisation, or sudden rises, Thold might not do exactly what is needed. An example would be one link whose usage has dropped by 90% since it was last polled 5 minutes ago, while another link's utilisation has increased, again by 90%, over the same 5 minutes. This may indicate a transfer problem across the first link that has caused a routing protocol re-convergence, but no layer 1 link-down alert, because the link is still up.
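For the two-link scenario above, a drop check on one link can be paired with a rise check on the other. This is a hypothetical extension: the script in this article only checks a single link, `shifted_traffic` is an illustrative name, and it assumes the rise is measured as the inverse of the same ratio. A threshold of 0.1 therefore means link A fell below a tenth of its older sample while link B grew to more than ten times its older sample.

```shell
#!/bin/bash
# shifted_traffic A_OLD A_NEW B_OLD B_NEW THRESHOLD  (hypothetical helper)
# Succeeds when link A fell below THRESHOLD * A_OLD and link B rose
# above B_OLD / THRESHOLD within the same polling interval.
shifted_traffic() {
  awk -v a_old="$1" -v a_new="$2" -v b_old="$3" -v b_new="$4" -v t="$5" \
    'BEGIN { exit !(a_new + 0 < a_old * t && b_new + 0 > b_old / t) }'
}

# Example: link A fell from 1000 to 50 while link B rose from 100 to 1500.
if shifted_traffic 1000 50 100 1500 0.1; then
  echo "traffic appears to have moved from link A to link B"
fi
```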

Care must be taken to avoid false reports. This is most useful in scenarios where a data stream is sent over a link continuously and indefinitely.
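One way to cut down on false reports, sketched under assumptions: skip the check when either sample is not a plain number (rrdtool prints unknown values as nan, or -nan on some platforms), or when the older sample is below a minimum rate, so a nearly idle link cannot trigger an alert. `MIN_RATE` and `should_check` are hypothetical, and the 100 Kbps floor is only an illustrative value.

```shell
#!/bin/bash
# Hypothetical minimum baseline, in the same units as the RRD samples
# (bytes per second in traffic_in_watch.sh): 12500 bytes/s = 100 Kbps.
MIN_RATE=12500

# should_check OLDER NEWER  (hypothetical helper)
# Succeeds only when both samples are plain non-negative numbers and the
# older sample is at least MIN_RATE.
should_check() {
  local sample
  local re='^[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?$'
  for sample in "$1" "$2"; do
    # Reject nan, -nan, empty strings and anything else non-numeric.
    [[ $sample =~ $re ]] || return 1
  done
  awk -v older="$1" -v min="$MIN_RATE" 'BEGIN { exit !(older + 0 >= min + 0) }'
}

if should_check 250000 1000; then
  echo "older sample is high enough to compare"
fi
```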

traffic_in_watch.sh

#!/bin/bash

# Usage:
# ./traffic_in_watch.sh /path/to/rrdfile 'description of error' trigger_threshold
#
# Example:
# ./traffic_in_watch.sh /var/lib/cacti/rra/router1_fa0-1.rrd "core router 1 fa0-1 link has dropped very low" 0.2
#
# trigger_threshold is a floating point fraction of the older sample. An alert is sent
# when the newer sample is below trigger_threshold times the older sample.
# For example, 0.2 means the link must be carrying less than 20 percent of the traffic
# it carried on the older sample, i.e. it has dropped by more than 80 percent.
# To get triggers for a 90 percent drop in usage, set it to 0.1.

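rrdfile="$1"
description="$2"
threshold="$3"

# Round the last update time down to the 300 second (5 minute) step, less one
# second, and take the step before it for the older sample.
recentepoch=$(( ($(rrdtool last "$rrdfile") / 300) * 300 - 1 ))
previousepoch=$(( recentepoch - 300 ))

# rrdtool fetch prints values in scientific notation, so grep "e" skips the header
# and nan rows (assuming the data source names contain no letter e); awk then
# prints the second field of the first matching row, i.e. the first data source.
recentsample=$(rrdtool fetch "$rrdfile" AVERAGE -s "$recentepoch" | grep "e" | head -n 1 | awk '{print $2}')
oldersample=$(rrdtool fetch "$rrdfile" AVERAGE -s "$previousepoch" -e "$recentepoch" | grep "e" | head -n 1 | awk '{print $2}')

# Do not alert if either sample is missing.
if [ -z "$recentsample" ] || [ -z "$oldersample" ]
then
 exit 0
fi

recentsample=$(printf "%.f" "$recentsample")
oldersample=$(printf "%.f" "$oldersample")
limit=$(printf "%.f" "$(echo "$oldersample * $threshold" | bc)")

if [ "$recentsample" -lt "$limit" ]
then
 # Samples are in bytes per second; convert to kilobits per second for the report.
 recentspeed=$(( recentsample * 8 / 1000 ))
 olderspeed=$(( oldersample * 8 / 1000 ))
 {
  echo ""
  echo "$description"
  echo "$rrdfile"
  echo "$(date -d "@$recentepoch") ($recentepoch) : $recentspeed Kbps"
  echo "$(date -d "@$previousepoch") ($previousepoch) : $olderspeed Kbps"
 } | mail -s "Link Utilisation Alert" "user@email.com"
fi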
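To see what the epoch arithmetic and the fetch pipeline do without a live RRD file, the steps can be replayed on canned text. The timestamps and values below are made up, and the layout only imitates `rrdtool fetch` output (a header of data source names, a blank line, then `timestamp: value value` rows); the data source names are assumed to follow Cacti's traffic_in/traffic_out naming.

```shell
#!/bin/bash
# Made-up stand-ins for `rrdtool last` and `rrdtool fetch` output.
last_update=1505818987
fetch_output=' traffic_in traffic_out

1505818800: 1.2500000000e+05 3.4000000000e+04
1505819100: -nan -nan'

# Integer division truncates, so this rounds down to the 300 second step;
# one second is then subtracted, matching the script's epoch arithmetic.
recentepoch=$(( (last_update / 300) * 300 - 1 ))
previousepoch=$(( recentepoch - 300 ))

# Same pipeline as the script: keep rows containing "e" (numeric values in
# scientific notation), take the first one, print its first data source.
sample=$(printf '%s\n' "$fetch_output" | grep "e" | head -n 1 | awk '{print $2}')

# Round to an integer, then convert bytes per second to kilobits per second.
rounded=$(printf '%.f' "$sample")
kbps=$(( rounded * 8 / 1000 ))

echo "recent=$recentepoch previous=$previousepoch sample=$rounded bytes/s = $kbps Kbps"
```

Running it prints `recent=1505818799 previous=1505818499 sample=125000 bytes/s = 1000 Kbps`. Note that a data source name containing the letter e would make the header line match `grep "e"`, so the first field picked up would be a name rather than a value.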

The script can be run every five minutes with a crontab entry like the following:

*/5 * * * * /path/to/traffic_in_watch.sh /var/lib/cacti/rra/router1-01_link1_traffic_in_1234.rrd "There has been an 80 percent or greater drop in traffic from Router-01 link-1, within 5 minutes." 0.2
