How Cacti Graphing Works:
=========================

Cacti requests (polls) the ifInOctets (1.3.6.1.2.1.2.2.1.10) and ifOutOctets (1.3.6.1.2.1.2.2.1.16) SNMP OID values for a specific interface from a specific device. These are 32-bit counters. Cacti can also poll the 64-bit ifHCInOctets (1.3.6.1.2.1.31.1.1.1.6) and ifHCOutOctets (1.3.6.1.2.1.31.1.1.1.10) counters if the device supports them. All devices added to Cacti should use the 64-bit values to support interface speeds of 100Mbps and higher! A 32-bit octet counter wraps after 2^32 bytes, which a fully loaded 100Mbps link sends in under 6 minutes, so at higher speeds the counter can wrap more than once between polls.

The polled values are the number of bytes (octets) sent and received by the interface (usually since it was last shut down or the device was last rebooted). The interface in/out bytes values are polled every 5 minutes and recorded in an RRD file. The RRD file can be imagined as an Excel spreadsheet with three columns: time, traffic in and traffic out. Each "row" within the RRD file has a time value in the first column that is a round 5 minute date/time interval (stored in Unix epoch format) such as 01/01/2017 12:00:00, 01/01/2017 12:05:00, 01/01/2017 12:10:00 and so on.

Cacti polls all the devices in its inventory on a 5 minute interval. It is important to note that this polling interval is the same interval the RRD files use to timestamp each "row". A device is polled for the in and out octet values of an interface and a new "row" is added to the RRD file for that interface (every interface of every device has its own RRD file). Cacti doesn't fill in the time "column" in the RRD file; RRDTool fills this in as the most recent round 5 minute interval (rounding down / backwards in time). Cacti supplies the bytes sent and bytes received values, which go in the next two "columns". After 15 minutes or so the RRD file has 3 rows, for example. Each row has a timestamp and shows how much data (in terms of volume) has been sent and received by that interface over a series of 5 minute intervals.
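The two numeric points above (why 32-bit counters are unsafe at 100Mbps and above, and how a poll time maps to a round 5 minute "row") can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are made up for this example and it is not how Cacti or RRDTool are implemented internally.

```python
COUNTER32_MAX = 2 ** 32   # ifInOctets / ifOutOctets wrap after this many bytes
STEP = 300                # polling interval in seconds (5 minutes)


def seconds_to_wrap(counter_max: int, link_bps: float) -> float:
    """Seconds for an octet counter to wrap on a link running at full rate."""
    bytes_per_second = link_bps / 8
    return counter_max / bytes_per_second


def round_down_to_step(epoch: int, step: int = STEP) -> int:
    """Most recent step boundary at or before the given Unix timestamp."""
    return epoch - (epoch % step)


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        secs = seconds_to_wrap(COUNTER32_MAX, mbps * 1_000_000)
        print(f"{mbps} Mbps: 32-bit counter wraps after {secs:.1f} s")
    # 01/01/2017 12:03:27 UTC rounds down to 01/01/2017 12:00:00 UTC
    print(round_down_to_step(1483272207))
```

At 100Mbps the 32-bit counter wraps after roughly 344 seconds, only just longer than one polling interval, and at 1Gbps it wraps about every 34 seconds, so a single 5 minute delta can no longer be trusted.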

When Cacti draws a graph it is actually RRDTool that draws it. RRDTool is a program that can store time series data, retrieve and manipulate it (produce average/min/max values, multiply/divide them etc.), and then draw a graph from the manipulated data. Cacti simply tells RRDTool which interface to graph (meaning which data source, which is an RRD file), a start/end time (which "rows" to read from that RRD file) and, most importantly, any data manipulation required (which RRDTool calls a "consolidation function"), such as averaging the values between the specified start and end dates or keeping only the highest values. A number of optional parameters can also be supplied to style the graph.

GraphExport Accuracy and Consolidation:
=======================================

The GraphExport plugin exists because COMPANY is required to provide accurate usage figures for core links and device resources for on-time capacity planning. The graphs natively produced by Cacti and those produced by SolarWinds do not provide exact figures.

The total number of octets sent and received by a given interface is polled and stored in 5 minute intervals in an RRD file. The exception is when the device is unreachable or not responding (it's down/crashed/rebooting etc.). In these cases RRDTool stores a "-nan" entry ("Not a Number") in the relevant in/out bytes column. When Cacti or GraphExport instructs RRDTool to produce a graph, it must specify what to do with NaN values (replace them with zero, reuse the last good value etc.). More on this later.

When a user selects a custom time period within Cacti to produce a graph of interface usage from the first 5 minute sample of the calendar month to the last, e.g. 01/01/2017 00:00:00 to 31/01/2017 23:55:00, Cacti also tells RRDTool that the graph should be 500 pixels wide and 120 pixels high by default.
There are 12 data samples per hour (xx:00:00 to xx:55:00) and 288 samples per day, so the 31 days of January hold 288 * 31 = 8928 data samples. To draw a graph that shows all of those samples, the graph would have to be at least 8928 pixels wide. Instead, RRDTool consolidates the 8928 samples down to just 500 to draw the graph Cacti requested. As a result, not only is the area or line drawn on the graph misleading, but the stats written as text at the bottom of the graph (the min/max/average/current interface speed in and out) are also inaccurate. The consolidation happens before the graph is drawn and before the min/max/avg/cur values are written on it, so those values are based on an averaged subset of the month's data.

This consolidation happens "live" and only affects the graph being drawn at that moment; it does not rewrite the data stored on disk. RRD files also support a second stage of consolidation within the file itself, which does affect how the values are stored on disk. Storing 8928 values per month for bytes received and another 8928 for bytes sent is a very small storage requirement by modern standards. However, by default Cacti instructs RRDTool to keep these values for only 3 months. Data older than 3 months is consolidated within the RRD file down to half hour averages, then to 1 hour averages for even older data, and so on as the data ages (so historical data becomes less precise). If one requests from Cacti a calendar month graph from around 2 years ago, RRDTool can't pull 8928 samples from the RRD file because they will have been reduced to, say, 1 sample per hour holding the average for that hour, which RRDTool then consolidates again to fit into the 500 pixel wide graph.
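To make the effect of graph-time consolidation concrete, here is a minimal sketch that averages a month of 5 minute samples into equal-sized buckets so that they fit a 500 pixel wide graph. The bucketing scheme and the traffic values are assumptions chosen for illustration; RRDTool's own consolidation is more involved, but the loss of peak information is the same in principle.

```python
import math

SAMPLES_PER_DAY = 24 * 12                 # one sample every 5 minutes -> 288
JANUARY_SAMPLES = SAMPLES_PER_DAY * 31    # 8928
GRAPH_WIDTH = 500                         # pixels


def consolidate_average(samples, width):
    """Reduce samples to at most `width` points by averaging equal-sized buckets."""
    bucket = math.ceil(len(samples) / width)
    points = []
    for i in range(0, len(samples), bucket):
        chunk = samples[i:i + bucket]
        points.append(sum(chunk) / len(chunk))
    return points


if __name__ == "__main__":
    # Hypothetical month: a steady 100 Mbps with a single 5 minute peak of 900 Mbps.
    samples = [100.0] * JANUARY_SAMPLES
    samples[4000] = 900.0

    drawn = consolidate_average(samples, GRAPH_WIDTH)
    print(len(samples), "samples ->", len(drawn), "points")
    print("true max:", max(samples))
    print("max after consolidation:", round(max(drawn), 1))
```

With these made-up numbers each drawn point averages 18 samples, so the 900 Mbps peak is reported as roughly 144 Mbps. Any "max" figure computed after consolidation understates the real peak in the same way.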

The COMPANY instance of Cacti has all graph templates customised to store 5 minute samples for an interface for 2 years without consolidation (specifically, they store up to a maximum of 2 years' worth of 5 minute samples; after two years the data isn't consolidated, it is overwritten in a circular fashion). This means that GraphExport can ask RRDTool to make a graph for any time period within the last 2 years and RRDTool will provide the exact values from the start time to the end time. Those values are then used to write the min/max/average text details, calculate the 95th percentile and so on. To avoid graph-time consolidation, GraphExport produces very wide graphs, which is how it obtains these precise stats.

It is important to note that when generating a graph with GraphExport, if there is a NaN during the requested time period, RRDTool fills that "gap" with the last known value. If there is a half hour period when Cacti was unable to poll an interface for sent bytes, for example, the entire half hour is assumed to have the last polled value. This should provide a "safe" capacity planning approach: it is unlikely that the interface dropped to zero throughput during this period, and equally unlikely that it spiked to the peak rate for the month. The CSV files GraphExport produces do include the NaN values; they are not replaced, so that it can be seen when gaps occurred.

It is now clear that GraphExport has access to 5 minute data samples for a 2 year period, and that missing samples are "smoothed over" (except in the CSV files). Next it is important to note that Cacti polls the number of bytes sent and received by an interface, not the interface speed. RRDTool derives the speed that was required to transmit the delta of bytes between each pair of consecutive time series samples.
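The two ideas above, deriving a speed from the delta between octet counter readings and filling a missed poll with the last known value, can be sketched as follows. This is an illustration of the arithmetic only, under the assumptions that a missed poll is represented by `None` and that the counter does not wrap; the function names are hypothetical and this is not RRDTool's or GraphExport's actual code.

```python
import math

STEP = 300  # seconds between polls


def rates_bps(counters, step=STEP):
    """Average bits per second for each interval between consecutive counter readings.

    A missed poll (None) yields NaN for every interval it touches.
    """
    rates = []
    for prev, cur in zip(counters, counters[1:]):
        if prev is None or cur is None:
            rates.append(math.nan)
        else:
            rates.append((cur - prev) * 8 / step)
    return rates


def fill_gaps(rates):
    """Replace each NaN with the last known value (leading NaNs stay NaN)."""
    filled, last = [], math.nan
    for rate in rates:
        if not math.isnan(rate):
            last = rate
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical octet counter readings, with the third poll missed.
    counters = [0, 3_750_000_000, None, 11_250_000_000, 15_000_000_000]
    raw = rates_bps(counters)
    print("raw (as in the CSV):", raw)
    print("gap-filled (as graphed):", fill_gaps(raw))
```

In this example 3,750,000,000 bytes in 300 seconds works out to 100 Mbps. The two intervals touched by the missed poll are NaN in the raw series and take the preceding 100 Mbps value once filled.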

Consider what would happen if Cacti polled the instantaneous interface speed instead. Usage is low when the interface is polled, then becomes very high, then drops low again before the next poll 5 minutes later: polling the interface speed would have missed that burst entirely. Polling the volume of data transmitted (octets sent and received) means that no bursts are missed, because every byte is counted.

By working around both stages of consolidation that RRDTool offers as a historical way of saving disk space, and by recording bytes sent/received instead of interface speed, GraphExport produces precise stats for a chosen data source.
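The burst scenario can be illustrated with a small simulation. The traffic profile below is invented for the example (one value per second, in Mbps), and the function name is hypothetical.

```python
def polled_vs_counted(traffic_mbps):
    """Compare instantaneous readings at two polls with the counter-based average.

    `traffic_mbps` holds one throughput value per second between the polls.
    """
    first_poll = traffic_mbps[0]
    second_poll = traffic_mbps[-1]
    # One entry per second, so summing Mbps values gives total megabits sent,
    # which is what the change in the octet counter represents.
    megabits_sent = sum(traffic_mbps)
    average_mbps = megabits_sent / len(traffic_mbps)
    return (first_poll, second_poll), average_mbps


if __name__ == "__main__":
    # 100 s quiet, a 100 s burst at 900 Mbps, then 100 s quiet again.
    traffic = [10] * 100 + [900] * 100 + [10] * 100
    polls, average = polled_vs_counted(traffic)
    print("instantaneous readings at the two polls:", polls)
    print("average from counter delta:", round(average, 1), "Mbps")
```

Both instantaneous readings show 10 Mbps, while the counter delta gives an average of about 307 Mbps for the interval. The burst is still averaged over the full 5 minutes, but it is not lost.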