[omd-users] var/rrdcached growing

Alexander Rusa alexander.rusa at emerion.com
Tue Oct 23 10:13:19 CEST 2012


Hello,

This morning I noticed that the timeout errors have mostly stopped and everything in perfdata.log looks OK.

But what I just cannot understand is why the rrdcached journal data keeps growing, and which process is actually supposed to do what with this data!
Could someone please help me understand this?

It seems to me that this part is missing from the diagram at http://omdistro.org/wiki/omd/Pnp4nagios

I now have more than 160 files totalling over 11GB in omd/sites/.../var/rrdcached/rrd.journal.*, and they do not seem to be getting any fewer.
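To track whether the journal directory ever shrinks, the file count and total size mentioned above can be measured repeatedly. A minimal Python sketch, assuming only that the journals match the rrd.journal.* pattern named here; the function name `journal_usage` and the script name are hypothetical, and the site's rrdcached directory has to be passed in because the site name in the path above is elided:

```python
import glob
import os
import sys


def journal_usage(rrdcached_dir: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for rrd.journal.* files in rrdcached_dir."""
    count = 0
    total = 0
    for path in glob.glob(os.path.join(rrdcached_dir, "rrd.journal.*")):
        try:
            total += os.path.getsize(path)
        except FileNotFoundError:
            # A journal file may disappear between glob() and getsize(); skip it.
            continue
        count += 1
    return count, total


if __name__ == "__main__":
    # Usage (hypothetical script name): python journal_usage.py <site>/var/rrdcached
    files, size = journal_usage(sys.argv[1])
    print(f"{files} journal files, {size / 1024**3:.1f} GiB")
```

Running it periodically (for example from cron) and comparing the numbers would show whether old journals are ever being removed or only accumulating.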

Regards, Alex

On 22.10.2012 at 16:42, Alexander Rusa <alexander.rusa at emerion.com> wrote:

> Hi,
> 
> My /opt/omd/sites/.../var/rrdcached directory is growing very fast.
> At the moment it contains 151 files with a total of ~9GB.
> Currently I am running version 0.56.
> It looks like this problem has existed since I upgraded to 0.52.
> 
> Last week I tried to find the source of the problem and ended up deleting everything inside var/pnp4nagios/perfdata/. I had found that some of the problems came from RRD_STORAGE_TYPE having been changed to MULTIPLE, and after spending some hours trying to convert the old RRD files I gave up and deleted the whole performance-data history.
> 
> Now disk space is critical again, and I have no idea what the problem could be!
> 
> We are monitoring about 4000 services.
> 
> The var/pnp4nagios/log/perfdata.log shows nothing but timeouts:
> 
> #####
> ...
> 2012-10-22 16:25:29 [20877] [1] process_perfdata.pl-0.6.19 starting in BULK Mode called by NPCD
> 2012-10-22 16:25:29 [20877] [1] Found Performance Data for server1 / _HOST_ (rta=0.241ms;200.000;500.000;0; pl=0%;40;80;; rtmax=0.298ms;;;; rtmin=0.198ms;;;;) 
> 2012-10-22 16:25:29 [20879] [1] process_perfdata.pl-0.6.19 starting in BULK Mode called by NPCD
> 2012-10-22 16:25:29 [20879] [1] Found Performance Data for server2 / CPU_load (load1=8.13;20;40;0; load5=8.8;20;40;0; load15=9.12;20;40;0;) 
> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Timeout after 15 secs. ***
> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Please check your process_perfdata.cfg
> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: /omd/sites/emerion/var/pnp4nagios/spool//perfdata.1350915913-PID-20877 deleted
> 2012-10-22 16:25:44 [20877] [0] *** Timeout while processing Host: "server1" Service: "_HOST_"
> 2012-10-22 16:25:44 [20877] [0] *** process_perfdata.pl terminated on signal ALRM
> ...
> #####
> 
> Can anyone tell me where I could find the root cause of the problem?
> 
> One thing I do know is that the server sometimes has a very high load, and we are planning to move some services away from this machine. But even when I stop some resource-hungry services, only timeouts show up in perfdata.log.
> 
> Best regards,
> 
> Alex
> _______________________________________________
> omd-users mailing list
> omd-users at lists.mathias-kettner.de
> http://lists.mathias-kettner.de/mailman/listinfo/omd-users
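To see whether the timeouts in the quoted perfdata.log cluster on particular hosts or services, the "Timeout while processing" lines can be tallied. A minimal Python sketch, assuming the log format shown in the excerpt above; the regular expression is inferred from those few lines only and is not taken from PNP4Nagios documentation, and `count_timeouts` is a hypothetical helper name:

```python
import re
import sys
from collections import Counter

# Pattern inferred from the quoted excerpt only, e.g.:
# ... [0] *** Timeout while processing Host: "server1" Service: "_HOST_"
TIMEOUT_RE = re.compile(
    r'\*\*\* Timeout while processing Host: "([^"]*)" Service: "([^"]*)"'
)


def count_timeouts(lines):
    """Count timeout lines per (host, service) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_timeouts.py <path to var/pnp4nagios/log/perfdata.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (host, service), n in count_timeouts(log).most_common(20):
            print(f"{n:6d}  {host} / {service}")
```

If the timeouts are spread evenly across all hosts rather than concentrated on a few, that would point towards a general bottleneck (load or I/O) rather than a single misbehaving check.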


