[Check_mk (english)] External commands cause livestatus to crash Nagios4

Eron Nicholson eron at basecamp.com
Tue Feb 10 22:36:22 CET 2015


We have been having issues with Livestatus causing our Nagios4
instances to crash.  I have managed to pin down the cause of the
crashes and replicate it using a simple script.  We have lots of
clients sending in passive check results.  Many of these clients are
unaware of the Nagios server's config and will send in passive
results for services that do not exist.  It seems that when Livestatus
receives enough of these invalid external commands, it crashes.

Here is the script:
https://gist.githubusercontent.com/enichols/e2c8c1615e10bbf4881c/raw/f1b11ab6ff6c56f50aeda400b11f08aa6d231190/gistfile1.txt
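
In case the gist is unreachable, the following is a minimal sketch of
this kind of reproduction.  It is not the gist itself.  It assumes the
commands are written to the Livestatus UNIX socket in the same
"COMMAND [timestamp] ..." form that appears in livestatus.log below.
The socket path and the function names are hypothetical, so adjust them
for your installation.

```python
import socket
import time

# Hypothetical socket path; use the one configured for your Livestatus module.
LIVESTATUS_SOCKET = "/var/lib/nagios/rw/live"


def build_command(host, service, state, output, timestamp=None):
    """Format one passive service check result as a Livestatus COMMAND line."""
    if timestamp is None:
        timestamp = int(time.time())
    return "COMMAND [%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, state, output)


def send_command(line, socket_path=LIVESTATUS_SOCKET):
    """Open a fresh connection to the Livestatus socket and send one line."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
        sock.sendall(line.encode("utf-8"))
    finally:
        sock.close()


def flood_invalid_results(host, count, socket_path=LIVESTATUS_SOCKET):
    """Send `count` results for service names that are assumed not to exist."""
    for i in range(count):
        service = "Invalid Service Name %d" % i
        send_command(build_command(host, service, 1, "message"), socket_path)
```

Calling flood_invalid_results("jobs-101", 10) would send ten such
commands one after another.  The sketch sends them sequentially and
makes no attempt to reproduce the timing or concurrency of real clients.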

Here are some log excerpts from a crash:

nagios.log
-------------
[1423603111] Warning:  Passive check result was received for service
'Invalid Service Name 1' on host 'jobs-101', but the service could not
be found!
[1423603111] Error: External command failed ->
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 1;1;message
[1423603111] Warning:  Passive check result was received for service
'Invalid Service Name 2' on host 'jobs-101', but the service could not
be found!
[1423603111] Error: External command failed ->
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 2;1;message
[1423603111] Warning:  Passive check result was received for service
'Invalid Service Name 5' on host 'jobs-101', but the service could not
be found!
[1423603111] Error: External command failed ->
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 5;1;message
[1423603111] Warning:  Passive check result was received for service
'Invalid Service Name 0' on host 'jobs-101', but the service could not
be found!
[1423603111] Error: External command failed ->
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 0;1;message

livestatus.log (with debug=1)
-----------------
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 9;1;message
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 3;1;message
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 1;1;message
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 2;1;message
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 5;1;message
2015-02-10 21:18:31 Query: COMMAND [1423603111]
PROCESS_SERVICE_CHECK_RESULT;jobs-101;Invalid Service Name 0;1;message

We are running Nagios 4.0.8 and Livestatus 1.2.5i5p4.  We have tried
several different Livestatus versions and seen the same behavior.

This is a serious issue for us since it takes down the main Nagios
process.  For now, we have had to send our external commands to a
Go listener that appends to status.dat directly.

Let me know if you require additional debugging info, but you should
be able to replicate fairly easily with the script linked above.

Thanks,

Eron Nicholson
System Administrator
Basecamp

