[Check_mk (english)] Problem with recovery events and Event Console

Kojo Lassi / JMJping Oy lassi.kojo at jmjping.fi
Thu Jul 7 07:51:54 CEST 2016


So it seems this is not limited to recovery notifications, because I did receive some of those last night.
Instead, it now looks like some notifications go missing regardless of the state.

I see this in notify.log:
2016-07-07 08:37:02 Got raw notification (MYHOST;HOST) context with 43 variables
2016-07-07 08:37:02 Global rule 'SMS'...
2016-07-07 08:37:02  -> does not match: EC Event has rule ID 'alarm', but '['sms_temp_application', 'sms_temp_output']' is required
2016-07-07 08:37:02 Global rule 'Notify all contacts of a host/service via HTML email'...
2016-07-07 08:37:02  -> matches!
2016-07-07 08:37:02 1 rules matched, but no notification has been created.

Why does this happen?
It says 1 rule matched but no notification created.

Any hints how to debug this further are welcome.
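
As a starting point for debugging, here is a minimal sketch of a script that groups the notify.log lines per raw notification and lists what each global rule decided. It assumes the log looks exactly like the excerpts in this mail; the function name summarize and the dictionary keys are only my own naming, nothing from Check_MK itself. If two notifications are logged interleaved, the lines would have to be separated first.

```python
import re
import sys

# Patterns are based only on the notify.log lines quoted in this mail.
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (.*)$")
RAW_RE = re.compile(r"^Got raw notification \((.+?)\) context with (\d+) variables$")
RULE_RE = re.compile(r"^Global rule '(.+)'\.\.\.$")


def summarize(lines):
    """Return one dict per 'Got raw notification' line, with rule outcomes."""
    notifications = []
    current = None  # notification currently being read
    rule = None     # name of the global rule currently being evaluated
    for line in lines:
        m = TIMESTAMP_RE.match(line.rstrip())
        if not m:
            continue  # skip lines without a leading timestamp
        msg = m.group(1).strip()
        raw = RAW_RE.match(msg)
        if raw:
            current = {
                "object": raw.group(1),
                "variables": int(raw.group(2)),
                "rules": [],               # list of (rule name, outcome)
                "nothing_created": False,  # set when the log says so
            }
            notifications.append(current)
            rule = None
            continue
        if current is None:
            continue
        started = RULE_RE.match(msg)
        if started:
            rule = started.group(1)
        elif msg.startswith("-> "):
            current["rules"].append((rule, msg[3:]))
        elif "no notification has been created" in msg:
            current["nothing_created"] = True
    return notifications


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_notify.py <path to notify.log>
    with open(sys.argv[1]) as handle:
        for n in summarize(handle):
            print(n["object"], n["variables"], "variables")
            for name, outcome in n["rules"]:
                print("   ", name, "->", outcome)
            if n["nothing_created"]:
                print("    rules matched, but nothing was created")
```

Filtering the result for entries where nothing_created is set should at least show which hosts and services are affected.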

- Lassi

From: checkmk-en-bounces at lists.mathias-kettner.de [mailto:checkmk-en-bounces at lists.mathias-kettner.de] On behalf of Kojo Lassi / JMJping Oy
Sent: Wednesday, 6 July 2016 13:54
To: checkmk-en at lists.mathias-kettner.de
Subject: [Check_mk (english)] Problem with recovery events and Event Console

Hi,

I have been setting up a distributed monitoring system with CRE, but just before I was going to roll it into production, I noticed a problem.

First of all, I would like to ask the list for a second opinion on the architecture. Does the following sound sane?

The setup consists of a master site and a handful of slaves, which receive their configuration from the master site.
I want notifications to be sent only from the master site, so I configured "Send notifications to Event Console" on the master and "Send notifications to remote Event Console" on every slave. Delivery from the slaves to the master works just fine.
Then I have rules in the master site's Event Console that define which events should generate a notification, and notification rules that define who gets notified and how.

At first I wasn't sure about using the Event Console to forward notifications from the slaves to the master, but I think mknotifyd was moved to CEE, so this seemed like a good option (I have no problem with subscribing to CEE, but I'm not sure it would even solve this problem).

I'm running Ubuntu 16.04 on all hosts with the newest CRE (1.2.8p5).

The problem is that I don't receive any recovery messages at all. Only WARNs and CRITs come through.
This is what notify.log says for the WARN notification:

2016-07-06 13:14:38 Got raw notification (MYHOST;Job postgre-backup) context with 52 variables
2016-07-06 13:14:38 Got raw notification (MYHOST;Job postgre-backup) context with 43 variables
2016-07-06 13:14:38 Global rule 'SMS'...
2016-07-06 13:14:38 Global rule 'SMS'...
2016-07-06 13:14:38  -> does not match: Notification has not been created by the Event Console.
2016-07-06 13:14:38 Global rule 'Notify all contacts of a host/service via HTML email'...
2016-07-06 13:14:38  -> does not match: Notification has not been created by the Event Console.
2016-07-06 13:14:38  -> does not match: EC Event has rule ID 'alarm', but '['sms_temp_application', 'sms_temp_output']' is required
2016-07-06 13:14:38 Global rule 'Notify all contacts of a host/service via HTML email'...
2016-07-06 13:14:38  -> matches!
2016-07-06 13:14:38 Warning: cannot get information about contact mkeventd: ignoring restrictions
2016-07-06 13:14:38    - adding notification of mkeventd via mail
2016-07-06 13:14:38    - adding notification of MYUSER via mail
2016-07-06 13:14:38 Executing 3 notifications:
2016-07-06 13:14:40   * notifying MYUSER via mail, parameters: (no parameters), bulk: no
2016-07-06 13:14:40      executing /omd/sites/jmjping/share/check_mk/notifications/mail
2016-07-06 13:14:41   * notifying mkeventd via mail, parameters: (no parameters), bulk: no
2016-07-06 13:14:41      executing /omd/sites/MYSITE/share/check_mk/notifications/mail
2016-07-06 13:14:41      Output: Cannot send HTML email: empty destination email address
2016-07-06 13:14:41      Plugin exited with code 2

And I get the email just fine, but then the recovery looks like this:

2016-07-06 13:16:27 Got raw notification (MYHOST;Job postgre-backup) context with 52 variables
2016-07-06 13:16:27 Global rule 'SMS'...
2016-07-06 13:16:27  -> does not match: Notification has not been created by the Event Console.
2016-07-06 13:16:27 Global rule 'Notify all contacts of a host/service via HTML email'...
2016-07-06 13:16:27  -> does not match: Notification has not been created by the Event Console.

So the recovery event is not generated by EC, but it comes from the Nagios core. Why does this happen, if WARNs and CRITs are generated by EC (or at least they are delivered correctly)?

I tried generating a recovery event manually with the Event Simulator, and it hits my rule correctly, but the fact that the real recovery is not generated by EC seems to be the problem.

- Lassi Kojo