[Check_mk (english)] Livestatus query hangs icinga after restart

Mathias Kettner mk at mathias-kettner.de
Wed Aug 17 12:12:19 CEST 2011


Hi Ewold,

On 16.08.2011 17:35, Ewold wrote:
> This fixes the problem on the timeperiods. Any idea how I can debug the
> views problem?

Great! This was not easy to track down.

The problem with the views will still take a couple of tries. Please watch
out for mails on this list. I'll try to make a new Git version with
more verbose debug output on that topic.

Mathias

>
> I did a restart/reload with a Multisite reload right after startup,
> and this time it kept working.
>
>
>
> [1313508607] Caught SIGTERM, shutting down...
> [1313508607] Successfully shutdown... (PID=28329)
> [1313508607] livestatus: deinitializing
> [1313508607] livestatus: Waiting for main to terminate...
> [1313508608] livestatus: Socket thread has terminated
> [1313508608] livestatus: Waiting for client threads to terminate...
> [1313508608] Event broker module '/usr/lib/check_mk/livestatus.o' deinitialized successfully.
> [1313508608] npcdmod: If you don't like me, I will go out! Bye.
> [1313508608] Event broker module '/usr/local/pnp4nagios/lib/npcdmod.o' deinitialized successfully.
> [1313508608] Icinga 1.4.2 starting... (PID=29779)
> [1313508608] Local time is Tue Aug 16 17:30:08 CEST 2011
> [1313508608] LOG VERSION: 2.0
> [1313508608] livestatus: Livestatus 1.1.11i2 by Mathias Kettner. Socket: '/var/icinga/rw/live'
> [1313508608] livestatus: Please visit us at http://mathias-kettner.de/
> [1313508608] livestatus: Hint: please try out OMD - the Open Monitoring Distribution
> [1313508608] livestatus: Please visit OMD at http://omdistro.org
> [1313508608] Event broker module '/usr/lib/check_mk/livestatus.o' initialized successfully.
> [1313508608] npcdmod: Copyright (c) 2008-2009 Hendrik Baecker (andurin at process-zero.de) - http://www.pnp4nagios.org
> [1313508608] npcdmod: /usr/local/pnp4nagios/etc/npcd.cfg initialized
> [1313508608] npcdmod: spool_dir = '/usr/local/pnp4nagios/var/spool/'.
> [1313508608] npcdmod: perfdata file '/usr/local/pnp4nagios/var/perfdata.dump'.
> [1313508608] npcdmod: Ready to run to have some fun!
>
> [1313508608] livestatus: Timeperiod cache not updated, there are no timeperiods (yet)
>
> [1313508608] Event broker module '/usr/local/pnp4nagios/lib/npcdmod.o' initialized successfully.
> [1313508608] Finished daemonizing... (New PID=29783)
>
> [1313508612] TIMEPERIOD TRANSITION: 24X7;-1;1
> [1313508612] TIMEPERIOD TRANSITION: 24x7;-1;1
> [1313508612] TIMEPERIOD TRANSITION: 24x7_sans_holidays;-1;1
> [1313508612] TIMEPERIOD TRANSITION: none;-1;0
> [1313508612] TIMEPERIOD TRANSITION: us-holidays;-1;0
> [1313508612] TIMEPERIOD TRANSITION: workhours;-1;0
>
> [1313508612] livestatus: Going to open socket and starting threads
>
>
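Each log line above starts with a Unix timestamp in square brackets; the startup lines confirm that [1313508608] is Tue Aug 16 17:30:08 CEST 2011. When comparing excerpts like these, a small parser helps. The sketch below is a hypothetical helper, not part of Check_MK or Icinga, and it assumes that a `TIMEPERIOD TRANSITION` payload reads `name;old_state;new_state`, with -1 standing for "unknown"; that reading is inferred from the log lines in this thread, not taken from documentation.

```python
import re
from datetime import datetime, timezone

# "[<epoch seconds>] <message>", the layout of every Icinga log line quoted above.
LOG_LINE = re.compile(r"^\[(\d+)\] (.*)$")
TRANSITION_PREFIX = "TIMEPERIOD TRANSITION: "


def parse_log_line(line):
    """Return (UTC datetime, message) for a log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return when, match.group(2)


def parse_transition(message):
    """Return (name, old_state, new_state) for a TIMEPERIOD TRANSITION message.

    Assumption inferred from the logs, not from documentation: the payload is
    "name;old;new", where -1 means "unknown", 0 "out of period", 1 "in period".
    """
    if not message.startswith(TRANSITION_PREFIX):
        return None
    name, old, new = message[len(TRANSITION_PREFIX):].rsplit(";", 2)
    return name, int(old), int(new)


parsed = parse_log_line("[1313508612] TIMEPERIOD TRANSITION: 24x7_sans_holidays;-1;1")
if parsed is not None:
    when, message = parsed
    print(when.isoformat(), parse_transition(message))
```

Times are returned in UTC; CEST is two hours ahead, which matches the "Local time" line in the log.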
> On Tue, Aug 16, 2011 at 4:56 PM, Mathias Kettner <mk at mathias-kettner.de> wrote:
>
>     Hi Ewold,
>
>     I've made another change that *might* fix the problem. Could
>     you check again and send the logfile output?
>
>     Mathias
>
>
>     On 16.08.2011 16:37, Mathias Kettner wrote:
>
>         Hi Ewold,
>
>         In the log file I can see at least one problem: the initial
>         timeperiod information is created *after* the socket has been
>         opened. This means that Icinga does not create the timeperiod
>         definitions until it has entered the main program loop. I would
>         go so far as to call that a bug.
>
>         Do you have a chance to check this out with a Nagios kernel instead?
>
>         Mathias
>
>         On 16.08.2011 16:10, Ewold wrote:
>
>             This patch didn't work.
>
>
>             [1313502076] livestatus: Going to open socket and starting threads
>             [1313502079] livestatus: No timeperiod information available for 24X7. Assuming out of period.
>
>             After this, the Icinga process still hangs.
>
>             After another restart without any Livestatus requests, it's back
>             to the previous behaviour:
>
>             [1313502157] livestatus: Going to open socket and starting threads
>             [1313502180] TIMEPERIOD TRANSITION: 24X7;-1;1
>             [1313502180] TIMEPERIOD TRANSITION: 24x7;-1;1
>             [1313502180] TIMEPERIOD TRANSITION: 24x7_sans_holidays;-1;1
>             [1313502180] TIMEPERIOD TRANSITION: none;-1;0
>             [1313502180] TIMEPERIOD TRANSITION: us-holidays;-1;0
>             [1313502180] TIMEPERIOD TRANSITION: workhours;-1;1
>
>
>             On the other hand, we now have a problem with some views:
>             sometimes the view gets through, other times it doesn't and
>             displays the message "Missing the variable view_name in the
>             URL." If you refresh a couple of times, the page gets
>             displayed; refresh once more and the error message is back.
>             I enabled debugging for the internal error page when opening
>             a view, and it gave this output:
>
>             Internal error:: No view name and not datasource defined.
>
>             Traceback (most recent call last):
>               File "/usr/share/check_mk/web/htdocs/index.py", line 236, in handler
>                 handler()
>               File "/usr/share/check_mk/web/htdocs/views.py", line 477, in page_edit_view
>                 raise MKInternalError("No view name and not datasource defined.")
>             MKInternalError: No view name and not datasource defined.
>
>             For the moment I'm reverting to 1.10p3, which doesn't have
>             this problem.
>
>             I'll be glad to test things if I can do anything to help sort
>             these problems out.
>
>             Python version: 2.6.5
>
>
>             On Tue, Aug 16, 2011 at 1:25 PM, Mathias Kettner
>             <mk at mathias-kettner.de> wrote:
>
>             Hi Ewold and others who experience this problem,
>
>             this seems to be a very delicate timing problem within the
>             Nagios and Icinga kernels, and Icinga seems to be less stable
>             here.
>
>             I've now made two changes in the GIT version in order to track
>             down that problem:
>
>             - In that error situation, I no longer determine the current
>             state of the timeperiod (TP) but simply assume "out of period".
>             This avoids dangerous non-thread-safe code in Nagios/Icinga.
>             As a consequence, your views in Multisite might, for a very
>             short period of time, not show all items, depending on your
>             filter settings.
>
>             - I added a log message that appears at the moment the
>             Livestatus socket is created.
>
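The first change boils down to a conservative default. The following hypothetical Python model only illustrates that idea: the class and method names are invented, and the real Livestatus code is not shown in this thread. Query threads read a lock-protected cache instead of asking the core, and a timeperiod with no cached state is reported as out of period, which is why filtered views may briefly miss items.

```python
import threading


class TimeperiodCache:
    """Hypothetical model of the "assume out of period" fallback.

    Query threads only read cached states; they never compute a timeperiod's
    state themselves. A timeperiod with no cached state counts as
    "out of period".
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._in_period = {}

    def update(self, name, in_period):
        # Meant to be called from the core's own thread, e.g. on a TRANSITION.
        with self._lock:
            self._in_period[name] = bool(in_period)

    def is_in_period(self, name):
        with self._lock:
            # Fallback: no information yet means "out of period".
            return self._in_period.get(name, False)


cache = TimeperiodCache()
print(cache.is_in_period("24X7"))  # False: nothing cached yet
cache.update("24X7", 1)
print(cache.is_in_period("24X7"))  # True
```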
>             In normal operation the log file should look like this:
>
>             [1313493641] Event broker module '/omd/sites/gag/lib/mk-livestatus/livestatus.o' initialized successfully.
>             [1313493641] TIMEPERIOD TRANSITION: 24X7;-1;1
>             [1313493641] TIMEPERIOD TRANSITION: 24x7;-1;1
>             [1313493641] TIMEPERIOD TRANSITION: none;-1;0
>             [1313493641] TIMEPERIOD TRANSITION: workhours;-1;1
>             [1313493641] livestatus: Going to open socket and starting threads
>
>             Important: *before* the socket is opened, *all* timeperiods
>             must have their "TRANSITION" log entry.
>
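The ordering rule above can be checked mechanically. Below is a minimal sketch, assuming you feed it the log lines of a single startup; the function name is hypothetical, and the only fixed strings it relies on are the two log messages quoted in this thread. A timeperiod counts as late if its *first* TRANSITION entry comes after the socket marker.

```python
# Marker line that the patched Livestatus writes just before opening its socket.
SOCKET_MARKER = "livestatus: Going to open socket and starting threads"
TRANSITION = "TIMEPERIOD TRANSITION: "


def check_startup_order(lines):
    """Check one startup's log lines for timeperiods initialized too late.

    Returns (ok, late): `late` lists timeperiods whose first TRANSITION
    entry appears after the socket marker; ok is True only if the marker
    was found and `late` is empty. Later transitions of a timeperiod that
    was already initialized (e.g. workhours switching later in the day)
    are ignored.
    """
    socket_open = False
    seen, late = set(), []
    for line in lines:
        if SOCKET_MARKER in line:
            socket_open = True
            continue
        if TRANSITION not in line:
            continue
        name = line.split(TRANSITION, 1)[1].strip().rsplit(";", 2)[0]
        if name in seen:
            continue  # only the first transition per timeperiod matters
        seen.add(name)
        if socket_open:
            late.append(name)
    return socket_open and not late, late


# The failing startup reported in this thread: socket first, transitions later.
bad_startup = [
    "[1313502157] livestatus: Going to open socket and starting threads",
    "[1313502180] TIMEPERIOD TRANSITION: 24X7;-1;1",
    "[1313502180] TIMEPERIOD TRANSITION: workhours;-1;1",
]
print(check_startup_order(bad_startup))
```

To use it on a real log, pass the lines of your Icinga or Nagios log file, trimmed to one daemon start (from the "starting..." line onwards).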
>             Please check this out and tell me if you can find any further
>             information about the problem.
>
>             Mathias
>
>
>             On 15.08.2011 23:47, Ewold wrote:
>
>             When we restart our Nagios server on CentOS 5.6, we usually see
>             a lot of messages like "livestatus: No timeperiod information
>             available for 24X7". Once the timeperiods are loaded, this goes
>             away. The server keeps running, though.
>
>             However, on another server running Icinga on CentOS 6.0, this
>             message appears just once after a restart: "livestatus: No
>             timeperiod information available for 24X7". Afterwards Icinga
>             dies: no more logs are written and no checks are done. The
>             Icinga process just hangs.
>             The only way to avoid this is to make sure no Livestatus
>             requests come in, so we must close all Check_MK pages in the
>             browser. Once the timeperiods are loaded, Livestatus keeps
>             working without errors.
>             This problem also appears when the logs are flushed.
>
>             Has anyone else seen this behaviour?
>             It's pretty annoying to stop Apache just for an Icinga restart.
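A request that arrives too early can hang the core, so a client script cannot avoid the problem, but it can at least detect it instead of blocking forever. A minimal Python sketch, assuming the socket path `/var/icinga/rw/live` from the log earlier in this thread; `GET status` with `Columns: program_version` is an ordinary Livestatus query, and the 10-second timeout is an arbitrary choice. The function names are hypothetical.

```python
import socket

LIVE_SOCKET = "/var/icinga/rw/live"  # socket path from the Icinga log in this thread


def build_query(table, columns):
    """Build a Livestatus GET query; the request ends with an empty line."""
    return "GET %s\nColumns: %s\n\n" % (table, " ".join(columns))


def query(path, text, timeout=10.0):
    """Send a query over the UNIX socket and return the raw answer.

    socket.timeout (an OSError subclass) is raised if a single connect,
    send or recv step takes longer than `timeout` seconds, so a hung core
    makes the call fail instead of blocking forever.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        s.sendall(text.encode("utf-8"))
        s.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode("utf-8")
    finally:
        s.close()


if __name__ == "__main__":
    try:
        print(query(LIVE_SOCKET, build_query("status", ["program_version"])))
    except OSError as exc:
        print("Livestatus did not answer:", exc)
```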
>
>
>
>             _______________________________________________
>             checkmk-en mailing list
>             checkmk-en at lists.mathias-kettner.de
>             http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en