[omd-users] Check_mk own check PEND - Cannot compute check result: Value overflow

Milan Jeskynka Kazatel KazatelM at seznam.cz
Tue Apr 7 13:43:26 CEST 2020


Hello Community,



could someone please review my own check for DNS Unbound for any issues? Why 
does it sometimes stop working? The goal is that it should be able to compute 
the values in every case. Which library could help to resolve the "Value 
overflow" message described below? In the Check_mk GUI the service is shown 
as stale.




Any hints?




Many thanks.




Server side:



#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-

# Default (warn, crit) levels for total_num_queries, in queries per second
unbound_queries_default_levels = (1250, 2000)


def inventory_unbound(info):
    # Create one service named "unbound" as soon as the section has data
    if len(info):
        return [("unbound", "unbound_queries_default_levels")]


def check_unbound(item, params, info):
    warn, crit = params
    perfdata = []
    status = 0
    message = ""
    now = time.time()
    for line in info:
        name = line[0]
        value = int(line[1])
        menofunkce = "f_%s" % (name)
        # Every value of the section is turned into a per-second rate
        rate = get_rate(menofunkce, now, value)
        perfdata.append((name, rate))
        # Only total_num_queries is compared against the levels
        if name == "total_num_queries" and rate >= crit:
            status = 2
            message = "Crit - unexpected traffic total_num_queries %.2f /sec (warn/crit above %s/%s )" % (rate, warn, crit)
        elif name == "total_num_queries" and rate >= warn:
            status = 1
            message = "Warn - increased traffic total_num_queries %.2f /sec (warn/crit above %s/%s )" % (rate, warn, crit)
        elif name == "total_num_queries" and rate < warn:
            message = "total_num_queries %.2f /sec (warn/crit above %s/%s )" % (rate, warn, crit)
    return (status, message, perfdata)


# declare the check to Check_MK
check_info['unbound'] = {
    "check_function"      : check_unbound,
    "inventory_function"  : inventory_unbound,
    "service_description" : '%s',
    "has_perfdata"        : True,
}
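
For readers following the thread: the sketch below is a stand-alone illustration of how a per-second rate is typically derived from two counter samples, and why a value that goes down between two checks cannot yield a rate. It is not Check_mk's get_rate(). The names CounterWrapped, simple_rate and the in-memory store are made up for this illustration, and the idea that Check_mk reports a decreasing value as "Value overflow" is an assumption that should be verified against the installed version.

```python
# Illustration only: simple_rate() and CounterWrapped are hypothetical names,
# not part of the Check_mk API.


class CounterWrapped(Exception):
    """Raised when no rate can be computed from the stored and the new sample."""


_last_samples = {}  # counter name -> (timestamp, value)


def simple_rate(name, now, value):
    """Return the per-second increase of `value` since the previous call."""
    previous = _last_samples.get(name)
    _last_samples[name] = (now, value)
    if previous is None:
        # First sample: there is nothing to compare against yet.
        raise CounterWrapped("Counter initialization")
    last_time, last_value = previous
    elapsed = now - last_time
    if elapsed <= 0:
        raise CounterWrapped("No time difference")
    delta = value - last_value
    if delta < 0:
        # A value that dropped (gauge, or a counter that was reset) ends up here.
        raise CounterWrapped("Value overflow")
    return float(delta) / elapsed
```

In this sketch a cumulative counter such as total_num_queries yields a rate, while any value that can go down between two checks would raise the error. Whether fields such as the memory sizes or cache counts behave like that is an assumption to check against the Unbound documentation.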





Agent side:


#!/bin/sh
if command -v unbound-control > /dev/null 2>&1
then
  echo '<<<unbound>>>'
  unbound-control status > /dev/null 2>&1
  status=$?
  echo "status $status"
  if [ "$status" -eq 0 ]
  then
    unbound-control stats | sed 's/=/ /' | tr '.' '_' | grep -v "histogram\|time\|total_requestlist"
  fi
fi
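
To make the effect of the pipeline easier to see, here is a rough Python equivalent of the sed, tr and grep steps applied to a single line. The function name transform_stats_line is made up for this illustration, and the sample lines in the note after the code are assumptions about the `key=value` shape of the `unbound-control stats` output, not captured output.

```python
import re

# Hypothetical helper mirroring the intent of:
#   sed 's/=/ /' | tr '.' '_' | grep -v "histogram\|time\|total_requestlist"
EXCLUDE = re.compile(r"histogram|time|total_requestlist")


def transform_stats_line(line):
    """Return the rewritten line, or None if grep -v would drop it."""
    line = line.replace("=", " ", 1)  # sed 's/=/ /' replaces only the first '='
    line = line.replace(".", "_")     # tr '.' '_' replaces every dot on the line
    if EXCLUDE.search(line):          # grep -v sees the already rewritten line
        return None
    return line
```

For example, `total.num.queries=37517007` becomes `total_num_queries 37517007`. Because tr works on the whole line, a hypothetical value with a decimal point, such as `thread0.requestlist.avg=0.5`, would come out as `thread0_requestlist_avg 0_5`.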





-- 
Smil Milan Jeskyňka Kazatel

---------- Original e-mail ----------
From: Milan Jeskynka Kazatel <KazatelM at seznam.cz>
To: Stan Brown <stanbrow at gmail.com>
Date: 13. 3. 2020 16:56:16
Subject: Re: Check_mk own check PEND - Cannot compute check result: Value overflow
"Hi,



it seems to be a Python limit: 2^63 - 1.

I'm pretty sure that the counter did not reach that value, even though 
get_rate() subtracts two metric values. The server-side check is below. Can 
someone look at the code and help me with a more sophisticated method than 
just googling it?
I'm not a programmer, so for me this is trial and error. Yes, the code is my 
own, but it is based on the published Check_mk examples.
Maybe someone has more experience with troubleshooting methods in Check_mk 
and can share their approach.
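
As a quick check of the 2^63 - 1 idea: Python integers are not capped at that value. They grow as needed (in Python 2 a plain int is promoted to long automatically). The short snippet below shows this and compares the total_num_queries value from the agent output quoted further down with that bound. It only illustrates integer behaviour in Python and says nothing about what Check_mk does internally.

```python
# The bound mentioned above and the total_num_queries value from the agent output.
LIMIT = 2 ** 63 - 1
total_num_queries = 37517007

# Going past 2^63 - 1 does not raise an error in Python.
beyond = LIMIT + 1
difference = beyond - total_num_queries

print(beyond > LIMIT)             # True
print(total_num_queries < LIMIT)  # True
```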

many thanks,
-- 
Smil Milan Jeskyňka Kazatel

---------- Original e-mail ----------
From: Stan Brown <stanbrow at gmail.com>
To: Milan Jeskynka Kazatel <KazatelM at seznam.cz>
Date: 13. 3. 2020 15:09:06
Subject: Re: Check_mk own check PEND - Cannot compute check result: Value overflow
"
I am not the expert on this, but I believe this is all done with shell 
scripts. You should be able to Google the maximum value that can be stored in 
a shell integer. I am not certain what to do other than delete the check that 
is returning a value that is too big.



On Fri, Mar 13, 2020 at 8:20 AM Milan Jeskynka Kazatel <KazatelM at seznam.cz> wrote:

"
Hello,



yes, very likely.

How can I debug which variable is too big, or how can I protect the check 
against unexpected results? I had hoped that get_rate() would handle it.
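
One possible way to see which variable is responsible is to compare two consecutive agent outputs offline and list every name whose value went down or is not a plain integer. The sketch below assumes the section lines have the `name value` shape of the agent output. parse_section and find_suspects are hypothetical helpers, not Check_mk functions, and the second sample is invented for the demonstration.

```python
def parse_section(text):
    """Turn 'name value' lines into a dict; non-integer values become None."""
    values = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, raw = parts
        try:
            values[name] = int(raw)
        except ValueError:
            values[name] = None
    return values


def find_suspects(old_text, new_text):
    """Return names that are not integers or that decreased between samples."""
    old, new = parse_section(old_text), parse_section(new_text)
    suspects = []
    for name in sorted(new):
        if new[name] is None:
            suspects.append((name, "not an integer"))
        elif old.get(name) is not None and new[name] < old[name]:
            suspects.append((name, "decreased"))
    return suspects


first = """
total_num_queries 37517007
mem_cache_rrset 66072
msg_cache_count 1
"""
# Invented second sample, for illustration only.
second = """
total_num_queries 37517500
mem_cache_rrset 65000
msg_cache_count 0.5
"""
print(find_suspects(first, second))
```

Saving the agent section to a file on two consecutive check intervals and feeding both files to such a helper would show which names are candidates for the problem without touching the check itself.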

Best regards, 
-- 
Smil Milan Jeskyňka Kazatel

---------- Original e-mail ----------
From: Stan Brown <stanbrow at gmail.com>
To: Milan Jeskynka Kazatel <KazatelM at seznam.cz>
Date: 13. 3. 2020 13:03:21
Subject: Re: [omd-users] Check_mk own check PEND - Cannot compute check result: Value overflow
"
Looks to me like some of your results are bigger than expected. total 
queries for instance.



On Fri, Mar 13, 2020 at 4:52 AM Milan Jeskynka Kazatel <KazatelM at seznam.cz> wrote:

"
Hello,



I'm facing unexpected behavior in my own check_mk check for DNS Unbound 
(cumulative statistic: yes), where the server-side check seems to be somehow 
broken.

It can be inventoried and it shows metrics, but sometimes the service is 
shown as stale.

On the command line this message is visible:
unbound              PEND - Cannot compute check result: Value overflow
I'm not able to figure out what is wrong.

Could someone please give me a hint on how to debug this? The agent output 
consists of normal integer counters which increase continuously.





Check_mk command line output:


OMD[dev]:~$ check_mk --debug -vv --checks=unbound DNSRVU
[cpu_tracking] Start with phase 'busy'
Check_MK version 1.5.0p23
+ FETCHING DATA
[cpu_tracking] Push phase 'agent' (Stack: ['busy'])
 [agent] No persisted sections loaded
 [agent] Not using cache (Don't try it)
 [agent] Execute data source
 [agent] Connecting via TCP to 172.50.1.3:6556 (5.0s timeout)
 [agent] Reading data from agent
 [agent] Write data to cache file /omd/sites/devel/tmp/check_mk/cache/DNSRVU
[cpu_tracking] Pop phase 'agent' (Stack: ['busy', 'agent'])
[cpu_tracking] Push phase 'agent' (Stack: ['busy'])
 [piggyback] No persisted sections loaded
 [piggyback] Execute data source
[cpu_tracking] Pop phase 'agent' (Stack: ['busy', 'agent'])
unbound              PEND - Cannot compute check result: Value overflow
[cpu_tracking] End
OK - [agent] Version: 1.4.0p31, OS: linux, execution time 0.7 sec | execution_time=0.745 user_time=0.020 system_time=0.010 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=0.715






Agent output:


<<<unbound>>>
status 0
thread0_num_queries 15588707
thread0_num_queries_ip_ratelimited 0
thread0_num_cachehits 15588703
thread0_num_cachemiss 4
thread0_num_prefetch 0
thread0_num_zero_ttl 0
thread0_num_recursivereplies 4
thread0_requestlist_avg 0
thread0_requestlist_max 0
thread0_requestlist_overwritten 0
thread0_requestlist_exceeded 0
thread0_requestlist_current_all 0
thread0_requestlist_current_user 0
thread0_tcpusage 0
thread1_num_queries 4625290
thread1_num_queries_ip_ratelimited 0
thread1_num_cachehits 4625270
thread1_num_cachemiss 20
thread1_num_prefetch 0
thread1_num_zero_ttl 0
thread1_num_recursivereplies 20
thread1_requestlist_avg 0
thread1_requestlist_max 0
thread1_requestlist_overwritten 0
thread1_requestlist_exceeded 0
thread1_requestlist_current_all 0
thread1_requestlist_current_user 0
thread1_tcpusage 0
thread2_num_queries 1719352
thread2_num_queries_ip_ratelimited 0
thread2_num_cachehits 1719344
thread2_num_cachemiss 8
thread2_num_prefetch 0
thread2_num_zero_ttl 0
thread2_num_recursivereplies 8
thread2_requestlist_avg 0
thread2_requestlist_max 0
thread2_requestlist_overwritten 0
thread2_requestlist_exceeded 0
thread2_requestlist_current_all 0
thread2_requestlist_current_user 0
thread2_tcpusage 0
thread3_num_queries 15583658
thread3_num_queries_ip_ratelimited 0
thread3_num_cachehits 15583658
thread3_num_cachemiss 0
thread3_num_prefetch 0
thread3_num_zero_ttl 0
thread3_num_recursivereplies 0
thread3_requestlist_avg 0
thread3_requestlist_max 0
thread3_requestlist_overwritten 0
thread3_requestlist_exceeded 0
thread3_requestlist_current_all 0
thread3_requestlist_current_user 0
thread3_tcpusage 0
total_num_queries 37517007
total_num_queries_ip_ratelimited 0
total_num_cachehits 37516975
total_num_cachemiss 32
total_num_prefetch 0
total_num_zero_ttl 0
total_num_recursivereplies 32
total_tcpusage 0
mem_cache_rrset 66072
mem_cache_message 66289
mem_mod_iterator 16588
mem_mod_validator 66352
mem_mod_respip 0
mem_streamwait 0
num_query_type_A 37516990
num_query_type_PTR 1
num_query_type_AAAA 16
num_query_class_IN 37517007
num_query_opcode_QUERY 37517007
num_query_tcp 0
num_query_tcpout 0
num_query_tls 0
num_query_tls_resume 0
num_query_ipv6 36356
num_query_flags_QR 0
num_query_flags_AA 0
num_query_flags_TC 0
num_query_flags_RD 37517007
num_query_flags_RA 0
num_query_flags_Z 0
num_query_flags_AD 108911
num_query_flags_CD 0
num_query_edns_present 108911
num_query_edns_DO 0
num_answer_rcode_NOERROR 37516974
num_answer_rcode_FORMERR 0
num_answer_rcode_SERVFAIL 32
num_answer_rcode_NXDOMAIN 1
num_answer_rcode_NOTIMPL 0
num_answer_rcode_REFUSED 0
num_query_ratelimited 0
num_answer_secure 0
num_answer_bogus 0
num_rrset_bogus 0
num_query_aggressive_NOERROR 0
num_query_aggressive_NXDOMAIN 0
unwanted_queries 0
unwanted_replies 0
msg_cache_count 1
rrset_cache_count 0
infra_cache_count 26
key_cache_count 0
num_query_authzone_up 0
num_query_authzone_down 0











regards, 
-- 
Smil Milan Jeskyňka Kazatel


_______________________________________________
omd-users mailing list
omd-users at lists.mathias-kettner.de
Manage your subscription or unsubscribe
https://lists.mathias-kettner.de/cgi-bin/mailman/listinfo/omd-users
"




-- 





UNIX is basically a simple operating system, but you have to be a genius to 
understand the simplicity.

Dennis Ritchie(https://www.brainyquote.com/authors/dennis-ritchie-quotes)




"

"








"
"