[Check_mk (english)] Check crash : netapp_api_vs_traffic

Kyâne PICHOU kyane.pichou at etu.utc.fr
Wed Jul 6 09:03:38 CEST 2016


Hi,

Thanks for your help. I reached the same conclusion and the fix seems to 
work for now.
I'll see if I can get the few missing counters on my cluster, maybe it 
is just a key name issue.
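
A rough sketch of how the missing keys could be listed, assuming the parsed agent output is a dict mapping section names such as "nfsv4.<vserver>" to dicts of counters. The EXPECTED table is copied from the "what" column of the 1.2.8p5 check quoted below; missing_counters and the sample data are made up for illustration and are not part of Check_MK.

```python
# Counter names per protocol, copied from the protocol_map of the
# 1.2.8p5 netapp_api_vs_traffic check (only the "what" column).
EXPECTED = {
    "lif:vserver": ["recv_data", "sent_data", "recv_errors",
                    "sent_errors", "recv_packet", "sent_packet"],
    "fcp_lif:vserver": ["avg_read_latency", "avg_write_latency",
                        "read_data", "write_data"],
    "cifs:vserver": ["cifs_read_latency", "cifs_write_latency",
                     "cifs_read_ops", "cifs_write_ops"],
    "iscsi_lif:vserver": ["avg_read_latency", "avg_write_latency",
                          "read_data", "write_data"],
    "nfsv3": ["nfsv3_read_ops", "nfsv3_write_ops"],
    "nfsv4": ["nfsv4_read_ops", "nfsv4_write_ops"],
    "nfsv4_1": ["nfsv4_1_ops"],
}


def missing_counters(parsed, item):
    """Return {protocol: [missing counter names]} for one vserver item.

    Hypothetical helper: `parsed` is assumed to map "<protocol>.<item>"
    to a dict of counter values, as the quoted check function expects.
    """
    result = {}
    for protocol, counters in EXPECTED.items():
        data = parsed.get("%s.%s" % (protocol, item))
        if not data:
            continue  # the check itself skips absent sections too
        missing = [name for name in counters if name not in data]
        if missing:
            result[protocol] = missing
    return result


# Made-up sample: an nfsv4 section that lacks nfsv4_read_ops.
sample = {
    "nfsv4.vs1": {"nfsv4_write_ops": "42"},
    "nfsv3.vs1": {"nfsv3_read_ops": "1", "nfsv3_write_ops": "2"},
}
print(missing_counters(sample, "vs1"))
```

Running this on the real parsed data for a crashing vserver and for a working one should show whether a counter is truly absent or only named differently.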

Regards.

Kyâne PICHOU

Jam Mulch:
> I went ahead and submitted this (along with the workaround) as a bug to
> feedback at check-mk.org.
>
> On 07/05/2016 01:54 PM, Jam Mulch wrote:
>> Here's what I get when I do it:
>>
>> Traceback (most recent call last):
>>   File "/omd/sites/foo/share/check_mk/modules/check_mk.py", line 5288,
>> in <module>
>>     exit_status = do_check(hostname, ipaddress, check_types)
>>   File "/omd/sites/foo/share/check_mk/modules/check_mk_base.py", line
>> 1206, in do_check
>>     do_all_checks_on_host(hostname, ipaddress, only_check_types)
>>   File "/omd/sites/foo/share/check_mk/modules/check_mk_base.py", line
>> 1468, in do_all_checks_on_host
>>     result = sanitize_check_result(check_function(item, params, info),
>> check_uses_snmp(checkname))
>>   File "/omd/sites/foo/share/check_mk/modules/check_mk_base.py", line
>> 1775, in sanitize_check_result
>>     return sanitize_yield_check_result(result, is_snmp)
>>   File "/omd/sites/foo/share/check_mk/modules/check_mk_base.py", line
>> 1781, in sanitize_yield_check_result
>>     subresults = list(result)
>>   File "/omd/sites/foo/share/check_mk/checks/netapp_api_vs_traffic",
>> line 87, in check_netapp_api_vs_traffic
>>     rate = get_rate(what, now, int(data[what]) * scale)
>> KeyError: 'nfsv4_read_ops'
>> OMD[oitprod2r]:~$
>>
>>
>> I believe this means that the (unspecified) element being checked is
>> missing the nfsv4_read_ops data.
>> I would suggest adding exception handling around the get_rate call so
>> that a KeyError results in a rate of 0.
>>
>> From:
>>         for what, perfname, perftext, scale, format_func in values:
>>             rate = get_rate(what, now, int(data[what]) * scale)
>>             yield 0, "%s %s: %s" % (protoname, perftext, format_func(rate)), [(perfname, rate)]
>>
>> To:
>>
>>         for what, perfname, perftext, scale, format_func in values:
>>             try:
>>                 rate = get_rate(what, now, int(data[what]) * scale)
>>             except KeyError:
>>                 rate = 0
>>             yield 0, "%s %s: %s" % (protoname, perftext, format_func(rate)), [(perfname, rate)]
>>
>>
>>
>> On 07/05/2016 01:29 PM, Marcel Schulte wrote:
>>>
>>> Hi Kyâne,
>>>
>>> You can also check the output of "cmk --debug -npvvvv --checks
>>> netapp_api_vs_traffic AFFECTEDHOSTNAME"; maybe that sheds some light
>>> on the issue...
>>>
>>> Regards,
>>> Marcel
>>>
>>>
>>> Jam Mulch <spammagnet10 at gmail.com> wrote on Tue, 5 July 2016, 19:10:
>>>
>>>     As I vaguely remember it, the netapp_api_vs_traffic check looks for
>>>     several pieces of data, and if any one is missing, the check crashes
>>>     (sent_errors, recv_errors, cifs_latency, avg_write_latency, etc.).
>>>
>>>     I added exceptions back around 1.2.8p1 to return a 0 result for any
>>>     results that depended on missing data, but in later versions I just
>>>     Acknowledged any crashed Traffic checks (5 or so on one of my
>>>     clusters, and 2 on another).
>>>
>>>     To verify that this is your problem, look at the raw data from
>>>     agent_netapp and see if the volumes that are crashing are missing
>>>     one or more pieces of data that the ones which work are not missing.
>>>
>>>     Here is what 1.2.8p5 expects for the various types:
>>>
>>>     def check_netapp_api_vs_traffic(item, _no_params, parsed):
>>>         protocol_map = {
>>>             "lif:vserver": ("Ethernet",
>>>                 # (what, perfname, perftext, scale, format_func)
>>>                 [("recv_data",   "if_in_octets",  "received data",    1, get_bytes_human_readable),
>>>                  ("sent_data",   "if_out_octets", "sent data",        1, get_bytes_human_readable),
>>>                  ("recv_errors", "if_in_errors",  "received errors",  1, int),
>>>                  ("sent_errors", "if_out_errors", "sent errors",      1, int),
>>>                  ("recv_packet", "if_in_pkts",    "received packets", 1, int),
>>>                  ("sent_packet", "if_out_pkts",   "sent packets",     1, int)]),
>>>
>>>             "fcp_lif:vserver": ("FCP",
>>>                 [("avg_read_latency",  "fcp_read_latency",  "avg. Read latency",  0.001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("avg_write_latency", "fcp_write_latency", "avg. Write latency", 0.001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("read_data",         "fcp_read_data",     "read data",          1,     get_bytes_human_readable),
>>>                  ("write_data",        "fcp_write_data",    "write data",         1,     get_bytes_human_readable)]),
>>>
>>>             "cifs:vserver": ("CIFS",
>>>                 [("cifs_read_latency",  "cifs_read_latency",  "read latency",  0.000000001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("cifs_write_latency", "cifs_write_latency", "write latency", 0.000000001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("cifs_read_ops",      "cifs_read_ios",      "read OPs",      1,           int),
>>>                  ("cifs_write_ops",     "cifs_write_ios",     "write OPs",     1,           int)]),
>>>
>>>             "iscsi_lif:vserver": ("iSCSI",
>>>                 [("avg_read_latency",  "iscsi_read_latency",  "avg. Read latency",  0.001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("avg_write_latency", "iscsi_write_latency", "avg. Write latency", 0.001, lambda x: "%.2f ms" % (x * 1000)),
>>>                  ("read_data",         "iscsi_read_data",     "read data",          1,     get_bytes_human_readable),
>>>                  ("write_data",        "iscsi_write_data",    "write data",         1,     get_bytes_human_readable)]),
>>>
>>>             "nfsv3": ("NFS",
>>>                 [("nfsv3_read_ops",  "nfs_read_ios",  "read OPs",  1, int),
>>>                  ("nfsv3_write_ops", "nfs_write_ios", "write OPs", 1, int)]),
>>>
>>>             "nfsv4": ("NFSv4",
>>>                 [("nfsv4_read_ops",  "nfsv4_read_ios",  "read OPs",  1, int),
>>>                  ("nfsv4_write_ops", "nfsv4_write_ios", "write OPs", 1, int)]),
>>>
>>>             "nfsv4_1": ("NFSv4.1",
>>>                 [("nfsv4_1_ops", "nfsv4_1_ios", "OPs", 1, int)])
>>>         }
>>>         vserver = item.split(" ", 3)
>>>
>>>         now = time.time()
>>>         for protocol, (protoname, values) in protocol_map.items():
>>>             data = parsed.get("%s.%s" % (protocol, item))
>>>             if not data:
>>>                 continue
>>>             for what, perfname, perftext, scale, format_func in values:
>>>                 rate = get_rate(what, now, int(data[what]) * scale)
>>>                 yield 0, "%s %s: %s" % (protoname, perftext, format_func(rate)), [(perfname, rate)]
>>>
>>>
>>>     On 07/05/2016 12:34 PM, Kyâne PICHOU wrote:
>>>     > Hello,
>>>     >
>>>     > I use Check_MK raw 1.2.8p5 and I have an issue with the
>>>     > netapp_api_vs_traffic check.
>>>     >
>>>     > For two vservers (using nfsv4) I get an Unknown result with the
>>>     > message "UNKNOWN - check failed - please submit a crash report!",
>>>     > and also "No crash dump is available for this service."
>>>     > I checked, and the agent seems to return valid data for those
>>>     > vservers, but the check crashes.
>>>     >
>>>     > I don't know why, and I don't have a crash report. What can I do?
>>>     > Is there a way to fix the "no crash dump" issue?
>>>     >
>>>     > Regards
>>>     >
>>>
>>>     _______________________________________________
>>>     checkmk-en mailing list
>>>     checkmk-en at lists.mathias-kettner.de
>>>     http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en
>>>
>>
>
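
For reference, a standalone sketch of the workaround quoted above, so it can be tried outside Check_MK. The get_rate below is only a crude stand-in that I wrote for this example, not Check_MK's real function; safe_rates and the sample values are likewise hypothetical.

```python
import time

_last = {}  # stand-in counter store: name -> (timestamp, value)


def get_rate(name, now, value):
    """Crude stand-in for Check_MK's get_rate: per-second rate since the last call."""
    prev = _last.get(name)
    _last[name] = (now, value)
    if prev is None or now <= prev[0]:
        return 0.0  # no earlier sample to compare with
    return (value - prev[1]) / float(now - prev[0])


def safe_rates(data, values, now=None):
    """Yield (perfname, rate) pairs, using 0 for counters missing from data."""
    if now is None:
        now = time.time()
    for what, perfname, scale in values:
        try:
            rate = get_rate(what, now, int(data[what]) * scale)
        except KeyError:
            rate = 0  # same fallback as the quoted workaround
        yield perfname, rate


# Made-up counters: nfsv4_read_ops is missing, as in the traceback.
values = [("nfsv4_read_ops", "nfsv4_read_ios", 1),
          ("nfsv4_write_ops", "nfsv4_write_ios", 1)]
list(safe_rates({"nfsv4_write_ops": "100"}, values, now=0))  # first sample
print(list(safe_rates({"nfsv4_write_ops": "160"}, values, now=60)))
# prints [('nfsv4_read_ios', 0), ('nfsv4_write_ios', 1.0)]
```

Note that a rate of 0 for a missing counter looks the same as an idle one in the graphs, so skipping the counter instead may be preferable.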

