

     Re: [snips-users] hostmon: "no more, no less"

> Following up on the lack of responses to my query last week, how
> difficult do you think it would be to modify the hostmon code to see
> a config record like this:
> 	TomCats	myhost	4 4 4
> and alarm when the value was anything other than 4?   This syntax 
> isn't used anywhere else, is it?  The alternative is to write a 
> totally different monitor program just for this one variable.
> Any thoughts would be appreciated.

Part of the problem (with getting no answers) is that both your
messages are under another thread topic.  Some people might not
read your message because it looks like part of a discussion
they weren't interested in.  If you just send a new message to
snips-users at navya com then it shows up as a new topic for discussion.

As for your topic, it looks like the important code is in
lib/event_utils.c, and the subroutine is calc_status.

Best I can tell it would take a significant rewrite of this code
to handle a test case like you want, while still handling both
increasing and decreasing variables.

Either that or add a test for all 3 thresholds being equal and
call another subroutine, bypassing the rest of the calc_status code.
You have to make sure that you set and/or return the same variables
that calc_status does.
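
A minimal sketch of that second approach follows. It is not the actual
snips code: the function names, the severity values, and the two output
variables (status and severity) are all assumptions made for
illustration, and would have to be matched against what calc_status in
lib/event_utils.c really sets and returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity values for this sketch only; use whatever
 * lib/event_utils.c and the snips headers actually define. */
enum sketch_severity { SK_CRITICAL = 1, SK_INFO = 4 };

/* A record such as "TomCats myhost 4 4 4" has three equal thresholds.
 * Returns 1 when that is the case, 0 otherwise. */
static int thresholds_all_equal(unsigned long warn, unsigned long err,
                                unsigned long crit)
{
    return warn == err && err == crit;
}

/* Hypothetical helper to call instead of the normal threshold logic.
 * It fills in the same kind of outputs the regular path would:
 * *status is 1 (up) only when value equals the expected count, and
 * *severity is set to match. */
static void exact_match_status(unsigned long value, unsigned long expected,
                               int *status, int *severity)
{
    assert(status != NULL && severity != NULL);
    if (value == expected) {
        *status = 1;
        *severity = SK_INFO;
    } else {
        *status = 0;
        *severity = SK_CRITICAL;
    }
}
```

The idea would be to call thresholds_all_equal() at the top of
calc_status and, when it returns 1, call exact_match_status() and
return early.  Note that any existing config record that already has
three equal thresholds would change behavior under this scheme.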

A workaround, and not a particularly good one, would be to have
the code in hostmon-osclients print only a good or bad status.
The problem with that is that you can't put the number of processes
for different hosts in a single location; you'd have to put it on
each host, in the program.

Another idea is to use two variables for each host.  The
hostmon-osclients code would write both variables from the same data.
Oops, no, that's not quite going to work.  Setting all 3 thresholds
to one higher than you want works fine for too many processes,
but the only way to detect too few processes is to have different
thresholds.  Then you either have to accept that the normal status
is "warning" or hope that when you have too few it's more than just
one process too few.

Sorry that I couldn't provide a clear cut answer for you.

Anthony Vealé
National Snow and Ice Data Center
E-Mail: veale at nsidc org
Phone: (303)735-5069
