

     Re: Hostmon issues

> I have noclogd set up to pipe its messages to a paging script.  
	Hopefully being VERY careful what actually gets paged, and what
doesn't.  We do a lot of DNS monitoring, and it's amazing what happens when
connectivity to an off-site DNS server goes away and a couple hundred
domains go Critical.
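
	A rough sketch of the kind of filtering I mean, in Python. The event
tuples, the function name and the threshold are all made up for
illustration; this is not the noclogd message format, so adapt the parsing
to whatever your paging script actually receives. The idea is that when
many devices go Critical for the same variable at once, you send one
summary page instead of one page per device.

```python
from collections import Counter

def pages_for(events, flood_threshold=5):
    """Decide which pages to send for one batch of events.

    events: list of (device, variable, severity) tuples.
    This tuple layout is an assumption for the sketch, not a NOCOL format.
    """
    # Only Critical events are candidates for paging.
    critical = [e for e in events if e[2] == "Critical"]
    # Count how many devices are Critical per variable.
    by_var = Counter(var for _, var, _ in critical)

    pages = []
    for var, count in sorted(by_var.items()):
        if count >= flood_threshold:
            # Likely one shared cause (e.g. an unreachable DNS server):
            # collapse the flood into a single summary page.
            pages.append(f"{count} devices Critical for {var}")
        else:
            # Few enough to page individually.
            pages.extend(
                f"{dev} Critical for {var}"
                for dev, v, _ in critical
                if v == var
            )
    return pages
```

	With a couple hundred domains going Critical for the same variable
you would get one page, while an unrelated Critical on a single host still
pages on its own.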
> What
> happens is that noclog constantly reports that machines haven't posted any
> HostMonData.  If I check netconsole, everything is hunky-dory, or at least
> as good as it gets.  Hostmon is set to sleep for 15 minutes.
> Hostmon-client runs every 5 minutes.
	Which message do you get? RPCPing or OLDData?

	Are you checking the netconsole with the proper log level?
> Also, I previously described problems where hostmon would either return
> one value for all the filesystems I wanted to monitor, or it would monitor
> the same group of variables across all the servers, returning a bunch of
> uninit values.  Somebody previously replied and said that they devised a
> scheme whereby they look for the DFspace_%used[0-9][0-9] variables, which
> are dynamically assigned to different FS's.  
> I modified the code so that hostmon-client would return such variables;
> thanks for the tip.  Although I get fewer uninit variables, I still get
> some, which is a bother.  Has anyone found a way to narrow the
> search for variables so that I can search for DFspace_%used03 on machine
> X, while not on machine Y?
	I think something we do prevents this by entering EVERY machine,
EVERY disk.  Granted, if 2 disks are in Warning it's ugly, but that RARELY
happens.
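
	If you would rather select variables per machine, the lookup could
be sketched like this in Python. The table layout, the "*" wildcard entry
and the host names are assumptions for illustration only; this is not how
hostmon reads its config. It just shows matching DFspace_%usedNN names
against a per-host list.

```python
import re

# Hypothetical per-host table: host name -> regexes for wanted variables.
# The "*" entry applies to every host.
WANTED = {
    "machineX": [r"DFspace_%used03"],
    "*": [r"DFspace_%used0[0-2]"],
}

def wanted(host, variable, table=WANTED):
    """Return True if this variable should be monitored on this host."""
    patterns = table.get(host, []) + table.get("*", [])
    # fullmatch so that DFspace_%used03 does not also match ...used030.
    return any(re.fullmatch(p, variable) for p in patterns)
```

	Here DFspace_%used03 is checked on machineX only, while the first
three filesystems are checked everywhere, so machines without a fourth
filesystem never report an uninit value for it.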
> Another issue is that I keep running into problems where portmon
> erroneously reports that hosts or services are down.  I've found that
> jumbling entries in the portmon-confg file helps, but doesn't entirely fix
> the problem.  Why would this happen?  Are there any fixes known?
	I'd like to hear more about this. We never have a problem like that.
> Finally, nocol-4.2.2beta3 doesn't compile on Solaris 2.6.  The SNMP
> utilities complain about a bunch of undefined references.  I've since
> given up on that issue because beta3 isn't official yet anyway.
	Didn't know beta3 was out...