Re: Hostmon issues

On Tue, 20 Jul 1999, TTSG wrote:

> > I have noclogd set up to pipe its messages to a paging script.  
> >
> 	Hopefully you're being VERY careful about what actually gets paged
> and what doesn't.  We do a lot of DNS monitoring, and it's amazing what
> happens when connectivity to an off-site DNS server goes down and a
> couple hundred domains go Critical.
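A rough sketch of that kind of filtering, in Python. It assumes noclogd hands the paging script one plain-text event per line with the severity word (e.g. "Critical") somewhere in it; the real noclogd line format may differ, and `filter_pages` and its limits are hypothetical. It pages only Critical events, skips an event already paged within the window, and caps pages per window, so an off-site DNS outage yields a handful of pages rather than hundreds.

```python
import sys
import time
from typing import Callable, Dict, Iterable, List

# ASSUMPTION: noclogd writes one plain-text event per line and the line
# contains the severity word ("Critical", "Warning", ...). The real format
# may differ; adjust the match accordingly.

def filter_pages(
    lines: Iterable[str],
    window_secs: float = 900.0,
    max_pages_per_window: int = 5,
    now: Callable[[], float] = time.time,
) -> List[str]:
    """Return the subset of log lines that should be sent to the pager."""
    last_paged: Dict[str, float] = {}  # event text -> time it was last paged
    window_start = now()
    sent_in_window = 0
    pages: List[str] = []
    for raw in lines:
        line = raw.strip()
        if "Critical" not in line:
            continue  # only Critical events are worth waking someone up
        t = now()
        if t - window_start >= window_secs:
            window_start, sent_in_window = t, 0  # start a new rate window
        if t - last_paged.get(line, float("-inf")) < window_secs:
            continue  # same event already paged recently
        if sent_in_window >= max_pages_per_window:
            continue  # alert storm: suppress the rest of this window
        last_paged[line] = t
        sent_in_window += 1
        pages.append(line)
    return pages

if __name__ == "__main__":
    # Hypothetical use: noclogd pipes into this script; each printed line
    # would be handed to the real paging command instead.
    for msg in filter_pages(sys.stdin):
        print(msg)
```

Repeats of the same event page again only once the window has expired; the window length and the per-window cap are tuning knobs, not values taken from NOCOL.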
> >
> > What
> > happens is that noclog constantly reports that machines haven't posted any
> > HostMonData.  If I check netconsole, everything is hunky-dory, or at least
> > as good as it gets.  Hostmon is set to sleep for 15 minutes.
> > Hostmon-client runs every 5 minutes.
> >
> 	Which message do you get? RPCPing or OLDData?

NoData.

> 	Are you checking the netconsole with proper log level?

Yes.  I'm even checking level 4.

> >
> > Also, I previously described problems where hostmon would either return
> > one value for all the filesystems I wanted to monitor, or it would monitor
> > the same group of variables across all the servers, returning a bunch of
> > uninit values.  Somebody previously replied and said that they devised a
> > scheme whereby they look for the DFspace_%used[0-9][0-9] variables, which
> > are dynamically assigned to different FS's.  
> > 
> > I modified the code so that hostmon-client would return such variables;
> > thanks for the tip.  Although I get fewer uninit variables, I still get
> > some, which is a bother.  Has anyone found a way to make the search for
> > variables more granular, so that I can search for DFspace_%used03 on
> > machine X but not on machine Y?
> >
> 	I think what we do prevents this: we enter EVERY machine and EVERY
> disk.  Granted, if two disks are in Warning it's ugly, but that RARELY
> happens.
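The per-host filtering idea can be sketched in Python, assuming the `DFspace_%used[0-9][0-9]` naming described above. The host names and the `HOST_FS_INDICES` table are hypothetical, and whether this logic lives in a modified hostmon-client or on the server side is up to you. For each machine you list the two-digit indices that actually exist; a DFspace variable is checked only if its index is listed for that host, so machine Y is never asked for `DFspace_%used03`.

```python
import re
from typing import Dict, Set

# Matches names like "DFspace_%used03" and captures the two-digit index.
# ASSUMPTION: variable names follow the DFspace_%used[0-9][0-9] scheme.
DF_VAR = re.compile(r"^DFspace_%used(\d{2})$")

# Hypothetical per-host allowlist: which filesystem indices exist on each
# machine. Hosts not listed here get no DFspace variables at all.
HOST_FS_INDICES: Dict[str, Set[str]] = {
    "machineX": {"00", "01", "03"},
    "machineY": {"00", "01"},
}

def should_monitor(host: str, varname: str) -> bool:
    """Return True if varname should be checked for host.

    Non-DFspace variables are always monitored; DFspace_%usedNN variables
    only if NN is in that host's allowlist.
    """
    m = DF_VAR.match(varname)
    if m is None:
        return True
    return m.group(1) in HOST_FS_INDICES.get(host, set())
```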
> >
> > Another issue is that portmon keeps erroneously reporting that hosts or
> > services are down.  I've found that reordering the entries in the
> > portmon-confg file helps, but doesn't entirely fix the problem.  Why
> > would this happen?  Are there any known fixes?
> >
> 	I'd like to hear more about this. We never have a problem like that.

It's very specific.  I have to put the localhost SMTP port last in the
portmon-confg file.  Portmon then checks the localhost SMTP port first
(??) and continues from the beginning of the file.  If I put it anywhere
else, portmon mixes up two different socket connections, decides that an
SMTP "500 Unrecognized..." response is coming from a web port (or vice
versa), and reports the service as down.

If I put my HTTP proxy anywhere in the configuration file other than as
the first web host, portmon reports that the remaining web hosts couldn't
open a socket because of bad file descriptors.  If you want, I'll include
a copy of the file (and the debugging output) in a future e-mail.

The problem seems to be timing-related somehow, but I can't figure out how.
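For comparison, here is a minimal sequential port checker in Python that cannot confuse replies between services: each check opens its own connection, optionally sends a probe, reads the first reply, and closes the socket before the next check starts. This is not how portmon is implemented and not a patch for it; `CHECKS`, `check_port`, and `run_checks` are hypothetical. The trade-off is speed: one service at a time is slower than polling many connections in parallel, but a reply can only be attributed to the socket it was read from.

```python
import socket
from typing import Dict, List, Tuple

# Hypothetical check list: (host, port, probe to send, expected reply prefix).
CHECKS: List[Tuple[str, int, bytes, str]] = [
    ("localhost", 25, b"", "220"),                                 # SMTP banner
    ("www.example.com", 80, b"HEAD / HTTP/1.0\r\n\r\n", "HTTP/"),  # web server
]

def check_port(host: str, port: int, expect: str,
               send: bytes = b"", timeout: float = 10.0) -> bool:
    """Return True if host:port answers with a reply starting with expect.

    The socket is created and closed inside this call, so the reply that is
    examined always belongs to this check and no other.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if send:
                sock.sendall(send)
            # Only the first chunk is read; enough for a short prefix check.
            reply = sock.recv(512).decode("ascii", errors="replace")
    except OSError:  # refused, timed out, unreachable, bad descriptor, ...
        return False
    return reply.startswith(expect)

def run_checks(checks: List[Tuple[str, int, bytes, str]]) -> Dict[Tuple[str, int], bool]:
    """Run the checks one after another, in file order."""
    return {(h, p): check_port(h, p, exp, send=s) for h, p, s, exp in checks}

if __name__ == "__main__":
    for (host, port), up in run_checks(CHECKS).items():
        print(f"{host}:{port} {'up' if up else 'DOWN'}")
```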

> >
> > Finally, nocol-4.2.2beta3 doesn't compile on Solaris 2.6.  The SNMP
> > utilities complain about a bunch of undefined references.  I've since
> > given up on that issue because beta3 isn't official yet anyway.
> > 
> 	Didn't know beta3 was out...

Chris Garriques reported earlier that he produced an autoconf'ed version
and put it on his ftp server for testing.  I don't think that Vikas has
adopted it yet.

-
  Greg Swallow -- Assistant System Administrator (whew!) -- Access Indiana
   "The large print giveth and the small print taketh away.." - Tom Waits
  http://www.ai.org -- (800) 236-5446 -- (317) 233-2010 -- gswallow@ai.org