Re: Hostmon issues
On Tue, 20 Jul 1999, TTSG wrote:

> > I have noclogd set up to pipe its messages to a paging script.
>
> Hopefully being VERY careful what actually gets paged, and what
> doesn't. We do a lot of DNS monitoring, and it's amazing what happens when
> connectivity to an off-site DNS server goes and we have a couple hundred
> domains go Critical.
>
> > What
> > happens is that noclog constantly reports that machines haven't posted any
> > HostMonData. If I check netconsole, everything is hunky-dory, or at least
> > as good as it gets. Hostmon is set to sleep for 15 minutes.
> > Hostmon-client runs every 5 minutes.
>
> Which message do you get? RPCPing or OLDData?

NoData.

> Are you checking the netconsole with proper log level?

Yes. I'm even checking level 4.

> > Also, I previously described problems where hostmon would either return
> > one value for all the filesystems I wanted to monitor, or it would monitor
> > the same group of variables across all the servers, returning a bunch of
> > uninit values. Somebody previously replied and said that they devised a
> > scheme whereby they look for the DFspace_%used[0-9][0-9] variables, which
> > are dynamically assigned to different FS's.
> >
> > I modified the code so that hostmon-client would return such variables;
> > thanks for the tip. Although I get fewer uninit variables, I still get
> > some, which is a bother. Has anyone found how to further granularize the
> > search for variables so that I can search for DFspace_%used03 on machine
> > X, while not on machine Y?
>
> I think something we do prevents this by entering EVERY machine,
> EVERY disk. Granted, if 2 disks are in Warning it's ugly, but that RARELY
> happens.
>
> > Another issue is that I keep running into problems where portmon
> > erroneously reports that hosts or services are down. I've found that
> > jumbling entries in the portmon-confg file helps, but doesn't entirely fix
> > the problem. Why would this happen? Are there any fixes known?
>
> I'd like to hear more about this. We never have a problem like that.

It's very specific. I have to put the localhost SMTP port last in the
portmon-confg file. Then portmon checks the localhost SMTP port first (??),
and continues from the beginning of the file. If I put it anywhere else,
portmon will mix up two different socket connections, think that an SMTP
"500 Unrecognized..." response is coming from a web port (and vice versa),
and report it as down. If I put my http proxy anywhere in the configuration
file other than as the first web host, then it will come back saying that
the rest of the web hosts couldn't open a socket because of bad file
descriptors.

If you want, I'll include a copy of the file (and debugging output) in a
future e-mail. The problem seems to be related to timing somehow; I can't
figure out what.

> > Finally, nocol-4.2.2beta3 doesn't compile on Solaris 2.6. The SNMP
> > utilities complain about a bunch of undefined references. I've since
> > given up on that issue because beta3 isn't official yet anyway.
>
> Didn't know beta3 was out...

Chris Garriques reported earlier that he produced an autoconf'ed version and
put it on his ftp server for testing. I don't think that Vikas has adopted
it yet.

- Greg Swallow -- Assistant System Administrator (whew!) -- Access Indiana
"The large print giveth and the small print taketh away.." - Tom Waits
http://www.ai.org -- (800) 236-5446 -- (317) 233-2010 -- gswallow@ai.org
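
On the per-machine variable question earlier in this message (watching
DFspace_%used03 on machine X but not on machine Y), one possible approach is
a per-host watch list applied to the collected records before they are
reported. The following is only a sketch of that idea in Python, not
hostmon's actual code: the host names, the slot numbers, the WATCHED_SLOTS
table, and the (host, variable, value) record layout are all assumptions
made up for illustration.

```python
import re

# Hypothetical per-host watch list: which DFspace_%usedNN slots to keep.
# Host names and slot numbers are invented for this example.
WATCHED_SLOTS = {
    "machine-x": {"01", "03"},
    "machine-y": {"01"},
}

# Matches the dynamically numbered variables, e.g. DFspace_%used03.
DF_USED = re.compile(r"^DFspace_%used(\d\d)$")


def wanted(host, variable):
    """Return True if this host/variable pair should be monitored."""
    match = DF_USED.match(variable)
    if match is None:
        # Not a per-filesystem variable; this filter leaves it alone.
        return True
    # Hosts missing from the table get no DFspace_%usedNN variables at all.
    return match.group(1) in WATCHED_SLOTS.get(host, set())


def filter_records(records):
    """Drop DFspace_%usedNN records for slots a host is not configured for.

    Each record is assumed to be a (host, variable, value) tuple.
    """
    return [rec for rec in records if wanted(rec[0], rec[1])]
```

With a table like this, a DFspace_%used03 record from machine-y is dropped
before it can show up as an uninit value, while the same variable from
machine-x is kept. The cost is that the table has to be maintained by hand
whenever a filesystem is added or the numbering changes.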