Re: Monitoring the monitors
> On Sat, Jul 10, 1999 at 06:11:41PM -0400, TTSG wrote:
> > Hi,
> >
> > We just ran into a problem where a machine failed, and it caused
> > hostmon to "lock". The last thing it appears to have been doing was an rcp
> > of the file from the machine that failed. We fixed the machine last Friday
> > and didn't check NOCOL. Since then, 2 other machines had the hostmon
> > monitor die, which we didn't know about. Even worse, though, another
> > machine went down and we didn't find out until we saw other indications.
> >
> > I hate to do this, but is there something we can do to monitor the
> > monitors? (We'd of course have a monitor monitor.)
>
> *ack*
>
> Perhaps, once every ten minutes or so:
>
>   ls -lt ~nocol/logs | head -2
>
> ...if a file hasn't changed "recently," either you have a fairly docile
> network or something's buggy with the monitoring (specifically, hostmon
> tends to log the CPU idle time and context switches once every few
> minutes, as it's rather rare that the relative load on my server(s) doesn't
> change ever so slightly).

But you couldn't tell if it was hostmon logging an info, or something else... no?

> So now we need a monitor-monitor monitor, right? *chuckle*

Actually, I'm considering a 2nd machine to monitor the first.

> > In an unrelated story... Is there a way to keep "X" previous copies
> > of hostmon output? We sometimes don't catch a situation where it would
> > have been really great to see the hostmon information until after it's
> > refreshed it. Only keep the last "X" rolling copies of the config per machine.
>
> Well, a "painful" way to rebuild the data may be to:
>
>   cat ~nocol/logs/* | grep '\[hostmon\]' | sort

That doesn't show the data I'm looking for. Only some of it.

> I know... perhaps not the most elegant, but...

Any hints/tips/tricks are appreciated!

Tuc/TTSG
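A minimal sketch of the "every ten minutes or so" check, written as a POSIX sh function to run from cron. It asks find(1) for recently modified files instead of reading ls output, and it assumes a find that supports -mmin (GNU and BSD find do; strict POSIX find does not). The name check_fresh and the 20-minute threshold are illustrative, not part of NOCOL. To get around the objection that any monitor's log could be the one that changed, you could point it at a file that only hostmon writes rather than at all of ~nocol/logs; which file that is depends on your setup and isn't named in this thread.

```shell
#!/bin/sh
# Sketch of a "monitor monitor": complain if nothing under a path has been
# modified in the last N minutes. Assumes find(1) supports -mmin.

# check_fresh PATH MINUTES
#   PATH may be a directory (e.g. ~nocol/logs) or a single file.
#   Returns 0 if some regular file under PATH was modified less than
#   MINUTES minutes ago; otherwise prints a warning on stderr and returns 1.
check_fresh() {
    target=$1
    minutes=$2
    if [ ! -e "$target" ]; then
        echo "WARNING: $target does not exist" >&2
        return 1
    fi
    # -mmin -N matches files modified less than N minutes ago.
    recent=$(find "$target" -type f -mmin -"$minutes" -print | head -n 1)
    if [ -z "$recent" ]; then
        echo "WARNING: nothing under $target changed in the last $minutes minutes" >&2
        return 1
    fi
    return 0
}

# Example (from a crontab entry every ten minutes):
#   check_fresh ~nocol/logs 20
```

Most cron implementations mail any output of a job to its owner, so the warning on stderr can double as the alert without extra plumbing.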
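For the "keep the last X copies" question, one possible approach is numbered rotation: before each refresh, shift FILE.1 .. FILE.(X-1) up by one, drop the oldest, and copy the current file to FILE.1. This is a sketch under assumptions: keep_copies is a hypothetical helper, not a NOCOL feature, and the per-machine hostmon file you pass to it is whatever your installation actually refreshes (the thread doesn't name it).

```shell
#!/bin/sh
# Sketch: keep the last N snapshots of a file as FILE.1 (newest) .. FILE.N
# (oldest), in the style of numbered log rotation. The file to pass is an
# assumption; this thread does not say which hostmon file holds the data.

# keep_copies FILE N
keep_copies() {
    file=$1
    n=$2
    [ -f "$file" ] || { echo "keep_copies: $file not found" >&2; return 1; }
    [ "$n" -ge 1 ] 2>/dev/null || { echo "keep_copies: N must be >= 1" >&2; return 1; }
    # Drop the oldest snapshot, then shift FILE.(i-1) -> FILE.i.
    rm -f "$file.$n"
    i=$n
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$file.$prev" ]; then
            mv "$file.$prev" "$file.$i"
        fi
        i=$prev
    done
    # cp -p keeps the original modification time, so ls -l on the copies
    # shows when each snapshot's data was written.
    cp -p "$file" "$file.1"
}
```

Run from cron (for example keep_copies /path/to/hostmon-file 5, with a hypothetical path), the copies accumulate in the same directory and the oldest falls off automatically.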