     Re: portmon too slow

Michael Douglass wrote:
> Part of the problem as I see it is that portmon tends to wait TOO LONG
> for things to time out when they are down.  Another problem is that
> everything is monitored serially; perhaps portmon could fork off a
> couple of helper children that actually do all of the work and just
> take instructions from, and return results to, the parent via pipes.

I had rewritten portmon to use a 'select()' loop so that it could monitor
multiple sites in parallel. However, Frank Crawford in an earlier email
mentioned that this was also failing because the last site in the 'batch'
would not be processed for a long time.

However, I am not sure about this: the select() is on all the open
sockets and is not serial, so unless processing the data that was read
takes a long time, this delay should not occur (unless the number of
sites monitored in parallel is very large; for 10-20 sites, it should
not happen).
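
To illustrate the idea (this is not the portmon source, only a minimal
sketch in Python under stated assumptions): open every connection in
non-blocking mode, then let select() report whichever sockets become
ready, in any order, under one shared deadline. The function name
check_ports, the timeout value, and the use of numeric IPv4 addresses
on a POSIX system are all assumptions made for the example.

```python
import errno
import select
import socket
import time


def check_ports(targets, timeout=5.0):
    """Return {(host, port): bool} after probing all targets in parallel.

    Hypothetical helper: `targets` is a list of (ip_address, port) pairs.
    """
    results = {}
    pending = {}  # socket -> (host, port)

    # Start every connection first; none of these calls blocks.
    for host, port in targets:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        err = sock.connect_ex((host, port))
        if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            pending[sock] = (host, port)
        else:
            # Failed immediately (for example, connection refused).
            results[(host, port)] = False
            sock.close()

    # One shared deadline for the whole batch, not one per socket.
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # A socket becomes writable once its connect has finished,
        # whether it succeeded or failed.
        _, writable, _ = select.select([], list(pending), [], remaining)
        for sock in writable:
            target = pending.pop(sock)
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[target] = (err == 0)
            sock.close()

    # Anything still pending at the deadline is treated as down.
    for sock, target in pending.items():
        results[target] = False
        sock.close()
    return results
```

Because the deadline is shared, the total time for a batch is bounded
by the single timeout value rather than growing with the number of
sites. Reading and checking a response from each server would need a
similar loop on the read set; that part is left out here.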


> Sascha Linn wrote:
> > Subject: portmon and multiple similar servers...
> >
> > hi all,
> >
> > I've got an instance of NOCOL running and I'm trying to monitor a bank
> > of webservers with it. So, in my portmon config I have the following...
> > (IPs changed)
> ...
> > I've manually checked each server and all return identical results.
> >
> > The problem is that portmon only sees the first two servers as up and
> > reports all the rest down. Has anyone seen this before? Does anyone have
> > any suggestions?
> ...

Frank Crawford had written:
> Yes, I've seen this sort of thing before.  The problem is that the portmon
> program in 4.2.2beta2 (and earlier (?)) is broken.  It attempts to run all the
> checks in parallel, but then serialises the returns, causing timeouts in the
> later entries.
> What happens in the code is something like:
> foreach entry
>         open socket
>         send data
> done
> when socket is ready
> do
>         wait for response (including some timeouts)
>         process
>         close socket
> done
> So what the web server sees is a request, but the client then hangs for a
> period of time. The server, having done its bit, closes the socket, and
> portmon ends up with no data.
> The timeouts within the code are of the order of 1-2 seconds, so the final
> entry may expect the socket to remain open for 10-15 seconds.
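
To put rough numbers on the description above: if responses are read
one socket at a time and each earlier read waits out its full timeout,
the delay seen by a later entry grows linearly with its position in
the batch. A small sketch of that arithmetic follows; the batch size
of 10 and the function name worst_case_wait are assumptions for
illustration only and do not come from the thread.

```python
def worst_case_wait(position, per_entry_timeout):
    """Seconds the socket at `position` (0-based) may sit unread
    if every earlier entry in the batch runs into its timeout."""
    return position * per_entry_timeout


# Hypothetical batch of 10 entries: print the worst-case idle time
# of the last entry (position 9) for timeouts in the 1-2 second range.
for timeout in (1.0, 1.5, 2.0):
    print(timeout, worst_case_wait(9, timeout))
```

A web server that closes idle connections sooner than that will have
dropped the later sockets before portmon gets around to reading them.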

> On Mon, Aug 16, 1999 at 09:01:12AM -0400, Mike Gibbs said:
> >
> > Portmon seems way too slow for what it is doing.  Monitoring 150 servers
> > for 2 ports (smtp and http), it takes 1.5 hours to cycle once.  Is this
> > normal, or a misconfiguration on my part?  Also, if it is normal, what
> > functions are needed to output the same data to NOCOL's databases if I
> > write my own portmon, so I can still use NOCOL for the GUI?
> >
> >
> > Mike Gibbs