     Re: portmon and multiple similar servers...

On Jul 22,  2:00pm, Sascha Linn wrote:
> Subject: portmon and multiple similar servers...
> hi all,
> I've got an instance of NOCOL running and I'm trying to monitor a bank
> of webservers with it. So, in my portmon config I have the following...
> (IPs changed)
> I've manually checked each server and all return identical results.
> The problem is that portmon only sees the first two servers as up and
> reports all the rest down. Has anyone seen this before? Does anyone have
> any suggestions?

Yes, I've seen this sort of thing before.  The problem is that the portmon
program in 4.2.2beta2 (and earlier (?)) is broken.  It attempts to run all the
checks in parallel, but then serialises the returns, causing timeouts in the
later entries.

What happens in the code is something like:

foreach entry
	open socket
	send data

when socket is ready
	wait for response (including some timeouts)
	close socket

So what the web server sees is a request, after which the client hangs for a
period of time.  By then the server has done its bit and closed the socket, so
portmon ends up with no data.

The timeouts within the code are of the order of 1-2 seconds, so the final
entry may expect the socket to remain open for 10-15 seconds.
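
To put rough numbers on that, here is a small sketch of the arithmetic.  It is
not taken from the portmon source: the function name and the count of 8 servers
are made up for illustration, and it assumes the worst case where every earlier
entry uses up its whole timeout before the next socket is read.

```python
def worst_case_waits(num_entries, per_entry_timeout):
    """Seconds each entry's socket may sit unread, assuming every
    earlier entry consumes its full timeout (worst case)."""
    return [i * per_entry_timeout for i in range(num_entries)]


# Hypothetical bank of 8 servers, using the 1-2 second timeouts quoted above.
for timeout in (1, 2):
    waits = worst_case_waits(8, timeout)
    print(f"timeout={timeout}s: last entry may wait up to {waits[-1]}s")
```

The first entry is read straight away, and each later one inherits the delays
of everything ahead of it, which fits with only the first couple of servers
showing as up.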

Unfortunately, there isn't any easy way to fix the problem without a rewrite of
the portmon code.  However, an easy workaround I found was to go back to the
portmon in an earlier release (4.2.1 (?)), which didn't try to process the data
in parallel.
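
For comparison, the one-at-a-time approach looks roughly like the sketch below.
This is only an illustration in Python, not the code from any NOCOL release;
the names check_port and check_all, the 5 second timeout and the 1024 byte read
are all my own assumptions.  The point is simply that each socket is read
before the next one is opened, so no server is left holding an idle connection.

```python
import socket


def check_port(host, port, request, expect, timeout=5.0):
    """Connect, send and read the reply for one server before returning."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # Read immediately, while the server still has the socket open.
            # A single recv is assumed to be enough to see the status line.
            reply = sock.recv(1024)
    except OSError:
        # Covers refused connections, resets and timeouts.
        return False
    return expect in reply


def check_all(targets, request, expect, timeout=5.0):
    """Check each (host, port) pair in turn, one complete exchange at a time."""
    return {
        (host, port): check_port(host, port, request, expect, timeout)
        for host, port in targets
    }
```

The cost is that a dead server holds up the rest of the list for the length of
its timeout, but a slow entry can no longer make a healthy one look down.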


Frank Crawford		Email:	frank@ansto.gov.au	Postal:	PMB 1
Site Systems Manager	Phone:	+61 2 9717 3015			Menai NSW 2234
ANSTO			Fax:	+61 2 9717 9273			Australia

PGP Fingerprint: (8BB1C821) 06 4F 35 82 1D D6 0E 56  9F AB B8 F7 67 AF 1A 9D