Re: portmon and multiple similar servers...
On Jul 22, 2:00pm, Sascha Linn wrote:
> Subject: portmon and multiple similar servers...
>
> hi all,
>
> I've got an instance of NOCOL running and I'm trying to monitor a bank
> of webservers with it. So, in my portmon config I have the following...
> (IPs changed)
...
> I've manually checked each server and all return identical results.
>
> The problem is that portmon only sees the first two servers as up and
> reports all the rest down. Has anyone seen this before? Does anyone have
> any suggestions?
...

Yes, I've seen this sort of thing before. The problem is that the
portmon program in 4.2.2beta2 (and earlier (?)) is broken. It attempts
to run all the checks in parallel, but then serialises the returns,
causing timeouts in the later entries.

What happens in the code is something like:

    foreach entry
        open socket
        send data
    done

    when socket is ready do
        wait for response (including some timeouts)
        process
        close socket
    done

So what the web server sees is a request, after which the client hangs
for a period of time. The server, however, has done its bit and closes
the socket, and portmon ends up with no data. The timeouts within the
code are of the order of 1-2 seconds, so the final entry may expect the
socket to remain open for 10-15 seconds.

Unfortunately, there isn't any easy way to fix the problem without a
rewrite of the portmon code. However, an easy workaround I found was to
go back to the portmon in an earlier release (4.2.1 (?)), which didn't
try to process the data in parallel.

Frank

-- 
Frank Crawford         Email: frank@ansto.gov.au   Postal: PMB 1
Site Systems Manager   Phone: +61 2 9717 3015              Menai NSW 2234
ANSTO                  Fax:   +61 2 9717 9273              Australia
PGP Fingerprint: (8BB1C821) 06 4F 35 82 1D D6 0E 56 9F AB B8 F7 67 AF 1A 9D
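
The timing argument above can be put into rough numbers. The following is a minimal sketch in Python, not the portmon code itself: it assumes every request is sent up front, that each earlier entry costs the full per-entry wait before the next socket is read, and that a server keeps a connection usable for a fixed time. The function names and the `server_hold` parameter are hypothetical, chosen only for illustration.

```python
def unread_delays(n_entries, per_entry_wait):
    """Seconds each entry's socket sits unread, assuming all requests
    are sent first and responses are then handled strictly one by one,
    each costing per_entry_wait seconds (a worst-case simplification)."""
    return [i * per_entry_wait for i in range(n_entries)]


def reported_down(n_entries, per_entry_wait, server_hold):
    """Indices of entries whose unread delay exceeds server_hold, the
    (assumed) time a server keeps the connection usable."""
    delays = unread_delays(n_entries, per_entry_wait)
    return [i for i, delay in enumerate(delays) if delay > server_hold]


if __name__ == "__main__":
    # Hypothetical numbers: 10 servers, 1.5 s per entry, 2 s server hold.
    print(unread_delays(10, 1.5))
    print(reported_down(10, 1.5, 2.0))
```

Under these assumed numbers only the first two entries are read in time, and the last entry waits 13.5 seconds, which is in line with the 10-15 second figure mentioned above. Real behaviour will depend on the actual timeouts in the code and on the servers being checked.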