     [snips-users] hostmon-collector on Solaris 2.5.1 failing (hostmon DataAge reaches OldCrit)

For some weird reason, it would seem as though the hostmon-collector is
failing on Solaris 2.5.1 with no real errors... the "DataAge" from
hostmon comes through, so I'm assuming that's working fine (i.e. the
problem isn't in "hostmon" itself; it's just not getting any data to
process from the collector).

On a closer look, hostmon-collector /never/ makes an attempt to connect
out to other machines on the hostmon port (5355); this information was
garnered with a packet sniffer.  Simple telnets to the other machines
from the snips and/or hostmon box yield the desired stream just fine (and
DO show up in the packet sniff), so I know the problem is somewhere on
the collector box.
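
For anyone wanting to repeat the manual telnet check against several
hosts, here is a minimal sketch of the same test in Python. It is not
part of snips; the function name probe_hostmon and the timeout and
buffer-size defaults are my own assumptions. Only the port number (5355)
comes from the setup described above. It opens a TCP connection, reads
the first chunk of whatever the remote end sends, and gives up after a
timeout instead of hanging.

```python
import socket

HOSTMON_PORT = 5355  # hostmon port as mentioned in this post


def probe_hostmon(host, port=HOSTMON_PORT, timeout=5.0, max_bytes=4096):
    """Connect to host:port and return the first chunk of data received.

    Hypothetical helper, not part of snips. Raises OSError (which also
    covers timeouts and refused connections) if the connect or read fails.
    """
    # create_connection applies the timeout to the connect and to later reads
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(max_bytes)


if __name__ == "__main__":
    import sys

    # Usage: python probe.py host1 host2 ...
    for host in sys.argv[1:]:
        try:
            data = probe_hostmon(host)
            print(f"{host}: got {len(data)} bytes")
        except OSError as exc:
            print(f"{host}: connection failed: {exc}")
```

Run from the collector box with a packet sniffer watching 5355, this
should show traffic for each host, the same as the telnet test does.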

So, delving into hostmon-collector, it would appear as though the child
process isn't doing anything useful... I tried running hostmon-collector
in "debug" mode (using -d) and didn't manage to get much more useful
information (though I note that I have to specify a hostname on the
command line if using the debug switch, else it simply tries "localhost"
and "nfs1.jvnc.net").  Still, the packet sniffer catches nothing going
across 5355.

  (debug) hostmon-collector (child 1 of 9410): try_telnet() to fubar
  (debug) hostmon-collector (child 1 of 9410): try_scp() to fubar
  (debug) hostmon-collector (child 1 of 9410): try_scp() to fubar failed
  hostmon-collector (child 1 of 9410) could not get /tmp/hostmon_data/fubar.hostmon

Trying to debug the newSocket code, it would seem as though it never
returns (read: I've been putting in print statements for debugging after
nearly every line).  Guessing that's something to do with the forking off of
child processes, though I'm not sure why the print statements suddenly
go away (ie. don't see the fork in a line-by-line analysis of



Russell M. Van Tassell
russell at loosenut com

