     Re: [snips-users] hostmon falling over / snipstv/eventselect segmentation fault:

On Mon, Jul 24, 2006 at 01:32:35PM -0700, Russell Van Tassell wrote:

> I've not tried going 64-bit on NOCOL, however, though I've been building
> a new 64-bit Solaris machine... I might be inclined to check it out,
> though.
> (Apologies for the delayed reply here... life has kind of "taken over"
> recently)

Well, the problem seems to be (amongst others) that anything which calls 
pack_event() from the SNIPS perl module will fall over with a segmentation 
fault... but not always, so I think it may be something to do with monitors 
that use subdevices.

Despite a colleague hacking the code to compile and work on 64bit linux, 
we've been unable to fix this problem, so we ended up migrating everything 
from hostmon to snmpgeneric, which does not use pack_event().

As our monitoring is made up of several machines there are other problems.

The SNIPS database files and RRD files generated on 32bit machines are not 
readable by the 64bit machine. I haven't yet got round to fixing this, but 
it would involve exporting the files as text on the 32bit boxes with 
display_snips_datafile / rrdtool dump and then re-importing them on the 
64bit box.
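
For the RRD half of that, the export and re-import could be sketched roughly 
as below. This is only a sketch: it assumes rrdtool is on the PATH on both 
machines, and the function names and file paths are made up for 
illustration. It covers only the RRD files, not the SNIPS data files.

```python
import subprocess
from pathlib import Path


def xml_path_for(rrd_path):
    """Return the path of the XML dump that sits next to an RRD file."""
    return Path(rrd_path).with_suffix(".xml")


def dump_rrd(rrd_path):
    """Run on the 32bit box: write a portable XML dump of one RRD file."""
    xml_path = xml_path_for(rrd_path)
    with open(xml_path, "wb") as out:
        # "rrdtool dump" prints XML on stdout, so redirect it into the file.
        subprocess.run(["rrdtool", "dump", str(rrd_path)], stdout=out, check=True)
    return xml_path


def restore_rrd(xml_path, rrd_path):
    """Run on the 64bit box: rebuild a native RRD file from the XML dump."""
    subprocess.run(["rrdtool", "restore", str(xml_path), str(rrd_path)], check=True)
```

The XML files would be copied between the machines by whatever means is 
convenient (scp, rsync) between the two steps.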

Unfortunately I can't get my re-import script for SNIPS data to work on 
either the 32bit machines or the 64bit machines. Maybe something I'm doing 
wrong. The SNIPS Perl API seems a bit vaguely documented in places, and 
doesn't give any good examples of how to use pack_event correctly. (i.e., 
what you need to present it with and how, in which order.)

If I can't get this to work then I will have to rewrite some stuff (in perl) 
that uses the SNIPS data files to use the text files instead (snipstv and 
snipsweb). This is easy enough to do I suppose, just takes a bit of time.

On the day of the migration I discovered that snipslogd on the central host 
did not seem to work remotely. It works fine receiving events from monitors 
locally, but does not receive events from the remote boxes.

I can see the UDP packets arrive, but snipslogd doesn't do anything with 
them except log errors like: "readevent: socket read failed (incomplete)--"

The workaround to this was to configure the remote boxes to use syslog as a 
transport, by running snipslogd on each remote box, and configuring 
everything to log to that local snipslogd, and then configure snipslogd to 
pipe everything to logger:

*               info    |/usr/bin/logger^-t^snipslogd^-p^local3.info

I then configured syslog-ng to ship those events back to the central host, 
and tweaked my event processing script to strip off the syslog date and time 
bits before trying to parse each line as a SNIPS event. It turns out that 
this is also slightly more robust, since I set it up to use TCP and not UDP, 
and it allows me to keep a local log on each server as well as log to 
multiple destinations. (Something that you couldn't really do with snipslogd.)
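
The stripping step might look something like this. It is a sketch, not my 
actual script: it assumes the traditional "Mon DD HH:MM:SS host tag:" syslog 
line layout and the "snipslogd" tag set by logger -t above. The function name 
and the sample payload are made up, and the SNIPS event text itself is 
passed through untouched.

```python
import re

# Traditional syslog prefix: timestamp, hostname, then the logger tag.
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s"  # e.g. "Jul 24 13:32:35 "
    r"\S+\s"                                           # hostname
    r"snipslogd(?:\[\d+\])?:\s"                        # tag, optional [pid]
)


def strip_syslog_prefix(line):
    """Return the text after the syslog prefix, or None if the line
    does not look like it came from the snipslogd logger pipe."""
    match = SYSLOG_PREFIX.match(line)
    if match is None:
        return None
    return line[match.end():].rstrip("\n")
```

Lines that return None can be skipped or logged separately, so that stray 
syslog traffic never reaches the SNIPS event parser.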

Longer term I will probably need to replace SNIPS with something else, as we 
will shortly have more requirements like IPv6 monitoring and more precise 
latency/jitter/packet loss monitoring.

It may be that I can come up with a replacement "multiping" wrapper script 
that makes this work. Unfortunately every alternative out there seems to 
involve a huge bloaty system that back-ends on mysql or some other database, 
uses php/java or is yet another open monitoring framework which may or may 
not be developed to a usable point, or just be too complicated for me to 
understand. Network engineers are not usually also software developers, so I 
can manage to hack together something in perl that works, but some of these 
new monitoring frameworks require advanced knowledge of java, python, php or 
some other wacky script language.

The nice thing about snips was that it's lightweight and it just worked 
(except for a few foibles, most of which have been corrected.)

Maybe I'm getting too old for this lark, I should let the bright new kids 
have a go, say I'm old-fashioned with my rickety perl scripts, and not 
listen to me, and then repeat all the same mistakes I made in the past. :)



Robert Lister   -   London Internet Exchange    -  http://www.linx.net/
robl at linx net   -   tel: +44 (0)20 7645 3510    -  RL786-RIPE
