Advanced Topics

NetVigil administrators can create read-only devices for viewing by end-users. This is useful when a service provider or IT department must provide shared access to a device (e.g., a partitioned server, router, or switch) for a number of end-users. In this case, it may be desirable to restrict end-user access to a single device.

To create a read-only device:

Search for the desired end-user for whom you want to create a read-only device and represent them. If necessary, see "To represent a user:" on page 88 for instructions.

Click on the Administer tab.

Click on the Create A Device link in the information bar.

Select the Read-Only check box.

Select or fill in all the required fields indicated, and add any optional information in the comments field.

Select the Smart Notification box if desired.

Choose the desired test types for discovery, and the SNMP version and community ID (if applicable).

Click on the Create Device button to confirm changes and begin the test discovery process.

Once test discovery is complete, select the desired tests for the device and assign any action profiles (if desired by the end-user).

Click the Provision Tests button to confirm the test creation.

8.2 Smart Notification

NetVigil's Smart Notification was designed to eliminate sending multiple notifications when a device goes down or is unavailable. Often, many configured tests on a device have action profiles assigned to them to notify various recipients when test status reaches Warning, Critical, Unknown, or all three.

Smart Notification relies on the inherent dependency between the ping packet loss test results and the availability of the device. If the ping packet loss test returns 100%, communication with the device has been lost. Without Smart Notification, every test could then send a notification on every test cycle, especially tests configured to notify on Unknown, which is the likely test result in this situation. With Smart Notification enabled, further notifications are suppressed while packet loss is 100%.
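The suppression rule can be pictured with a minimal Python sketch. This is not NetVigil code: the function name, its parameters, and the assumption that the Packet Loss test itself is still allowed to notify are all hypothetical. Only the 100% packet loss trigger comes from the description above.

```python
def should_notify(test_name, status, notify_on, smart_notification, packet_loss_pct):
    """Decide whether a test result should trigger a notification.

    Hypothetical model: `notify_on` is the set of statuses the action
    profile notifies on; `packet_loss_pct` is the latest ping packet loss
    result for the device, or None if no such result is available.
    """
    if status not in notify_on:
        return False
    device_down = packet_loss_pct is not None and packet_loss_pct >= 100
    if smart_notification and device_down and test_name != "Packet Loss":
        # Assumption: only the Packet Loss test keeps notifying while the
        # device is down; all other tests are suppressed.
        return False
    return True
```

In this model, a CPU test that goes to UNKNOWN while packet loss is 100% stays silent when Smart Notification is enabled, and notifies as usual when it is not.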

To configure Smart Notification during new device configuration:

Click on the MANAGE tab. You will be taken to the Manage Devices page.

Click on the Create A Device link. For a detailed description of device creation, see Section 14.1, "Managing Devices" on page 14-177.

Fill out the required information in the Create Device page.

Check the box labeled Smart Notification.

Proceed with test discovery by selecting the Create Device button.


If you receive notifications while the device is down, check the following items:

Confirm that Smart Notification is in fact configured on the device: navigate to the Manage Devices page and select the Update link for the device. The Smart Notification box should be checked.

Confirm that a Packet Loss test is configured on the device: navigate to the Manage Devices page and select the Tests link for the device. The list of tests should include the Packet Loss test.

If both of the above are configured properly, the notifications you received were likely queued before the Packet Loss test returned 100%. That is, tests scheduled in the queue ahead of the Packet Loss test are executed before the trigger that suppresses all further notifications when Packet Loss = 100%. To avoid notifications in this case, change your action profiles to either 1) not notify on UNKNOWN; or 2) notify only after 2 or 3 test cycles have passed.
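The second suggestion, notifying only after a status has persisted for 2 or 3 test cycles, can be sketched as a simple counter. The class below is a hypothetical illustration in Python, not part of NetVigil; in particular, notifying exactly once when the threshold is reached is an assumption.

```python
class CycleGate:
    """Hold back a notification until a status persists for N test cycles."""

    def __init__(self, required_cycles=2):
        self.required_cycles = required_cycles
        self.consecutive = 0

    def record(self, status, notify_on=("WARNING", "CRITICAL")):
        # UNKNOWN is deliberately absent from the default notify_on tuple,
        # mirroring the first suggestion above.
        if status in notify_on:
            self.consecutive += 1
        else:
            self.consecutive = 0
        # Assumption: notify once, on the cycle where the count is reached.
        return self.consecutive == self.required_cycles
```

A test that is queued ahead of the Packet Loss test and reports one bad cycle is held back by such a gate, which gives the Packet Loss test time to return 100% and trigger suppression.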

8.3 Device Dependency

In networked environments, switches, routers, etc. are often the physical gateways that provide access to other network devices. If critical "parent devices" are unavailable, monitoring may be impeded for devices that are accessed via the parents. To distinguish between devices that are genuinely in a CRITICAL state and those that are UNREACHABLE because of a problem with one or more parent devices, you can create device dependencies.

A device dependency is a parent-child relationship between monitored devices. A single parent can have multiple children, and a single child can have multiple parents. Device dependencies are cascading. If A is a child of B, and B is a child of C, it is only necessary to configure A as a child of B and B as a child of C. NetVigil automatically recognizes the dependency between A and C.
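To illustrate what cascading means, the following Python sketch derives the full set of devices that a child depends on from the directly configured parent links. The dictionary layout and the function name are assumptions made for illustration; they do not describe how NetVigil stores dependencies.

```python
def all_ancestors(device, parents):
    """Return every device that `device` depends on, directly or indirectly.

    `parents` maps a device name to the list of its configured parents.
    """
    seen = set()
    stack = list(parents.get(device, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a device reachable via two paths is visited once
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


# Only A -> B and B -> C are configured; the link from A to C is derived.
parents = {"A": ["B"], "B": ["C"]}
print(sorted(all_ancestors("A", parents)))  # ['B', 'C']
```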

If a device is tested and the result is CRITICAL (for all thresholds), UNKNOWN, or FAILED, NetVigil performs the following additional processing to determine whether the device is reachable.

A current packet loss test is examined for the device. If such a test exists and packet loss is not 100%, the device is considered reachable.

If no packet loss test exists, all immediate parent devices are examined. If the device has no parents, it is considered reachable and the result of the test is the measured value. If all parents have a current packet loss test which was measured at 100%, the device is considered unreachable.

If no packet loss test exists for the parent, or no recent test result is found for an existing packet loss test, the child device is considered reachable and the result of the test is the measured value.
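The reachability rules above can be condensed into a short Python sketch. It is a reading of the rules, not NetVigil's implementation. The names are hypothetical, a missing dictionary entry stands for "no packet loss test or no recent result", and one case the text leaves open (the device's own packet loss test is at 100%) is assumed to fall through to the parent check.

```python
def is_reachable(device, packet_loss, parents):
    """Apply the reachability rules to one device.

    `packet_loss` maps a device to its current packet loss percentage;
    devices without a test or a recent result are simply absent.
    `parents` maps a device to the list of its immediate parents.
    """
    own = packet_loss.get(device)
    if own is not None and own < 100:
        return True  # the device itself still answers pings

    device_parents = parents.get(device, [])
    if not device_parents:
        return True  # no parents: the measured result stands

    for parent in device_parents:
        loss = packet_loss.get(parent)
        if loss is None or loss < 100:
            # A parent without a current result, or one that still
            # responds, means the child is treated as reachable.
            return True
    return False  # every parent currently shows 100% packet loss
```

Under this reading, a child with two parents is marked unreachable only when both parents have a current packet loss result of 100%.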

Dependency Restrictions

Circular dependency is not allowed. For example, if you set up the following dependencies:

Parent and child devices must belong to the same DGE Location.
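The circular dependency restriction amounts to a check like the following Python sketch, which tests whether adding a new parent would close a loop. The device names and the data layout are hypothetical, and the sketch does not reproduce how NetVigil enforces the rule.

```python
def creates_cycle(child, new_parent, parents):
    """Return True if making `new_parent` a parent of `child` closes a loop.

    `parents` maps a device to the list of its existing parents. The walk
    starts at the proposed parent and moves upward; reaching `child` means
    the child is already an ancestor of the proposed parent.
    """
    stack = [new_parent]
    seen = set()
    while stack:
        current = stack.pop()
        if current == child:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False
```

With router1 depending on switch1 and switch1 depending on core1, proposing router1 as a parent of core1 returns True, while proposing core1 as an additional parent of router1 returns False.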

To configure device dependency:

Create the parent device as described in Section 14.1, "Managing Devices" on page 14-177.

Create the child device as described in Section 14.1, "Managing Devices" on page 14-177.

Click MANAGE | devices.

On the Manage Devices page, find the device that will be the child device in the dependency and click Update.

On the Update Device page, click Update Device Dependency.

On the Update Device Dependency page, select the device or devices on which this child depends from the Does Not Depend On list, and then click Done. (If you return to the Device Dependency page you will see that the parent device(s) appear in the Depends On list).

Note: Device dependencies are cascading. If A is a child of B, and B is a child of C, it is only necessary to configure A as a child of B and B as a child of C. NetVigil automatically recognizes the dependency between A and C.

The next time the parent device has a CRITICAL ping packet loss (ping/pl) test result, the child device will have UNREACHABLE status.

8.4 External Help

External help provides NetVigil operators the ability to write support documentation specific to a Department, device or test and tie it directly to that same object via a HELP link in the Web UI. This way, less experienced system administrators can be provided with a first line of troubleshooting in the absence of live support. You can also enable actions (e.g., server restart) via the HELP links. This is a powerful option, as any number of files can be configured to work in this fashion, enabling a large number of background processes via the web app.

The perl script displayTestHelp.pl in the utils directory scans through NETVIGIL_HOME/plugin/help for help text specific to a Department, device or test. This script expects one argument in the form:

where device_addr can be a fully qualified domain name (FQDN) or an IP address. It must match the value that was used for device creation. The field test_name should match the descriptive name that was displayed during test creation (or in the test details page).

The script searches NETVIGIL_HOME/plugin/help according to the following algorithm:

Search for directory - acct_name ELSE _default_user, if found, cd into it.

Search for subdirectory - device_name ELSE device_addr ELSE _default_device, if found, cd into it.

Search for the files in the current directory in the following order:
<test_type>_<test_subtype>_<test_name>.{html,txt} ELSE
<test_type>_<test_subtype>.{html,txt} ELSE
<test_type>.{html,txt} ELSE
default.{html,txt}

Display the entire file on stdout (if text, then put HTML tags around the text).

If not found, display NO FILE FOUND on stdout in HTML format. The script prints out errors on stdout. The location of the script is specified in web.xml and it can basically be any script (or program). It is up to the target script to take the arguments and send back help text in the required format.
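The search order can be sketched in Python as follows. This is not the displayTestHelp.pl source. The function names are hypothetical, trying .html before .txt is an assumption based on the order inside the braces, and wrapping text files in a pre element is only one possible reading of "put HTML tags around the text".

```python
import html
from pathlib import Path


def find_help_file(help_root, acct_name, device_name, device_addr,
                   test_type, test_subtype, test_name):
    """Locate the most specific help file, or return None."""
    current = Path(help_root)
    # Descend into the first matching directory at each level; if none
    # matches, stay where we are ("if found, cd into it").
    for candidates in ((acct_name, "_default_user"),
                       (device_name, device_addr, "_default_device")):
        for name in candidates:
            if name and (current / name).is_dir():
                current = current / name
                break
    stems = (f"{test_type}_{test_subtype}_{test_name}",
             f"{test_type}_{test_subtype}",
             test_type,
             "default")
    for stem in stems:
        for ext in ("html", "txt"):  # assumed preference order
            path = current / f"{stem}.{ext}"
            if path.is_file():
                return path
    return None


def render_help(path):
    """Return the HTML to print for the file found (or not found)."""
    if path is None:
        return "<html><body>NO FILE FOUND</body></html>"
    text = path.read_text()
    if path.suffix == ".txt":
        return "<html><body><pre>" + html.escape(text) + "</pre></body></html>"
    return text
```

A replacement script configured in web.xml could follow the same outline, provided it accepts the same argument and returns help text in the required format.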

For example, to create a help file for the device 'mail_server' and a more specific one for its 'disk_space' test, in the Department 'local_department':

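# Work inside the help directory
cd NETVIGIL_HOME/plugin/help
# One directory for the device, plus a fallback for other devices
mkdir -p local_department/mail_server
mkdir -p local_department/_default_device
cd local_department/
# Fallback help for any device in the Department
vi _default_device/default.html
# Help specific to the SNMP disk tests on mail_server
vi mail_server/snmp_disk.txt
# General help for mail_server
vi mail_server/default.html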

It is possible to use your own script that, for example, connects to a database and retrieves escalation information based on specified criteria.

8.5 High Availability Configurations

The NetVigil distributed database and processing architecture allows very high levels of fault tolerance and scalability during deployment. All of the components in the various tiers are horizontally scalable which is essential for expansion and real-time performance reports.

All of the configuration information is stored in the BVE configuration database. On startup, the DGEs connect to the BVE configuration database and download a local copy of their configuration. Any updates made to the BVE configuration database are pushed out in real-time to the corresponding DGE.

To handle the case of a DGE physical server going down, you can set up a spare 'hot standby' server in any central location (N+1 redundancy) with the software installed and configured. If a production DGE goes down for an extended period of time due to hardware failure, set the name of the failed DGE in the dge.xml config file (see "DGE Identity" on page 30) and start NetVigil on the backup server. The backup DGE automatically connects to the BVE configuration database and downloads the configuration of the failed DGE. When the production DGE comes back up, it can even be run in parallel before you shut down the backup DGE. The only caveat is that the performance data collected during this interval will be missing on the production DGE (synchronizing the performance data will be addressed in a future release).

If desired, you can have a backup DGE for each of the production DGEs (N+N redundancy) but this is not really needed if the centralized DGE can poll all the data remotely.

If connectivity between the DGE and the BVE database is lost, the DGE continues to poll, aggregate and even generate alarms completely independently. When connectivity to the BVE database is restored, the DGE restarts and downloads a fresh copy of its configuration database.
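The behavior described above (work from a local copy, refresh it when the BVE database becomes reachable again) can be modeled with a small Python sketch. The class and its methods are hypothetical and only illustrate the idea. The real DGE restarts rather than refreshing in place, and real-time pushes from the BVE database are not modeled.

```python
class DgeConfigCache:
    """Toy model of a DGE that keeps working from its local configuration."""

    def __init__(self, fetch_config):
        # `fetch_config` is a hypothetical callable that reads this DGE's
        # configuration from the BVE configuration database.
        self.fetch_config = fetch_config
        self.local_copy = fetch_config()  # downloaded on startup
        self.connected = True

    def on_connectivity_change(self, reachable):
        if reachable and not self.connected:
            # Connectivity restored: download a fresh copy.
            self.local_copy = self.fetch_config()
        self.connected = reachable

    def current_config(self):
        # Polling and alarm generation use the local copy, whether or not
        # the BVE database is reachable at the moment.
        return self.local_copy
```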

The BVE database can be replicated on multiple servers for fault tolerance. In the future, the DGEs will be able to automatically fail-over to an alternate BVE database if the primary database is not reachable.

The performance database, which is local to each DGE, can also be located on a remote database cluster if needed for fault tolerance. Because the DGE communicates with the performance database over JDBC, such a setup requires only a few configuration file changes.

Lastly, the Web application and reporting engine also get all their configuration information from the BVE database server on startup. You can therefore place any number of web application servers behind a load balancer for fault tolerance as well as distributed report processing.