Reports

NetVigil provides extensive, flexible reporting at several levels (container, device, test) and of several types (fault, performance, SLA). Most reports are generated in real time: the BVE reporting engine collects raw data from the DGEs and then builds the graphs and statistics from it.

There are three levels of reports, in order of increasing flexibility: Summary, Advanced and Custom. The reporting framework allows arbitrary custom reports and statistics to be generated on the fly.

Fault Management Reports: Includes Event History, Service Instability report, Device Instability, Threshold Violations

Performance and Capacity Planning: Top N usage and trend report, 30 Day upcoming trends

SLA Reports: Unavailability, Downtime

Alarm Reports: derived from traps or syslog messages

Stored/Scheduled Reports: Modify, Execute or delete queries

The graphs automatically switch from a linear to a logarithmic scale when the range of values is too wide for a linear graph to be readable (for example, during a bandwidth spike).
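The scale switch described above can be sketched as follows. This is only an illustration, not NetVigil's actual logic: the function name `choose_scale` and the cutoff ratio of 1000 are assumptions made for the example.

```python
def choose_scale(values, max_ratio=1000.0):
    """Return "log" when the positive values span too wide a range, else "linear".

    max_ratio is a hypothetical cutoff chosen for this sketch.
    """
    positive = [v for v in values if v > 0]
    if len(positive) < 2:
        return "linear"
    if max(positive) / min(positive) > max_ratio:
        return "log"
    return "linear"


steady = [40, 42, 45, 41]          # ordinary traffic samples
spike = [40, 42, 900_000, 41]      # one bandwidth spike dwarfs the rest
print(choose_scale(steady))
print(choose_scale(spike))
```

With the sample data, the steady series stays linear, while the series containing the spike is drawn on a logarithmic scale so that the small values remain visible.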

15.1 Summary Reports

These reports give a quick snapshot for the past week. There are reports for a technical manager, and executive reports for a business manager showing the impact on business service containers.

15.1.1 Technical Summary Report

This report gives the top N problems, the distribution of problems by day of week, correlation graphs for all devices that had problems over the past week, and trend graphs for various elements, covering network, CPU and disk issues across all devices.

15.1.2 Business Impact Summary

This report shows which devices or elements caused service containers to go down, a correlation graph for the top 10 service containers (excluding OK elements), and the top N service containers sorted by downtime. All reports are interactive, so you can drill down into any report for more detailed analysis in real time.

15.2 Advanced Reports

These reports give operational and engineering analysis of your IT infrastructure and answer some commonly asked questions.

15.2.1 Fault Management Reports

The Fault Management Reports provide an in-depth analysis of the events in which tests, devices and services crossed their thresholds. They include Device and Service reports on the most fault-prone services and the number of events that occurred.

These reports use events (or threshold violations) to calculate the number of times or the total time spent in warning or critical conditions. These reports answer questions such as:

Availability: what was the total time for which monitored elements were unavailable?

How many outages occurred?

What was the average time of an outage (MTTR)?
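The three questions above reduce to simple arithmetic over a list of outage intervals. The sketch below is illustrative only: the function `outage_stats` and its input format are assumptions for the example, not part of the product.

```python
from datetime import datetime, timedelta


def outage_stats(outages, period_start, period_end):
    """Summarize outages given as (start, end) datetime pairs.

    Assumes the outages do not overlap and lie inside the reporting period.
    """
    period = period_end - period_start
    downtime = sum((end - start for start, end in outages), timedelta())
    count = len(outages)
    # Average outage duration; zero when there were no outages.
    mean_outage = downtime / count if count else timedelta()
    availability_pct = 100.0 * (1 - downtime / period)
    return {
        "downtime": downtime,
        "outages": count,
        "mean_outage": mean_outage,
        "availability_pct": availability_pct,
    }


day = datetime(2024, 1, 1)
outages = [
    (day + timedelta(hours=2), day + timedelta(hours=2, minutes=30)),
    (day + timedelta(hours=10), day + timedelta(hours=11, minutes=30)),
]
stats = outage_stats(outages, day, day + timedelta(days=1))
print(stats["outages"], stats["downtime"], stats["mean_outage"])
print(round(stats["availability_pct"], 2))
```

In this example, two outages totalling two hours in a 24-hour period give an average outage of one hour and an availability of about 91.67%.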

The Event History report is designed to provide a consolidated view of events for either the last 24 hours or a specific historical month. Each report entry is a unique combination of device name, test name and severity, and shows both the total duration spent in that severity (CRITICAL, WARNING, etc.) and the number of times the test entered that severity. Below the text listing, a horizontal bar graph displays the top 10 'worst' results. Clicking any column heading in the text list automatically updates this graph.
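The grouping described above (one entry per device, test and severity, with a total duration and an entry count) can be sketched as follows. The event tuple layout and the function names `summarize_events` and `worst` are hypothetical, chosen only to illustrate the aggregation.

```python
from collections import defaultdict


def summarize_events(events):
    """Group (device, test, severity, duration_seconds) events.

    Returns {(device, test, severity): (total_seconds, times_entered)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for device, test, severity, duration in events:
        entry = totals[(device, test, severity)]
        entry[0] += duration
        entry[1] += 1
    return {key: tuple(value) for key, value in totals.items()}


def worst(summary, n=10, by="duration"):
    """Return the n worst entries, ranked by total duration or by count."""
    index = 0 if by == "duration" else 1
    ranked = sorted(summary.items(), key=lambda item: item[1][index], reverse=True)
    return ranked[:n]


events = [
    ("router1", "Packet Loss", "CRITICAL", 600),
    ("router1", "Packet Loss", "CRITICAL", 300),
    ("web1", "CPU", "WARNING", 1200),
    ("web1", "CPU", "CRITICAL", 60),
]
summary = summarize_events(events)
print(worst(summary, n=1, by="duration"))
print(worst(summary, n=1, by="count"))
```

Changing the `by` argument mirrors clicking a different column heading: the same summary is re-ranked and a different set of entries becomes the "worst" list.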

The Service Instability report covers the top 10, 25 or 50 services ranked by number of events. It shows the frequency distribution of events by hour of the day and by day of the week or month, as well as the distribution of event durations.
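A frequency distribution of events by hour of day and by day of week can be computed by counting event timestamps, as in this minimal sketch. The function name `event_distributions` and the input (a list of `datetime` values) are assumptions for the example.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def event_distributions(timestamps):
    """Count events per hour of day and per day of week."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    return by_hour, by_weekday


timestamps = [
    datetime(2024, 1, 1, 9, 15),   # a Monday
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 2, 14, 0),   # a Tuesday
]
by_hour, by_weekday = event_distributions(timestamps)
print(sorted(by_hour.items()))
print(by_weekday.most_common())
```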

The Threshold Violations reports provide data on threshold violations for Bandwidth, CPU, Memory and Disk Utilization.

15.2.2 Performance and Capacity Planning Reports

These reports help you plan your IT infrastructure investments and target them in the right direction. They show exactly where capacity constraints are creating performance bottlenecks.

The Top N usage report supports capacity planning by listing the top N devices or tests with the highest or lowest usage values, which shows where to add redundant capacity and where to remove excess capacity. This report can be based on the status of one or more test types.
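Selecting the N highest or lowest usage values is a small ranking step, sketched below. The function `top_n` and the sample device names and figures are hypothetical.

```python
import heapq


def top_n(usage, n=10, highest=True):
    """Return the n (name, value) pairs with the highest or lowest usage.

    usage maps a device or test name to its usage value.
    """
    pick = heapq.nlargest if highest else heapq.nsmallest
    return pick(n, usage.items(), key=lambda item: item[1])


usage = {"core-sw1": 92.0, "web1": 35.5, "db1": 78.2, "backup1": 4.1}
print(top_n(usage, n=2))                  # candidates for added capacity
print(top_n(usage, n=2, highest=False))   # candidates for reclaimed capacity
```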

The 30 Day upcoming trends report for Bandwidth, CPU and Disk Space utilization gives a trend analysis for the next month so that you can plan accordingly.
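The text does not say how the trend is computed. One common approach, shown here purely as an assumption, is to fit a least-squares line to past samples and extend it 30 days ahead. The function names `linear_trend` and `project` are hypothetical.

```python
def linear_trend(samples):
    """Fit a least-squares line to (day, value) samples.

    Returns (slope, intercept). Needs at least two distinct days.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(samples, days_ahead=30):
    """Extend the fitted line days_ahead past the last sampled day."""
    slope, intercept = linear_trend(samples)
    last_day = max(x for x, _ in samples)
    return slope * (last_day + days_ahead) + intercept


# Disk utilization (%) growing by one point per day over five days.
disk = [(0, 50.0), (1, 51.0), (2, 52.0), (3, 53.0), (4, 54.0)]
print(linear_trend(disk))
print(project(disk))
```

A straight line is the simplest possible model; real utilization data is often seasonal or bursty, so a projection like this is best read as a rough planning aid.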

15.2.3 SLA Reports

The Unavailability/Downtime SLA Report is based on device availability as measured by the ICMP packet loss test. The report shows how many times and for how long Packet Loss tests were in the Critical or Unreachable states. The SLA threshold for the Packet Loss test is used to determine when the test was in Critical state. This report shows the Top 10 devices by amount of "unavailability", displaying total time unavailable and % unavailable, with graphics showing either view.

Users may also link to an availability distribution report/graph. This histogram shows the number of devices in each 10% availability band: the number of devices with 0-10% availability, 10-20% availability, and so on.
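The availability distribution can be sketched as a ten-bin histogram. The function name `availability_histogram` is hypothetical, and the handling of boundary values (a value exactly on a boundary goes into the higher band, and 100% goes into the 90-100% band) is an assumption, since the text does not specify it.

```python
def availability_histogram(availability_pct):
    """Count devices per 10% availability band.

    availability_pct maps device name -> availability in percent (0..100).
    Returns a list of 10 counts: index 0 is 0-10%, index 9 is 90-100%.
    """
    bins = [0] * 10
    for pct in availability_pct.values():
        index = min(int(pct // 10), 9)  # keep 100% in the top band
        bins[index] += 1
    return bins


devices = {"a": 99.9, "b": 100.0, "c": 95.0, "d": 42.0, "e": 89.99}
histogram = availability_histogram(devices)
for band, count in enumerate(histogram):
    print(f"{band * 10}-{band * 10 + 10}%: {count}")
```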

The Threshold Violation report allows you to run reports on system resources (CPU, disk space, bandwidth, etc.), comparing test results with SLA thresholds.
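Comparing test results with SLA thresholds amounts to a filter over the results. In the sketch below, the function `sla_violations`, the data layout and the rule that a violation means a value above the threshold are all assumptions for illustration.

```python
def sla_violations(results, sla_thresholds):
    """Return results whose value exceeds the SLA threshold for their test.

    results: list of (device, test, value).
    sla_thresholds: maps test name -> maximum acceptable value.
    Tests without a threshold are skipped.
    """
    violations = []
    for device, test, value in results:
        limit = sla_thresholds.get(test)
        if limit is not None and value > limit:
            violations.append((device, test, value, limit))
    return violations


thresholds = {"CPU": 80.0, "Disk": 90.0}
results = [
    ("web1", "CPU", 91.5),
    ("web1", "Disk", 72.0),
    ("db1", "CPU", 64.0),
    ("db1", "Disk", 95.2),
    ("db1", "Memory", 99.0),   # no SLA threshold defined, so ignored
]
for violation in sla_violations(results, thresholds):
    print(violation)
```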

To create custom reports that use SLA thresholds:

Click REPORTS | Custom.

On the Create Test Level Report page, set the Severity field to SLA.

Select other report parameters, and then click GO.

15.2.4 Alarm Reports

For alarms generated from text messages such as SNMP traps, syslog messages or other logs inserted via the ISM API, the following reports are available:

Top N alarms

Top N alarms by frequency

Alarm count by time of day
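A "Top N alarms by frequency" listing can be produced by counting identical alarm messages, as in this minimal sketch. The function name `top_alarms` and the sample messages are hypothetical. How NetVigil itself groups alarm text is not described here.

```python
from collections import Counter


def top_alarms(messages, n=10):
    """Return the n most frequent alarm messages as (message, count) pairs."""
    return Counter(messages).most_common(n)


messages = [
    "linkDown", "linkDown", "authenticationFailure",
    "linkDown", "authenticationFailure", "coldStart",
]
print(top_alarms(messages, n=2))
```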

15.2.5 Stored and Scheduled Reports

You can save any custom or advanced report, schedule it to run automatically, and have the results emailed if desired. Whenever an advanced or custom report is generated, a 'Save' option is displayed on the report so that you can save it under a custom name. These saved reports are all listed under this menu item.

15.3 Custom Reports

In addition to the large number of preset reports listed above, NetVigil offers complete flexibility in creating ad-hoc reports over any time period. You can select the data for the report by specifying the device or test names, the time period and other such parameters. You can also choose the type of report to generate, such as a top-N table, a trend report or a correlation graph.

Generates one or more of the Top Ten, Number of Events Distribution, Event Duration Distribution, Number of Events, Performance, Statistics, Trend Analysis reports for the particular tests of chosen test types for a device.

Generates reports for one or more of Top Ten, Number of Events Distribution, Event Duration Distribution, Number of Events for devices of a particular vendor and Device type.

Features reports of event distribution over time for chosen test types or for a particular device.

Short graphical reports for the last 24 hours at 5-minute intervals, for a particular test or device or for chosen test types.

Plots similar tests on a single graph, allowing their performance to be compared.

15.4 Sample Reports

15.4.1 Instability Report

This report pinpoints the main problems across the entire IT infrastructure by calculating the total time that elements spent in the Critical or Warning state. It also shows the distribution of alarms on a daily basis, as well as the distribution by day of the week and hour of the day. You can drill down into a device to see all the individual problems on that device, and also see trend reports from the same screen.

15.4.2 Test Performance Detail

15.4.3 Trend Analysis Reports

15.4.4 Event Correlation Report

This report shows a 24-hour snapshot of all the individual elements of a service container or a device. If a particular test has gone into any non-OK state within an hour, that hour is colored to reflect the non-OK state. This report allows you to correlate various problems in your service container or device and see which events happened during the same hour of the day.