Regardless of how carefully monitoring solutions are configured, the dynamic nature of modern infrastructures can impact alert functionality. Optimizing network performance in complex, hybrid infrastructures that leverage SD-WANs, real-time provisioning and other advanced features is difficult. So what should IT teams do when they receive false alerts or notifications that appear inaccurate?
Like so many tasks in network monitoring, utilizing a tried and true process is always the best place to start. Our first How To post of 2024 takes you through the same process used by our crack Customer Support team for Troubleshooting False Alerts in Netreo.
While step one is always identifying the source of the problem, doing so is easier said than done. And like pretty much every troubleshooting task, testing connectivity to the device in question is paramount.
A number of steps require a sample device, so have one with a known IP address handy for reference. Knowing how the device was added to Netreo is very helpful, since similar devices are typically added the same way. You’ll also need to know whether the device is associated with a Service Engine. So before you begin …
To make sure that the device is reachable by Netreo, test connectivity to the device from your Netreo instance using the Credential and Connectivity Test tool. Navigate to Administration -> Tools -> Credential and Connectivity Test.
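For a quick cross-check outside the Netreo UI, the same reachability question can be approximated with a plain TCP connection attempt. The following is a minimal Python sketch and is not part of Netreo; the function name `is_tcp_reachable` is hypothetical, and which port to probe (for example 22 for SSH) is an assumption that depends on how the device is monitored. Run it from a host on the same network as your Netreo instance or Service Engine, since a result from elsewhere may not reflect what Netreo sees.

```python
import socket


def is_tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and unreachable networks.
        return False
```

Calling `is_tcp_reachable("192.0.2.10", 22)` with your sample device's IP address in place of the placeholder returns `False` when the port is blocked or the device is down. A `True` result only shows that the port accepts connections; it says nothing about whether the credentials are valid.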
By testing credentials, users are able to identify whether the credentials applied to the device in question are functioning properly and determine if access methods are blocked in some way. Troubleshooting credentials is always recommended toward the start of troubleshooting, because it’s relatively easy.
Navigate to a device’s Overview page and click the gear symbol to go to the Admin page. Scroll to the bottom of the Admin page to see the credential fields.
If a field has a lock symbol, credentials from a template are applied in that field. To confirm which template was used, simply hover your cursor over the field and the template will be displayed. It’s a good idea to hover over each field to double check that all are correct.
Fields may also have credentials filled in that are not locked from a device template. Two things to keep in mind at this step:
To authenticate the device in question, go to Administration -> Tools -> Arbitrary Credentials Test. From here, try to make an SNMP, SSH or Windows PowerShell connection to an endpoint. The Arbitrary Credentials Test tool gives a more specific response than the Test SNMP or Test WMI option. SNMP and WMI devices respond with the name of the device if the test was successful. SSH devices respond by attempting to run an echo “hello world” command.
In the tool, provide the IP address of the sample device and fill in the credentials manually or select a template to use. You can also use a specific Service Engine if there is one that is expected to have access.
If you get an authentication failure, you’ll need to test the local password.
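To reproduce the SSH part of that test by hand, the echo check can be approximated with the standard OpenSSH client. This is a rough sketch under stated assumptions, not how Netreo implements its tool: it assumes an `ssh` binary on the path and key-based login (`BatchMode=yes` disables password prompts), and the function names and result labels are hypothetical.

```python
import subprocess
from typing import List


def build_ssh_echo_command(host: str, user: str, timeout_s: int = 5) -> List[str]:
    """Build an OpenSSH client command that runs echo on the remote host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        "echo", "hello world",
    ]


def classify_ssh_result(returncode: int, stdout: str) -> str:
    """Map an ssh exit status and output to a coarse, illustrative label."""
    if returncode == 0 and stdout.strip() == "hello world":
        return "ok"
    if returncode == 255:
        # The OpenSSH client exits with 255 when ssh itself reports an error,
        # which includes connection and authentication failures.
        return "connection_or_auth_failure"
    return "unexpected_response"


def run_ssh_echo_test(host: str, user: str) -> str:
    """Run the echo test against a device and return a result label."""
    cmd = build_ssh_echo_command(host, user)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return "could_not_run"
    return classify_ssh_result(proc.returncode, proc.stdout)
```

If `run_ssh_echo_test` reports a failure for the same account that the Netreo tool rejects, that points toward the device or its local credentials rather than the Netreo configuration.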
Your next step is to confirm whether the device in question is using templates and which templates are assigned to it. Netreo uses cascading templates, and administration can be tricky. Be sure to check out our Knowledge Base article on Device Template Administration for additional insights if you have any questions.
For troubleshooting, go to the Device Administration Page:
Once corrected, perform a repoll of the device to make sure the change is effective.
When troubleshooting Service Engines, it is important to identify which Remote Poller a Device is set to. Navigate to the Device in question and click the gear icon to go to the Device Admin page. Click “Show Advanced Settings” and then look at the “Remote Poller” dropdown. This is where Remote Poller settings are changed and where you can confirm that the Remote Poller has proper network connectivity.
Identifying Devices associated with a particular Service Engine is very important. You can identify and confirm associations from the Service Engine Administration page by going to Administration -> System -> Service Engines. Click the “Netreo Remote Poller” or “Netreo Remote Collector” icon to see which devices are being polled by that Service Engine.
For Syslog, SNMP Traps and Netflow, another type of Service Engine role is used to show the relationships. These monitoring methods are “Pushed” by the Devices sending the information, rather than “Polled” by the monitoring system. Customers define which Service Engine receives this information by adjusting the configuration of the Device to point to the IP address of related Service Engines. You can see which Devices are recognized as having a Collector by clicking the “Netreo Logging Collector” or “Netreo Traffic Collector” link next to the Service Engine in question.
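Because pushed data only arrives if the Device is pointed at the right address, it can help to send a test message yourself and see whether the collector registers it. The sketch below sends a single syslog datagram over UDP. It is an illustration, not a Netreo utility: the function names and the `netreo-test` tag are hypothetical, port 514 is the conventional syslog port and is assumed (not confirmed) to be what your Service Engine listens on, and the payload carries only the priority prefix, which many receivers accept but not all.

```python
import socket


def build_syslog_message(facility: int, severity: int, tag: str, text: str) -> bytes:
    """Build a minimal BSD-style syslog payload with only a PRI prefix."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    priority = facility * 8 + severity  # PRI value as defined by the syslog RFCs
    return f"<{priority}>{tag}: {text}".encode("utf-8")


def send_syslog_test(collector_ip: str, text: str, port: int = 514) -> int:
    """Send one test message over UDP and return the number of bytes sent."""
    # Facility 16 is local0 and severity 6 is informational, giving PRI 134.
    payload = build_syslog_message(16, 6, "netreo-test", text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (collector_ip, port))
```

UDP gives no delivery confirmation, so a successful send only means the datagram left the sending host. Whether it arrived has to be checked on the receiving Service Engine.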
Go to the Service Engine Administration page by navigating to Administration -> System -> Service Engines. If any of the Service Engines are showing a red alarm state, open that Service Engine, navigate to the Services tab and check if the “Service Engine Status Check” Service Check is in a Critical state. Some messages are Host alarms and will only show at the bottom of the Services tab in the Services State History section. Make note of any recent critical alarms, in case help is required from Netreo Customer Support. Messages to look for include the following:
Once you confirm configurations are applied correctly and Service Engines are working properly, confirm that an Incident was created and that the notifications went out. In Netreo, for a notification to go out, an Incident must be created for the Alarms. Incidents are created only when enough soft critical alarms occur to produce a hard critical alarm. This threshold is configurable, but by default an Incident is typically created after 3 soft alarms.
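The soft-to-hard escalation described above can be pictured with a small model. This is a simplified sketch for reasoning about why a brief failure may not produce a notification, not Netreo's actual logic: the class name `AlarmTracker` is hypothetical, and it assumes that soft alarms must be consecutive and that a passing check resets the count, neither of which is stated here.

```python
from dataclasses import dataclass


@dataclass
class AlarmTracker:
    """Toy model of soft-to-hard alarm escalation for one monitored check."""

    soft_threshold: int = 3      # soft alarms needed before an incident opens
    soft_count: int = 0
    incident_open: bool = False

    def record(self, check_failed: bool) -> bool:
        """Record one poll result; return True if this poll opens an incident."""
        if not check_failed:
            # Assumption: a passing check resets the counter and clears the state.
            self.soft_count = 0
            self.incident_open = False
            return False
        self.soft_count += 1
        if self.soft_count >= self.soft_threshold and not self.incident_open:
            self.incident_open = True  # hard critical alarm reached
            return True
        return False


tracker = AlarmTracker()
polls = [True, True, False, True, True, True, True]  # True means the check failed
results = [tracker.record(failed) for failed in polls]
print(results)  # [False, False, False, False, False, True, False]
```

In this run the first two failures never open an incident because a passing check interrupts them; only the third consecutive failure does, and the fourth does not open a second one.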
When an Alarm is triggered in Netreo and an Action Group is associated with the object that is alarming (see How To: Troubleshooting No Alerts Received for details on Action Groups), an Incident is created and the assigned Action invoked. If an Incident has been created, follow the steps below to see if notifications were sent for that Incident.
If there was no notification received:
When you see notifications being sent for the Incident, validate that the email addresses being referenced are correct. If they are not correct, it could be an issue with templating and the Action Group being assigned to the managed object.
When you are getting too many notifications and seeing a large number of notifications in the Incident, document your findings and reach out to Netreo Customer Support.
If no notifications are sent for an incident, it may be that the Action Groups being applied are incorrect or not configured correctly.
Finally, if you still haven’t identified the source of your false alert(s), reach out to Netreo Customer Support.
Netreo intelligent alert management is a cornerstone of effective incident management for many customers. By supporting anomaly detection, custom thresholds and other advanced features, Netreo ensures all alerts are truly meaningful, eliminates alert fatigue and fuels automated issue resolution.
Along with this post, our aforementioned August How To: Troubleshooting No Alerts Received is extremely useful for all Netreo customers with the DIY spirit. Leveraging helpful pointers and advanced capabilities, Netreo customers can eliminate alert fatigue and ensure every alert that comes through is meaningful. Of course, Customer Support is included with every Netreo deployment, so never hesitate to reach out for help with your specific environment.
If you’re not a current customer, see how the Netreo Platform delivers maximum value as your infrastructure management solution by Requesting a Demo Today!