Too many alerts are frustrating and have an even worse trickle-down effect on IT teams. When alert deluge turns to alert fatigue, critical issues may be ignored. But seasoned IT pros will tell you that receiving no alerts causes even greater distress. When monitoring tools go silent, user complaints are sure to follow.
Even with Netreo’s high-value, intelligent alert management capabilities, issues can go undetected from time to time. For example, a Cisco Switch may power off and cause a small outage in your environment. Yet, you received no alert in your inbox or text from Netreo.
While reaching out to Support Services will always solve your issues, you also have options. Keeping in mind that device misconfigurations are the most common reason that managed devices fail to send alerts, let’s cover How To Troubleshoot No Alerts Received in Netreo.
Your first step should always be checking the device to see if there was a Maintenance Window in effect during the alarm. Maintenance Windows are a useful Netreo feature that prevents incidents from being created and alerts from being sent during scheduled maintenance.
You can view a device’s maintenance from the Device’s Dashboard -> Reports -> Maintenance Window History. Note that overlapping Maintenance Windows can cause unexpected behavior, so it is best to check for Maintenance Windows on all devices in the group or vicinity.
If there was a Maintenance Window in effect during the time of the alarm, an incident will not get created. No incident means no alerts forwarded to you or your team via email, text, or any other configurable Action method.
If you see a Maintenance Window was in effect during the time the alarm was created, note the maintenance period so you can check on the impact on other Alarms, Incidents and Actions. Our Knowledge Base article on Maintenance Windows provides additional, useful details.
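If you copy the Maintenance Window History entries for the affected device and its neighbors into a script, the overlap check can be done mechanically. The following is a minimal sketch, not a Netreo API: the `MaintenanceWindow` record, the device names and the timestamps are all hypothetical, and it assumes a window suppresses alarms from its start time through its end time inclusive.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record of one window copied from the history report."""
    device: str
    start: datetime
    end: datetime


def windows_covering(alarm_time: datetime,
                     windows: list[MaintenanceWindow]) -> list[MaintenanceWindow]:
    """Return every window that was in effect at alarm_time (inclusive bounds)."""
    return [w for w in windows if w.start <= alarm_time <= w.end]


# Illustrative data only; device names and times are made up.
history = [
    MaintenanceWindow("core-switch-01", datetime(2024, 3, 2, 1, 0), datetime(2024, 3, 2, 3, 0)),
    MaintenanceWindow("access-switch-07", datetime(2024, 3, 2, 2, 30), datetime(2024, 3, 2, 4, 0)),
    MaintenanceWindow("core-switch-01", datetime(2024, 3, 9, 1, 0), datetime(2024, 3, 9, 3, 0)),
]

alarm_time = datetime(2024, 3, 2, 2, 45)
for window in windows_covering(alarm_time, history):
    print(f"{window.device}: {window.start} to {window.end}")
```

With this sample data, two overlapping windows cover the alarm time. That is the situation where checking only the alarming device could hide the cause.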
If there was no Maintenance Window at the time of the alarm, the issue lies elsewhere. Move on to Action Groups and Alert Times.
Action Groups are containers for methods of communication or actions required. When troubleshooting the absence of an alert for a known incident, find out which Action Groups are associated with the incident. Action Groups are built in Administration, so go to Administration -> Alerts -> Actions. Here you will uncover which Action Groups are associated with the alarming object, which alert times are applied to the Action Group and which Action Methods are being invoked for a given alarm state.
The first thing you will want to do is see which Action Groups are associated with an incident in Netreo. This will also show you the number of times those Action Groups have been invoked. When you’re looking for alerts not received, this section shows whether any alert actions are associated with the incident, whether those alert actions were triggered, and whether the wrong group was invoked.
Action Groups are associated with Service Checks and Thresholds (as well as Host Down incidents) and they are expected to be invoked when either goes critical. When you assign an Action Group to an object in Netreo, you’ll see the Action Group and associated Action Methods that will get invoked if that object goes critical.
To determine if the correct Action Group is associated:
Note: Setting the action group to black hole is the equivalent of setting it to None. Check out Knowledge Base for further details on the black hole Action Group.
Incident Management Rules can be viewed under Administration -> Alerts -> Incident Management. These can add to or change the Action Groups that are associated with an Incident when that Incident is opened. By default, Netreo has rules called Configuration Change Alerts, Suppress Alerts if Auth not working and Site Latency.
If you haven’t discovered the source of the issue and a solution, your next step is to confirm the template applied is the correct template for the device. Templates hold the credentials, service checks, thresholds and host contacts for devices. If a device has the wrong template applied, the alert may not go to the correct contact. You also want to make sure templates are not disabled for the device.
Your next step is to confirm the device in question is using templates and what templates are assigned to the device. Netreo uses cascading templates and administration can be tricky. Be sure to check out our Knowledge Base article on Device Template Administration for additional insights if you have any questions.
For troubleshooting, go to the Device Administration Page:
Once corrected, perform a repoll of the device to make sure the change is effective.
After you confirm that the configurations in the template are correct, the next step is to confirm the configurations are getting applied to the device. Configurations from templates are applied to a device during a rediscovery. If configurations from templates are not applied to the device, check out the How to Troubleshoot Discovery Polling Failures section in our recent How To, Troubleshooting Alarms Not Recovering.
Once we confirm that configurations are applied and the correct Action Group is set, we need to confirm that an Incident was created and that the notifications went out. In Netreo, a notification goes out only if an Incident has been created for the Alarm. Incidents are created only when enough soft critical alarms occur to produce a hard critical alarm. This is configurable, but by default an Incident is typically created after 3 soft alarms.
When an Alarm is triggered in Netreo and an Action Group is associated with the object that is alarming, an Incident is created and the assigned Action is invoked. If an Incident has been created, follow the steps below to see if notifications were sent for that Incident.
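To see why a short or flapping outage might never produce an Incident, it can help to model the soft-to-hard transition. The sketch below is an illustration under stated assumptions, not Netreo's actual logic. It assumes soft critical results must be consecutive, that any non-critical result resets the count, and that the check which reaches the limit is the one treated as hard critical. The function name and the `soft_limit` parameter are hypothetical.

```python
from typing import Optional


def first_hard_critical(results: list[str], soft_limit: int = 3) -> Optional[int]:
    """Return the index of the check at which the alarm turns hard, or None.

    Assumption: only consecutive "critical" results count toward the limit.
    """
    consecutive = 0
    for index, state in enumerate(results):
        if state == "critical":
            consecutive += 1
            if consecutive >= soft_limit:
                return index  # an Incident would be expected from here on
        else:
            consecutive = 0  # assumption: a recovery resets the soft count
    return None


flapping = ["critical", "critical", "ok", "critical", "critical"]
sustained = ["ok", "critical", "critical", "critical"]

print(first_hard_critical(flapping))   # never reaches three in a row
print(first_hard_critical(sustained))  # third consecutive critical result
```

Under these assumptions, the flapping sequence never goes hard, so no Incident and no notification would be expected. The sustained sequence goes hard on its third consecutive critical result.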
When there was no notification received:
If you see notifications being sent for the Incident, validate that the email addresses being referenced are the correct addresses. If they are not correct, it could be an issue with templating.
Next, if you are getting too many notifications and you are seeing a large number of notifications in the Incident, document your findings and reach out to Netreo.
If no notifications are sent for an incident, it may be that the Action Groups being applied are incorrect or not configured correctly.
The overall health of the Netreo server can play a part in customers not receiving alerts. Navigate to the System Diagnostics page and escalate any issues you see there. Issues with critical system processes not running, such as Email, can be diagnosed via System Diagnostics. For further troubleshooting details, check out the Systems Diagnostics section in our recent How To, Troubleshooting Alarms Not Recovering.
Another step to take is to review your logs. Netreo offers logs that provide Incident creation and Notification details. Reviewing the Audit Log, you can make sure no changes were made to the Device that could have prevented an Alert. Reviewing the Debug Log, you can confirm an Incident was created and Notification went out in the backend. For further troubleshooting advice, check out the Audit and Debug Logs section in our recent How To, Troubleshooting Alarms Not Recovering.
Netreo sends emails when an issue arises for a managed device and can be configured to send scheduled emails for events or various reports. The System Mail Logs capture all actions related to sending emails. So, if you still haven’t discovered a cause for no alerts received, you’ll want to check if the email made it to its destination.
Using our System Mail Logs, you can confirm if there is anything blocking or rejecting the actual email as it gets to each SMTP server or relay. Visibility into these logs varies depending on the Netreo platform. To access Netreo Mail Logs on a SaaS instance, go to Administration -> Alerts -> Mail. This will take you to a screen with the most recent mail logs:
To see additional logs or filter the results, click “More…” at the bottom left of the page:
To access Netreo Mail Logs on an On-Prem Netreo instance, go to Administration -> System -> System Logs:
Click on Get Logs to see the last 25 Mail Log entries:
The Mail Log contains entries for each email notification sent by the system and provides Filters to expedite troubleshooting. Use the email address in question, or subject lines, such as a scheduled report or Threshold – as the example below demonstrates:
This output shows the email messages sent that contained the word Threshold. When errors occur in the Mail Log, the system includes a unique identifier in brackets – 1441074 in the example above.
Changing the filter to that identifier returns the entire string of log entries for that email message. You can use this output to determine if the email was sent, if it failed and, in some cases, why the email failed to send.
The very last line of the output shows that the email was sent from our system successfully: status=sent. Unfortunately, this means your problem lies elsewhere, so you’re better off contacting Netreo Support Services.
If you see a failure message like the one below, the problem is likely with your email system or an incorrect email address. Double check that you have the correct email address, and if not, capture the error message and correct email address and contact Netreo Support Services.
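If you export Mail Log entries to a text file, the same identifier and status lookup can be scripted offline. The sketch below is illustrative only. The sample lines are invented, not real Netreo output, and they assume a mail-server style format with a numeric identifier in brackets, a `to=<address>` field and a `status=` field. The regular expression and the `delivery_status` function name are hypothetical and would need adjusting to your actual log lines.

```python
import re
from collections import defaultdict

# Invented sample lines; real Mail Log entries may be formatted differently.
SAMPLE_LOG = """\
Mar  2 02:45:10 mailhost smtp[1441074]: to=<noc@example.com>, status=sent (250 OK)
Mar  2 02:45:12 mailhost smtp[1441075]: to=<oncall@example.com>, status=bounced (550 mailbox unavailable)
Mar  2 02:46:01 mailhost smtp[1441076]: to=<noc@example.com>, status=deferred (connection timed out)
"""

# Captures the bracketed identifier, the recipient and the delivery status.
LINE_PATTERN = re.compile(
    r"\[(?P<ident>\d+)\]:.*?to=<(?P<to>[^>]+)>.*?status=(?P<status>\w+)"
)


def delivery_status(log_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each recipient address to the (identifier, status) pairs found for it."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            found[match["to"]].append((match["ident"], match["status"]))
    return dict(found)


statuses = delivery_status(SAMPLE_LOG)
for address, entries in statuses.items():
    for ident, status in entries:
        flag = "" if status == "sent" else "  <-- investigate"
        print(f"{address} [{ident}] {status}{flag}")
```

Any entry whose status is not `sent` is a candidate to capture, along with its identifier, before contacting Netreo Support Services.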
The issue of no alerts received can have many causes. But you do have shortcuts. When I see that an Action Group is associated with the incident, I go straight to Mail Logs and check the SMTP status. When email is your primary tool for alert notifications, this can save you a lot of time.
Otherwise, following the steps above should get alert notifications back on track. Of course, never hesitate to reach out to Netreo Support Services whenever you’re troubleshooting alerts on your own, or any other issue you have. Your success is our success, and we’re always glad to help.