
Netreo How To: Troubleshooting No Alerts Received

By: Rene Sanchez
August 14, 2023

Too many alerts are frustrating and have an even worse trickle-down effect on IT teams. When alert deluge turns to alert fatigue, critical issues may be ignored. But seasoned IT pros will tell you that receiving no alerts causes even greater distress. When monitoring tools go silent, user complaints are sure to follow.

Even with Netreo’s high-value, intelligent alert management capabilities, issues can go undetected from time to time. For example, a Cisco Switch may power off and cause a small outage in your environment, yet you receive no alert in your inbox and no text from Netreo.

While reaching out to Support Services will always solve your issues, you also have options. Keeping in mind that device misconfigurations are the most common reason that managed devices fail to send alerts, let’s cover How To Troubleshoot No Alerts Received in Netreo.

Maintenance Windows

Your first step should always be checking the device to see if there was a Maintenance Window in effect during the alarm. Maintenance Windows are a useful Netreo feature that prevents incidents from being created and alerts from being sent during scheduled maintenance.

You can view a device’s maintenance history from the Device’s Dashboard -> Reports -> Maintenance Window History. Note that overlapping Maintenance Windows can cause unexpected behavior, so it is best to check for Maintenance Windows on all devices in the group or vicinity.

If there was a Maintenance Window in effect during the time of the alarm, an incident will not get created. No incident means no alerts are forwarded to you or your team via email, text, or any other configurable Action method.
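
To make the overlap check concrete, here is a minimal Python sketch. It assumes you have copied the window start and end times out of the Maintenance Window History report by hand; the device names, the times, and the `windows_covering` helper are all hypothetical and are not part of Netreo.

```python
from datetime import datetime

# Hypothetical records: (device, window_start, window_end), typed in by hand
# from the Maintenance Window History report. None of this is a Netreo API.
windows = [
    ("core-switch-01", datetime(2023, 8, 14, 1, 0), datetime(2023, 8, 14, 3, 0)),
    ("core-switch-01", datetime(2023, 8, 14, 2, 30), datetime(2023, 8, 14, 4, 0)),
    ("edge-router-02", datetime(2023, 8, 15, 22, 0), datetime(2023, 8, 15, 23, 0)),
]


def windows_covering(device, alarm_time, windows):
    """Return every window for `device` that was in effect at `alarm_time`."""
    return [
        (start, end)
        for name, start, end in windows
        if name == device and start <= alarm_time <= end
    ]


alarm_time = datetime(2023, 8, 14, 2, 45)
hits = windows_covering("core-switch-01", alarm_time, windows)
print(f"{len(hits)} window(s) were in effect at {alarm_time}")
```

More than one hit for the same device is a hint that overlapping windows may be involved.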

If you see a Maintenance Window was in effect during the time the alarm was created, note the maintenance period so you can check on the impact on other Alarms, Incidents and Actions. Our Knowledge Base article on Maintenance Windows provides additional, useful details.

If there was no Maintenance Window at the time of the alarm, the issue lies elsewhere. Move on to Action Groups and Alert Times.

Action Groups / Alert Times

Action Groups are containers for methods of communication or actions required. When troubleshooting the absence of an alert for a known incident, find out which Action Groups are associated with the incident. Action Groups are built in Administration, so go to Administration -> Alerts -> Actions. Here you will uncover which Action Groups are associated with the alarming object, which alert times are applied to the Action Group and which Action Methods are being invoked for a given alarm state.

What Action Groups & Alerts are Associated with an Incident?

The first thing you will want to do is see which Action Groups are associated with an incident in Netreo. This will also show you the number of times those Action Groups have been invoked. When you’re looking for alerts not received, this section will show you whether any alert actions are associated with the incident, whether those alert actions were triggered, and whether the wrong group was invoked.

  1. Navigate to the example incident
  2. Look at the Contact Groups pane in the lower left corner
    1. Each of these is added by the Alarm that created the Incident, or an Incident Management Rule
  3. Click the Grey Circle above this pane to review all Alerts sent from this Incident
    1. Grey Circles will have a number indicating how many Alerts have been sent
    2. Each Alert will indicate the Action Group that caused the Alert to be sent and if it was successfully sent
    3. WebHook Alerts will have JSON responses available via the Edit button on the far right of the pop-up

What Action Groups are Associated with the Service Check or Thresholds? 

Action Groups are associated with Service Checks and Thresholds (as well as Host Down incidents) and they are expected to be invoked when either goes critical. When you assign an Action Group to an object in Netreo, you’ll see the Action Group and associated Action Methods that will get invoked if that object goes critical.

To determine if the correct Action Group is associated:

  1. Navigate to the example incident
  2. Click on the Overview button (Gauge Button) for the device
  3. Click on the Gear in the upper right to go to the Device Admin Page
  4. Navigate to the Service or Instances Tab depending on what Alarm created the incident
  5. Find the entry that correlates to the incident created
    1. If there is a Lock Icon for this entry, hover over it to see which template applied the settings for the particular service, threshold etc.
      1. Click the Lock Icon and locate the related check
      2. Click the Edit Icon to view which Action Groups are associated with the Check
    2. If there is no Lock Icon, click the Edit button to see which Action Groups are associated with the Alarm
  6. Check whether the Action Groups associated with the Check or Alarm match those present on the Incident
    1. If there are Action Groups that do not match, note these on the case and review the Incident Management Rule section for further details
    2. If the problem Action Group does match the Check or Alarm, note the location of the configured action groups and make all needed adjustments to remove invalid alerting
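
If you are comparing more than a handful of names, the matching in step 6 can be sketched as a simple set comparison. This is only an illustration: the Action Group names below are made up, and both sets are assumed to be typed in by hand from the Incident and from the Check or Alarm.

```python
# Hypothetical Action Group names copied from the UI by hand.
incident_groups = {"Network Team", "NOC Email", "After Hours Pager"}
check_groups = {"Network Team", "NOC Email"}

# Groups on the Incident that the Check or Alarm did not supply.
# These are candidates for review under Incident Management Rules.
added_elsewhere = incident_groups - check_groups

# Groups configured on the Check or Alarm that never reached the Incident.
missing_from_incident = check_groups - incident_groups

print("Added elsewhere:", sorted(added_elsewhere))
print("Missing from Incident:", sorted(missing_from_incident))
```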

Note: Setting the Action Group to black hole is the equivalent of setting it to None. Check out our Knowledge Base for further details on the black hole Action Group.

What Incident Management Rules have Added Action Groups to an Incident?

Incident Management Rules can be viewed under Administration -> Alerts -> Incident Management. These can add to or change the Action Groups that are associated with an Incident when that Incident is opened. By default, Netreo has rules called Configuration Change Alerts, Suppress Alerts if Auth not working and Site Latency.

Steps for Troubleshooting

  1. Review all Incident Management Rules looking for Actions that Add an Extra Alert or Send a Custom Alert
    1. If any entries are adding the Invalid Action Group, check if the logic for that Rule matches the conditions of the Incident. Make note of the results for reference
      1. If the conditions of the Incident do match the Rule, then you should modify the Incident Management Rule to remove the Invalid Alerting
      2. If the conditions do not match, reach out to Netreo Support Services
  2. Confirm the correct email address or other method is configured on the Action Group
    1. Administration -> Alerts -> Actions
  3. Action Groups can be configured so that their actions can be run manually from the Incident Details page. If the Action Group being investigated was configured with this feature, manually execute the Action Group to confirm that Alerts are being logged properly to the Notifications on the Incident Management Page
  4. If a proper Action Group was in place, the Incident Management Page should show notifications going to the contacts you configured. You can get further insights for investigation by searching for the contact in your mail logs
  5. Check the Action Group Alerting Time Frame Parameters. Action Methods within an Action Group can be configured to only perform during certain days/hours. To create or adjust these timeframes, go to Administration -> Alerts -> Time Frames
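
The time frame check in the last step is easy to get wrong by eye, so here is a small Python sketch of the idea. The dictionary layout, the `BUSINESS_HOURS` name and the `method_active` helper are assumptions for illustration; Netreo's own Time Frames are configured in the UI and may be modeled differently.

```python
from datetime import datetime, time

# Hypothetical time frame: weekdays (Monday=0 .. Friday=4), 08:00 to 18:00.
BUSINESS_HOURS = {"days": {0, 1, 2, 3, 4}, "start": time(8, 0), "end": time(18, 0)}


def method_active(time_frame, when):
    """True if `when` falls on an allowed day and within the allowed hours."""
    return (
        when.weekday() in time_frame["days"]
        and time_frame["start"] <= when.time() < time_frame["end"]
    )


# An alarm on a Saturday at 02:00 falls outside the frame.
weekend_alarm = method_active(BUSINESS_HOURS, datetime(2023, 8, 12, 2, 0))
# An alarm on a Monday at 09:30 falls inside the frame.
weekday_alarm = method_active(BUSINESS_HOURS, datetime(2023, 8, 14, 9, 30))
print(weekend_alarm, weekday_alarm)
```

If the alarm time falls outside every time frame on the Action Method, a missing alert may be expected behavior rather than a fault.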

If you haven’t discovered the source of the issue and a solution, your next step is to confirm the template applied is the correct template for the device. Templates hold the credentials, service checks, thresholds and host contacts for devices. If a device has the wrong template applied, the alert may not go to the correct contact. You also want to make sure templates are not disabled for the device.

Confirm Device Template Assignments

Your next step is to confirm the device in question is using templates and what templates are assigned to the device. Netreo uses cascading templates and administration can be tricky. Be sure to check out our Knowledge Base article on Device Template Administration for additional insights if you have any questions.

For troubleshooting, go to the Device Administration Page:

  1. Click the Advanced Options to see the Template section
    1. Make sure the template usage is enabled
  2. Verify that the correct Template is assigned
    1. If not, issues can be corrected by the following corrective actions
      1. Add the device to the correct Group Object (Device type, Sub type, Site, Category, Functional group)
      2. Add the Template to the correct Group Object (Device type, Sub type, Site, Category, Functional group)

Once corrected, perform a repoll of the device to make sure the change is effective.

Troubleshoot Discovery Polling Failures

After you confirm that the configurations in the template are correct, the next step is to confirm the configurations are getting applied to the device. Configurations from templates are applied to a device during a rediscovery. If configurations from templates are not applied to the device, check out the How to Troubleshoot Discovery Polling Failures section in our recent How To, Troubleshooting Alarms Not Recovering.

Incident Creation

Once we confirm configurations are applied and the correct Action Group is set, we need to confirm an Incident was created and that the notifications went out. In Netreo, for a notification to go out, an Incident must be created for the Alarms. Incidents are created only when enough soft critical alarms occur to create a hard critical alarm. This is configurable, but by default an Incident is typically created after 3 soft alarms.
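
The soft-to-hard behavior can be pictured with a short Python sketch. It is a simplified model, not Netreo's implementation: it assumes the soft alarms must be consecutive and that a healthy poll resets the count, and the `first_hard_alarm` function and `soft_limit` parameter are hypothetical names.

```python
def first_hard_alarm(check_results, soft_limit=3):
    """Return the index of the poll where the alarm goes hard, or None.

    check_results: list of booleans, True meaning the poll came back critical.
    Assumption: soft alarms must be consecutive; a healthy poll resets the count.
    """
    soft_count = 0
    for index, is_critical in enumerate(check_results):
        soft_count = soft_count + 1 if is_critical else 0
        if soft_count >= soft_limit:
            return index
    return None


# Two critical polls, a recovery, then two more: the limit is never reached.
flapping = first_hard_alarm([True, True, False, True, True])
# Three critical polls in a row: the third poll (index 2) goes hard.
sustained = first_hard_alarm([True, True, True])
print(flapping, sustained)
```

Under this model, a device that flaps between critical and healthy may never produce an Incident, which is one way an outage can pass without a notification.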

When an Alarm is triggered in Netreo and an Action Group is associated with the object that is alarming, an Incident is created and the assigned Action invoked. If an Incident has been created, follow the steps below to see if notifications were sent for that Incident.

When there was no notification received:

  1. Go to Quick Views -> Active Incidents
    1. Click SearchView
  2. Filter the results by adding information such as the device name in Title
    1. If you have an Incident ID, you can use that in the Incident ID field
  3. Once you have found the Incident, open it and verify whether Notifications were sent and which lists they went to
  4. The Notification number is a link if greater than 0
    1. Clicking on that number shows you the history of the alerts sent and to whom

If you see notifications being sent for the Incident, validate that the email addresses being referenced are the correct addresses. If they are not correct, it could be an issue with templating.

Next, if you are getting too many notifications and you are seeing a large number of notifications in the Incident, document your findings and reach out to Netreo Support Services.

If no notifications are sent for an incident, it may be that the Action Groups being applied are incorrect or not configured correctly.

System Diagnostics

The overall health of the Netreo server can play a part in customers not receiving alerts. Navigate to the System Diagnostics page and escalate any issues you see there. Issues with critical system processes not running, such as Email, can be diagnosed via System Diagnostics. For further troubleshooting details, check out the Systems Diagnostics section in our recent How To, Troubleshooting Alarms Not Recovering.

Audit & Debug Logs

Another step to take is to review your logs. Netreo offers logs that provide Incident creation and Notification details. Reviewing the Audit Log, you can make sure no changes were made to the Device that could have prevented an Alert. Reviewing the Debug Log, you can confirm an Incident was created and Notification went out in the backend. For further troubleshooting advice, check out the Audit and Debug Logs section in our recent How To, Troubleshooting Alarms Not Recovering.

Mail Logs

Netreo sends emails when an issue arises for a managed device and can be configured to send scheduled emails for events or various reports. The System Mail Logs capture all actions related to sending emails. So, if you still haven’t discovered a cause for no alerts received, you’ll want to check if the email made it to its destination.

Using our System Mail Logs, you can confirm if there is anything blocking or rejecting the actual email as it gets to each SMTP server or relay. Visibility into these logs varies depending on the Netreo platform. To access Netreo Mail Logs on a SaaS instance, go to Administration -> Alerts -> Mail. This will take you to a screen with the most recent mail logs:

To see additional logs or filter the results, click “More…” at the bottom left of the page:

To access Netreo Mail Logs on an On-Prem Netreo instance, go to Administration -> System -> System Logs:

Click on Get Logs to see the last 25 Mail Log entries:

The Mail Log contains entries for each email notification sent by the system and provides Filters to expedite troubleshooting. Filter on the email address in question or on subject-line text, such as a scheduled report name or Threshold, as the example below demonstrates:

This output shows the email messages sent that contained the word Threshold. When errors occur in the Mail Log, the system includes a unique identifier in brackets – 1441074 in the example above.

Changing the filter to that identifier returns the entire string of that email message. You can use this output to determine if the email was sent, if it failed and in some cases, why the email failed to send.

The very last line of the output shows that the email was sent from our system successfully: status=sent. Unfortunately, this means your problem lies elsewhere, so you’re better off contacting Netreo Support Services.

If you see a failure message like the one below, the problem is likely with your email system or an incorrect email address. Double check that you have the correct email address, and if not, capture the error message and correct email address and contact Netreo Support Services.
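
If you export Mail Log text and want to follow one message through it, the filtering described above can be sketched in Python. The sample lines here are invented for illustration and a real Netreo Mail Log may be laid out differently; only the `status=` field and the bracketed identifier come from the description above. The `statuses_for` helper is a hypothetical name.

```python
import re

# Invented sample lines for illustration only. A real Mail Log may look
# different; only `status=` and the bracketed identifier follow the article.
SAMPLE_LOG = """\
Aug 14 09:00:01 mail smtp[1441074]: to=<noc@example.com>, status=deferred (connection timed out)
Aug 14 09:00:02 mail smtp[1441099]: to=<ops@example.com>, status=sent (250 OK)
Aug 14 09:05:01 mail smtp[1441074]: to=<noc@example.com>, status=bounced (user unknown)
"""

STATUS = re.compile(r"status=(\w+)")


def statuses_for(identifier, log_text):
    """Return the status values, in order, on lines tagged [identifier]."""
    tag = f"[{identifier}]"
    found = []
    for line in log_text.splitlines():
        match = STATUS.search(line)
        if tag in line and match:
            found.append(match.group(1))
    return found


history = statuses_for("1441074", SAMPLE_LOG)
print(history)
```

The last status in the list is the one to act on: anything other than `sent` suggests the message did not leave cleanly.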

That’s a Wrap

The issue of no alerts received can have many causes. But you do have shortcuts. When I see that an Action Group is associated with the incident, I go straight to Mail Logs and check the SMTP status. When email is your primary tool for alert notifications, this can save you a lot of time.

Otherwise, following the steps above should get alert notifications back on track. Of course, never hesitate to reach out to Netreo Support Services whenever you’re troubleshooting alerts on your own, or any other issue you have. Your success is our success, and we’re always glad to help.

