Event Types

This page lists all event types that can be generated by Trustgrid nodes. Events are alphabetized and grouped by their filter type. For events that have both trigger and resolution states, the trigger severity and resolution severity are shown.

All Gateways Disconnected

Trigger Severity: WARNING

Description: This event will be triggered if an edge node loses connectivity to all its available gateways.

All Peers Disconnected

Trigger Severity: WARNING

Description: This event will be triggered if a node is configured as a gateway and it loses connectivity to all its configured edge nodes.

BGP Peer Connectivity

Trigger Severity: ERROR
Resolution Severity: INFO

Trigger Message: BGP peer has disconnected
Resolution Message: BGP peer has re-connected

Description: This event is triggered when a BGP peer disconnects, and resolved when the BGP peer reconnects.

Certificate Expiring

Trigger Severity: WARNING

Description: Alerts when a certificate uploaded via Portal → Certificates will expire within 3 months.
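As an illustration only, the check described above can be sketched as a date comparison. The function name `is_expiring` is hypothetical, and the 90-day window is an assumption standing in for "3 months"; this page does not state how the window is actually computed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "3 months" is approximated as 90 days.
EXPIRY_WINDOW = timedelta(days=90)


def is_expiring(not_after: datetime, now: datetime) -> bool:
    """Return True if the certificate expires within the warning window.

    A certificate that has already expired also returns True.
    """
    return not_after - now <= EXPIRY_WINDOW


# Example: a certificate expiring in one month falls inside the window.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expiring(datetime(2024, 2, 1, tzinfo=timezone.utc), now))
```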

Cluster Failover

Severity: INFO

Messages:

Description: Sent by a node when it claims or releases the active role in a cluster.

Configuration Update Failure

Severity: INFO

Message: Unexpected error pulling the most recent configuration from the cloud endpoint

Description: Indicates that the node is unable to connect to the configuration REST API endpoint within the Trustgrid Control Plane. Verify that all required communication to the Control Plane is allowed.

Connection Flapping

Trigger Severity: WARNING
Resolution Severity: INFO

Description:

  • Trigger: Alerts when a node disconnects and reconnects 10 times within 5 minutes. A follow-up alert is sent after every 120 subsequent disconnects/reconnects.
  • Resolution: This alert will be sent when a node’s “Connection Flapping” issue has been resolved.
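The trigger rule above (10 cycles within 5 minutes, then a follow-up every 120 further cycles) can be pictured as a sliding-window counter. The sketch below is not Trustgrid's implementation; the class name `FlapDetector` and its method are hypothetical, and the resolution condition is left out because this page does not say how a flapping issue is judged resolved.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60   # from the description: 5 minutes
TRIGGER_COUNT = 10        # from the description: 10 disconnect/reconnect cycles
FOLLOW_UP_EVERY = 120     # from the description: follow-up every 120 cycles


class FlapDetector:
    """Hypothetical sliding-window detector for disconnect/reconnect cycles."""

    def __init__(self):
        self._recent = deque()   # timestamps of cycles inside the window
        self._flapping = False
        self._since_alert = 0    # cycles counted since the last alert

    def record_cycle(self, ts: float) -> bool:
        """Record one disconnect/reconnect cycle; return True if an alert is due."""
        self._recent.append(ts)
        # Drop cycles that have fallen out of the 5-minute window.
        while ts - self._recent[0] > WINDOW_SECONDS:
            self._recent.popleft()
        if not self._flapping:
            if len(self._recent) >= TRIGGER_COUNT:
                self._flapping = True
                self._since_alert = 0
                return True
            return False
        # Already flapping: alert again after every 120 further cycles.
        self._since_alert += 1
        if self._since_alert >= FOLLOW_UP_EVERY:
            self._since_alert = 0
            return True
        return False
```

Feeding the detector one cycle per second raises the first alert on the tenth cycle; cycles spaced a minute apart never accumulate ten entries in the window and so never alert.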

Connection Timeout

Trigger Severity: ERROR

Description: Alerts when a node does not reconnect after a profile update has been pushed to the node.

Data and Control Plane Disruption

Trigger Severity: ERROR

Message: Data and Control plane connection is up/down

Description: Generated when a node cannot establish a connection to the Trustgrid control plane or to any of its configured data plane gateways. If the node is clustered, this marks it as unhealthy, which triggers a cluster failover.

Data Plane Disruption

Trigger Severity: WARNING

Messages:

  • (No specific message for unexpected tunnel termination)
  • Gateway nodename removed from the domain

Description:

  • This event will be triggered if a data plane tunnel is terminated unexpectedly.
  • If a node acting as a gateway (public, private, or hub) is disabled, or its gateway service is disabled, all other nodes in the domain will log this event.

Deregister

Severity: INFO

Message: Device was deregistered from the console

Description: This event is sent if a user with console access runs the deregistration process.

DNS Resolution

Trigger Severity: ERROR

Message: DNS resolution failed/re-established

Description: Once per hour the node will attempt to resolve a trustgrid.io DNS address using the configured DNS servers. This event is triggered if the node is unable to resolve the requested address.
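To reproduce a similar check by hand when investigating this event, a resolution test can be sketched as follows. The function `can_resolve` is hypothetical, and the exact trustgrid.io hostname the node queries is not given on this page, so the hostname is left as a parameter. Note that `socket.getaddrinfo` uses the resolver of the host running the script, which may differ from the DNS servers configured on the node.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a resolution failure.

    The resolver is injectable so the check can be exercised without
    network access.
    """
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False
```

A scheduler that runs this once per hour and emits an event on a `False` result would mirror the behavior described above.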

Gateway Connectivity Health Check

Trigger Severity: CRITICAL
Resolution Severity: INFO

Description:

  • Trigger: Alerts when an edge node is unable to communicate with a gateway node.
  • Resolution: Alerts when an edge node reestablishes connectivity to a gateway node after a failure is reported.

Gateway Ingress Limit Reached

Trigger Severity: ERROR

Description: Alerts when a gateway node’s ingress limit is above 95 percent utilization for two minutes straight.
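The "above 95 percent for two minutes straight" condition can be read as a sustained-threshold check over utilization samples. The sketch below is illustrative only: the function name, the sample format, and the assumption that utilization is sampled at discrete timestamps are not taken from Trustgrid's implementation.

```python
def sustained_over_limit(samples, limit_bps, threshold=0.95, duration_s=120):
    """Return True if ingress stays above threshold * limit for duration_s.

    samples: iterable of (timestamp_seconds, ingress_bps) in time order.
    """
    start = None  # timestamp at which the current over-limit run began
    for ts, bps in samples:
        if bps > threshold * limit_bps:
            if start is None:
                start = ts
            if ts - start >= duration_s:
                return True
        else:
            # Any sample at or below the threshold resets the run.
            start = None
    return False
```

A single sample that dips below the threshold resets the timer, which is what "straight" implies in the description.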

Gateway UDP Tunnel Error

Trigger Severity: ERROR
Resolution Severity: INFO

Trigger Message: UDP Tunnel has timed out for node=<peer-node> and endpoint=<peer-ip>:<peer-port>
Resolution Message: UDP Tunnel connection has been re-established for node=<peer-node> and endpoint=<peer-ip>:<peer-port>

Description:

  • Trigger: Alerts when a UDP tunnel times out after not receiving a keepalive packet for 2 minutes (the default gateway timeout).
  • Resolution: Event generated when a previously disconnected UDP tunnel is re-established and traffic should flow through it again.
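When consuming these events downstream, the peer node and endpoint can be pulled out of the message text. The following is a minimal sketch that assumes the messages arrive exactly in the two formats shown above; the function name and the returned dictionary keys are hypothetical.

```python
import re

# Matches both the trigger and the resolution message formats listed above.
UDP_TUNNEL_RE = re.compile(
    r"UDP Tunnel (?P<state>has timed out|connection has been re-established) "
    r"for node=(?P<node>\S+) and endpoint=(?P<ip>\S+):(?P<port>\d+)$"
)


def parse_udp_tunnel_message(message):
    """Return the parsed fields as a dict, or None if the format differs."""
    m = UDP_TUNNEL_RE.match(message)
    if m is None:
        return None
    return {
        "resolved": m["state"] != "has timed out",
        "node": m["node"],
        "ip": m["ip"],
        "port": int(m["port"]),
    }


# Example with placeholder values (node name and address are made up).
print(parse_udp_tunnel_message(
    "UDP Tunnel has timed out for node=edge1 and endpoint=203.0.113.10:8443"
))
```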

Metric Threshold Violation

Trigger Severity: ERROR
Resolution Severity: INFO

Description:

  • Trigger: Alerts when a configured CPU, RAM, disk, or latency metric threshold on a node is violated.
  • Resolution: Alerts when a previously reported threshold violation has been cleared.

Network Error

Severity Levels: CRITICAL, ERROR, WARNING

Messages and Descriptions:

  • CRITICAL: Stale ARP detected - Alerts when the active node in a cluster detects another MAC address responding to ARPs for a configured cluster IP address. This can occur briefly during a failover if the standby node begins sending ARPs before the previous active node releases that role. Other causes include an IP conflict, proxy ARP configured on another device on that network, or the attached switch not updating its ARP cache.
  • ERROR: Unable to create/update Azure IP configuration for nic=LAN. Error Code=<Azure error code> - Alerts when a node is unable to create or update an Azure Cluster IP configuration. The error from the Azure API is included in the event.
  • WARNING: Interface {OS interface name} is running with half-duplex - Alerts if an interface has been detected running at half-duplex. This is almost always a result of a failure to auto-negotiate the speed/duplex and can result in poor performance.

Network Health Check

Trigger Severity: ERROR

Message: Interface {OS interface name} is down / All interfaces are up

Description: Generated when link is lost on a configured interface. This event is generated even if the interface has the “Ignore Health Check” setting enabled. Note: Interfaces with APIPA addresses are ignored.

Networking Framework Memory Management

Trigger Severity: ERROR

Message: The networking framework has exhausted all allocated memory. Please contact support@trustgrid.io to notify of this issue.

Description: Alerts if the Java Virtual Machine the Trustgrid node service uses runs out of available memory.

Node Connect/Disconnect

Trigger Severity: WARNING (Disconnect)
Resolution Severity: INFO (Connect)

Description:

  • Trigger (Disconnect): Alerts when a node disconnects from the control plane.
  • Resolution (Connect): Alerts when a node connects to the control plane.

Node Delete

Severity: WARNING

Message: Node deleted

Description: Event generated whenever a node is deleted.

Node Stop Error

Severity: ERROR

Message: Failed to stop the Node service cleanly

Description: Indicates that the Trustgrid service did not stop normally prior to this instance starting.

Order Commented

Severity: INFO

Description: Alerts when a comment has been added to a provisioning order case.

Order Created

Severity: INFO

Description: Alerts when a new provisioning order has been created.

Repo Connectivity

Trigger Severity: ERROR
Resolution Severity: INFO

Trigger Message: Repo connectivity failed
Resolution Message: Repo connectivity re-established

Description:

  • Trigger: Alerts when a node cannot connect to the Trustgrid update repository.
  • Resolution: Alerts when a node re-establishes connectivity to the Trustgrid update repository. This event clears the Repo Connectivity error alert.

SSH Lockdown

Trigger Severity: ERROR
Resolution Severity: INFO

Trigger Message: SSH allowing connections from non local address and port
Resolution Message: SSH listening only on local address and port

Description:

  • Trigger: Alerts when SSH on an appliance-based node is configured to listen on any IP other than localhost (127.0.0.1).
  • Resolution: Alerts when SSH on an appliance-based node is properly locked down. This event clears the SSH Lockdown error alert.
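The lockdown condition amounts to checking that every SSH listen address is 127.0.0.1. The sketch below is a hypothetical helper for auditing a list of listen addresses gathered by other means; it is not how the node itself performs the check, and it treats only 127.0.0.1 as locked down because that is the one address the description names.

```python
import ipaddress

LOCALHOST = ipaddress.ip_address("127.0.0.1")


def is_locked_down(listen_addresses):
    """Return True if every listen address is exactly 127.0.0.1.

    An empty list returns True, since nothing is listening on a
    non-local address.
    """
    return all(ipaddress.ip_address(a) == LOCALHOST for a in listen_addresses)


print(is_locked_down(["127.0.0.1"]))             # locked down
print(is_locked_down(["127.0.0.1", "0.0.0.0"]))  # exposed on all interfaces
```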

Unauthorized IP

Trigger Severity: WARNING

Description: Alerts when a node’s public IP has been locked but the connection to the control plane comes from a different IP.