Event Types
This page lists all event types that can be generated by Trustgrid nodes. Events are alphabetized and grouped by their filter type. For events that have both trigger and resolution states, the trigger severity and resolution severity are shown.
Contents
- All Gateways Disconnected
- All Peers Disconnected
- BGP Peer Connectivity
- Certificate Expiring
- Cluster Failover
- Configuration Update Failure
- Connection Flapping
- Connection Timeout
- Data and Control Plane Disruption
- Data Plane Disruption
- Deregister
- DNS Resolution
- Gateway Connectivity Health Check
- Gateway Ingress Limit Reached
- Gateway UDP Tunnel Error
- Metric Threshold Violation
- Network Error
- Network Health Check
- Networking Framework Memory Management
- Node Connect/Disconnect
- Node Delete
- Node Stop Error
- Order Commented
- Order Created
- Repo Connectivity
- SSH Lockdown
- Unauthorized IP
Event Types
All Gateways Disconnected
Trigger Severity: WARNING
Description: This event will be triggered if an edge node loses connectivity to all its available gateways.
All Peers Disconnected
Trigger Severity: WARNING
Description: This event will be triggered if a node is configured as a gateway and it loses connectivity to all its configured edge nodes.
BGP Peer Connectivity
Trigger Severity: ERROR
Resolution Severity: INFO
Trigger Message: BGP peer
Resolution Message: BGP peer
Description: This event is triggered when a BGP peer disconnects, and resolved when the BGP peer reconnects.
Certificate Expiring
Trigger Severity: WARNING
Description: Alerts when a certificate uploaded via Portal → Certificates will expire within 3 months.
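The expiry window can be illustrated with a small date check. This is only a sketch of the rule as described, not how the Portal evaluates certificates: the function name `is_expiring` is hypothetical, and treating "3 months" as 90 days is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "3 months" is approximated as 90 days.
EXPIRY_WINDOW = timedelta(days=90)


def is_expiring(not_after, now=None):
    """Return True if the certificate's not-after time is inside the window.

    Hypothetical helper; an already expired certificate also returns True.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= EXPIRY_WINDOW
```

Both arguments are expected to be timezone-aware `datetime` values so that the subtraction is well defined.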
Cluster Failover
Severity: INFO
Messages:
Description: Sent by a node when it claims or releases the active role in a cluster.
Configuration Update Failure
Severity: INFO
Message: Unexpected error pulling the most recent configuration from the cloud endpoint
Description: Indicates that the node is unable to connect to the configuration REST API endpoint within the Trustgrid Control Plane. Verify that all required communication to the Control Plane is allowed.
Connection Flapping
Trigger Severity: WARNING
Resolution Severity: INFO
Description:
- Trigger: Alerts when a node disconnects and reconnects 10 times within 5 minutes. A follow-up alert is sent after every 120 subsequent disconnect/reconnect cycles.
- Resolution: This alert will be sent when a node’s “Connection Flapping” issue has been resolved.
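The trigger rule above can be read as a sliding-window count. The following is a minimal sketch of that reading, not Trustgrid's implementation. The class name `FlapDetector`, the method `record_cycle`, and the interpretation of "10 times" as 10 disconnect/reconnect cycles are assumptions. How the resolution state is determined is not described on this page, so the sketch does not model it.

```python
from collections import deque
from typing import Optional

WINDOW_SECONDS = 5 * 60   # "within 5 minutes"
TRIGGER_COUNT = 10        # "10 times"
FOLLOW_UP_EVERY = 120     # follow-up after every 120 further cycles


class FlapDetector:
    """Hypothetical helper that counts disconnect/reconnect cycles for one node."""

    def __init__(self) -> None:
        self.recent = deque()       # timestamps of cycles inside the window
        self.flapping = False
        self.since_last_alert = 0

    def record_cycle(self, ts: float) -> Optional[str]:
        """Record one cycle at time ts (seconds).

        Returns "trigger", "follow-up", or None.
        """
        if self.flapping:
            # Already alerted: count further cycles toward a follow-up alert.
            self.since_last_alert += 1
            if self.since_last_alert >= FOLLOW_UP_EVERY:
                self.since_last_alert = 0
                return "follow-up"
            return None

        self.recent.append(ts)
        # Drop cycles that have fallen out of the 5-minute window.
        while self.recent and ts - self.recent[0] > WINDOW_SECONDS:
            self.recent.popleft()

        if len(self.recent) >= TRIGGER_COUNT:
            self.flapping = True
            self.since_last_alert = 0
            return "trigger"
        return None
```

A node that cycles once a minute never accumulates 10 cycles inside the window, so it would not trigger under this reading; a node that cycles every 10 seconds triggers on its tenth cycle.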
Connection Timeout
Trigger Severity: ERROR
Description: Alerts when a node does not reconnect after a profile update has been pushed to the node.
Data and Control Plane Disruption
Trigger Severity: ERROR
Message: Data and Control plane connection is up/down
Description: Generated when a node cannot establish a connection to the Trustgrid control plane and all its configured data plane gateways. If the node is clustered, this marks the node as unhealthy, triggering a cluster failover.
Data Plane Disruption
Trigger Severity: WARNING
Messages:
- (No specific message for unexpected tunnel termination)
- Gateway nodename removed from the domain
Description:
- This event will be triggered if a data plane tunnel is terminated unexpectedly.
- If a node acting as a gateway (public, private, or hub) is disabled, or its gateway service is disabled, all other nodes in the domain will log this event.
Deregister
Severity: INFO
Message: Device was deregistered from the console
Description: Sent when a user with console access runs the deregistration process.
DNS Resolution
Trigger Severity: ERROR
Message: DNS resolution failed/re-established
Description: Once per hour the node will attempt to resolve a trustgrid.io DNS address using the configured DNS servers. This event is triggered if the node is unable to resolve the requested address.
Gateway Connectivity Health Check
Trigger Severity: CRITICAL
Resolution Severity: INFO
Description:
- Trigger: Alerts when an edge node is unable to communicate with a gateway node.
- Resolution: Alerts when an edge node reestablishes connectivity to a gateway node after a failure is reported.
Gateway Ingress Limit Reached
Trigger Severity: ERROR
Description: Alerts when a gateway node’s ingress limit is above 95 percent utilization for two minutes straight.
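The "two minutes straight" condition means a single dip below the threshold resets the clock. A minimal sketch of that check follows; the function name `ingress_limit_reached`, the sample format, and the sampling interval are assumptions made for illustration, since this page does not describe how utilization is measured.

```python
def ingress_limit_reached(samples, threshold=95.0, duration=120.0):
    """Return True if utilization stayed above `threshold` for `duration` seconds.

    samples: iterable of (timestamp_seconds, utilization_percent) in
    ascending time order. Hypothetical input format.
    """
    streak_start = None
    for ts, pct in samples:
        if pct > threshold:
            if streak_start is None:
                streak_start = ts          # first sample of a new streak
            if ts - streak_start >= duration:
                return True
        else:
            streak_start = None            # any dip resets the streak
    return False
```

With samples taken every 30 seconds, five consecutive readings above 95 percent span the full two minutes and satisfy the check, while a run interrupted by one lower reading does not.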
Gateway UDP Tunnel Error
Trigger Severity: ERROR
Resolution Severity: INFO
Trigger Message: UDP Tunnel has timed out for node=<peer-node> and endpoint=<peer-ip>:<peer-port>
Resolution Message: UDP Tunnel connection has been re-established for node=<peer-node> and endpoint=<peer-ip>:<peer-port>
Description:
- Trigger: Alerts when a UDP tunnel times out after not receiving a keep-alive packet for 2 minutes (the default gateway timeout).
- Resolution: Event generated when a previously disconnected UDP tunnel is re-established and traffic should flow through it again.
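When forwarding these events to another system, it can help to extract the peer node and endpoint from the message text. The sketch below parses the two message formats listed above. The function name `parse_udp_tunnel_event` is hypothetical, the node name and address in the usage note are made up, and the pattern assumes the message arrives as plain text with whitespace between its parts.

```python
import re

# Matches both the trigger and the resolution message formats.
UDP_TUNNEL_PATTERN = re.compile(
    r"UDP Tunnel (?P<state>has timed out|connection has been re-established)"
    r"\s+for node=(?P<node>\S+)\s+and endpoint=(?P<ip>\S+):(?P<port>\d+)"
)


def parse_udp_tunnel_event(message):
    """Return a dict describing the event, or None if the text does not match."""
    match = UDP_TUNNEL_PATTERN.search(message)
    if match is None:
        return None
    return {
        "resolved": match.group("state") != "has timed out",
        "node": match.group("node"),
        "ip": match.group("ip"),
        "port": int(match.group("port")),
    }
```

For example, `parse_udp_tunnel_event("UDP Tunnel has timed out for node=edge1 and endpoint=203.0.113.10:8443")` yields a dict with `resolved` set to `False`, `node` set to `"edge1"`, and the endpoint split into `ip` and `port`.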
Metric Threshold Violation
Trigger Severity: ERROR
Resolution Severity: INFO
Description:
- Trigger: Alerts when a node's configured CPU, RAM, disk, or latency metric threshold is violated.
- Resolution: Alerts when a previously reported threshold violation has been cleared.
Network Error
Severity Levels: CRITICAL, ERROR, WARNING
Messages and Descriptions:
- CRITICAL: Stale ARP detected - Alerts when the active node in a cluster detects another MAC address responding to ARPs for a configured cluster IP address. This can occur briefly during a failover if the standby node begins arping before the previous active node releases that role. Other causes include an IP conflict, proxy ARP configured on another device on that network, or the attached switch not updating its ARP cache.
- ERROR: Unable to create/update Azure IP configuration for nic=LAN. Error Code=<Azure error code> - Alerts when a node is unable to create or update an Azure Cluster IP configuration. The error from the Azure API is included in the event.
- WARNING: Interface {OS interface name} is running with half-duplex - Alerts if an interface has been detected running at half-duplex. This is almost always a result of a failure to auto-negotiate the speed/duplex and can result in poor performance.
Network Health Check
Trigger Severity: ERROR
Message: Interface {OS interface name} is down / All interfaces are up
Description: Generated when link is lost on a configured interface. This is generated even if the interface has the “Ignore Health Check” setting enabled. Note: Interfaces with APIPA addresses are ignored.
Networking Framework Memory Management
Trigger Severity: ERROR
Message: The networking framework has exhausted all allocated memory. Please contact support@trustgrid.io to notify of this issue.
Description: Alerts if the Java Virtual Machine the Trustgrid node service uses runs out of available memory.
Node Connect/Disconnect
Trigger Severity: WARNING (Disconnect)
Resolution Severity: INFO (Connect)
Description:
- Trigger (Disconnect): Alerts when a node disconnects from the control plane.
- Resolution (Connect): Alerts when a node connects to the control plane.
Node Delete
Severity: WARNING
Message: Node deleted
Description: Event generated whenever a node is deleted.
Node Stop Error
Severity: ERROR
Message: Failed to stop the Node service cleanly
Description: Indicates that the Trustgrid service did not stop normally prior to this instance starting.
Order Commented
Severity: INFO
Description: Alerts when a comment has been added to a provisioning order.
Order Created
Severity: INFO
Description: Alerts when a new provisioning order has been created.
Repo Connectivity
Trigger Severity: ERROR
Resolution Severity: INFO
Trigger Message: Repo connectivity failed
Resolution Message: Repo connectivity re-established
Description:
- Trigger: Alerts when a node cannot connect to the Trustgrid update repository.
- Resolution: Alerts when a node re-establishes connectivity to the Trustgrid update repository. This event clears the Repo Connectivity error alert.
SSH Lockdown
Trigger Severity: ERROR
Resolution Severity: INFO
Trigger Message: SSH allowing connections from non local address and port
Resolution Message: SSH listening only on local address and port
Description:
- Trigger: Alerts when SSH on an appliance-based node is configured to listen on any IP other than localhost (127.0.0.1).
- Resolution: Alerts when SSH on an appliance-based node is properly locked down. This event clears the SSH Lockdown error alert.
Unauthorized IP
Trigger Severity: WARNING
Description: Alerts when a node’s public IP has been locked but the connection to the control plane comes from a different IP.