Alarm Filters

Alarm Filters are used to determine which events trigger notifications and to define which channels should receive those notifications.

Alarm filters have the following fields:

Field Name | Description
Name | The name should be unique.
Description | The description is displayed in the alarm filters table.
Enabled | An alarm filter must be enabled for matching events to be sent to the selected channels. Deselecting the check box can be handy if you wish to suppress a specific type of alarm.
Channels | This section determines which channels matching alarms will be sent to.
Criteria

The criteria determine which events will match the filter. These conditions can be set as:

  • All (default) - All specified criteria must be true to match. Equivalent to a boolean AND condition.
  • Any - At least one criterion must be true to match. Equivalent to a boolean OR condition.
  • None - The specified criteria must be false to match. Equivalent to a boolean NOT of the criteria ANDed together.
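
As a rough illustration of the three modes, the sketch below combines a list of per-criterion results in Python. The function name `combine` and the mode strings are hypothetical, not part of the product. None is modeled literally from the stated equivalence (a NOT of the ANDed criteria); whether the product instead requires every criterion to be false is an assumption this sketch does not settle.

```python
from typing import Iterable


def combine(mode: str, results: Iterable[bool]) -> bool:
    """Combine per-criterion match results according to the filter mode."""
    results = list(results)
    if mode == "all":
        # Boolean AND: every criterion must be true.
        return all(results)
    if mode == "any":
        # Boolean OR: at least one criterion must be true.
        return any(results)
    if mode == "none":
        # Assumption: NOT (c1 AND c2 AND ...), as the equivalence is worded.
        return not all(results)
    raise ValueError(f"unknown mode: {mode}")
```

For instance, `combine("any", [False, True])` matches, while `combine("all", [False, True])` does not.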
Node Name | The “Node Name” criterion lets you select one or more specific node names. Note that even if the filter is set to All, it matches when any one of the selected node names is associated with the event.
Event Type | The “Event Type” criterion determines which event types match the filter. Note that even if the filter is set to All, it matches any one of the selected event types.
Tag Matches

The “Tag Matches” criterion lets you use tag name/value pairs to determine whether the filter should match events. For example, you may want events from production devices sent to a high-priority channel such as PagerDuty or OpsGenie. If your nodes have a tag indicating “prod_status=production”, you can select that name/value pair from the list to filter your alarms accordingly.

Tag Match Any/All

You can choose whether an event must match ALL or ANY of the selected tag criteria for the filter to match. For example:

  • ANY covers a scenario where you want to match, say, Environment=Prod OR Environment=Test.
  • ALL covers a scenario where you want to match something like Environment=Prod AND Region=EAST.
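
The two tag modes can be sketched in Python as follows. This is only an illustration: the function `tags_match` is hypothetical, and exact, case-sensitive comparison of tag values is an assumption.

```python
from typing import Dict, List, Tuple


def tags_match(node_tags: Dict[str, str],
               selected: List[Tuple[str, str]],
               mode: str = "any") -> bool:
    """Return True if the node's tags satisfy the selected name/value pairs."""
    # Assumption: a pair matches only on an exact, case-sensitive value.
    hits = [node_tags.get(name) == value for name, value in selected]
    return all(hits) if mode == "all" else any(hits)


# ANY: Environment=Prod OR Environment=Test
env_pairs = [("Environment", "Prod"), ("Environment", "Test")]
# ALL: Environment=Prod AND Region=EAST
prod_east = [("Environment", "Prod"), ("Region", "EAST")]
```

A node tagged `Environment=Test` satisfies `env_pairs` in "any" mode, while a node tagged `Environment=Prod` and `Region=WEST` does not satisfy `prod_east` in "all" mode.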
Severity

Each event type has a severity level associated with it. This filter will match any event with the selected severity level or higher. This is the only mandatory criterion.

The severity levels, from lowest to highest, are:

  1. INFO
  2. WARNING
  3. ERROR
  4. CRITICAL

For example, if you select the severity level of WARNING, the filter will match WARNING, ERROR, and CRITICAL events.

Some events have a corresponding event that automatically resolves the alert in the portal and in some channels, such as PagerDuty. The resolving event may have a different severity level, so make sure you select the lower of the two severities for the criteria. For example, Node Disconnect is a WARNING, but Node Connect, which resolves it, is only INFO, so you would need to select both event types and set the severity to INFO.
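
The "selected level or higher" rule, and the advice to pick the lower severity of an alert and its resolving event, can be sketched as below. The names `SEVERITY_ORDER`, `severity_matches`, and `lowest_severity` are hypothetical helpers for illustration only.

```python
from typing import Iterable

# Severity levels from lowest to highest, as listed above.
SEVERITY_ORDER = ["INFO", "WARNING", "ERROR", "CRITICAL"]


def severity_matches(event_level: str, selected_level: str) -> bool:
    """True if the event's level is at or above the filter's selected level."""
    return SEVERITY_ORDER.index(event_level) >= SEVERITY_ORDER.index(selected_level)


def lowest_severity(levels: Iterable[str]) -> str:
    """Pick the lowest level so both an alert and its resolving event match."""
    return min(levels, key=SEVERITY_ORDER.index)


# Node Disconnect is WARNING; Node Connect, which resolves it, is INFO.
needed = lowest_severity(["WARNING", "INFO"])  # "INFO"
```

With the filter set to WARNING, `severity_matches("INFO", "WARNING")` is false, so the resolving Node Connect event would be dropped; setting the filter to `needed` lets both events through.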

Contains Text

This field accepts a single string of text to match against the contents of an event. For example, if all your gateways include -gw in the name, you could enter that without quotes in the field and it would match any event that includes that text in the event payload. This criterion can also be used when the event payload includes another aspect of the node that the criteria above do not cover. To see the entire payload of an event, configure a less specific filter and send it to an email channel to see the JSON.

The event payload includes the node’s unique identifier (UID), which is a generated string of letters and numbers. If your “Contains Text” value is too short, there is a chance it will also match a node UID unexpectedly.
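
To see why a short value can match unexpectedly, the sketch below treats “Contains Text” as a plain substring search over the serialized event. That matching model (case-sensitive search over the JSON text) is an assumption, and `contains_text` is a hypothetical helper; the event is a trimmed copy of the sample payload shown under CEL Expressions.

```python
import json


def contains_text(event: dict, text: str) -> bool:
    """Assumed model: case-sensitive substring search over the JSON payload."""
    return text in json.dumps(event)


event = {
    "nodeName": "demo-node",
    "nodeId": "ccd5a29e-fdc0-43d6-9408-b4184100287e",
    "message": "Node demo-node via Internet path abnormally disconnected",
}

gateway_hit = contains_text(event, "-gw")  # False: no "-gw" anywhere in the payload
uid_hit = contains_text(event, "-b4")      # True, but only because the UID contains "-b4"
```

A longer, more distinctive string such as a full node-name suffix makes an accidental match against a UID much less likely.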

CEL Expression | CEL expressions are logical expressions that evaluate to true or false to determine whether a filter should match an alarm. See the CEL Expressions section below for a detailed explanation.

CEL Expressions

CEL (Common Expression Language) is a simple expression language that allows more complex tests than simple equality checks.

When an alarm filter has a CEL expression set, it will be compiled when saved. If the compilation fails, a validation error will appear at the top of the page.

CEL expressions allow numerical comparisons, arithmetic, boolean operators, regular expressions, string matching, presence testing and list evaluation.

Events are provided to the expression inside a ctx object. A sample event looks like this:

{
    "details": {
        "alertId": "40e4a030-440a-4703-b33a-172416da4be2"
    },
    "domain": "demo.dev.trustgrid.io",
    "eventType": "Data Plane Disruption",
    "expires": 1699234335,
    "level": "WARNING",
    "message": "Node demo-node via Internet path abnormally disconnected",
    "nodeId": "ccd5a29e-fdc0-43d6-9408-b4184100287e",
    "nodeName": "demo-node",
    "node": {
        "uid": "59838ae6-a2b2-4c45-b7be-9378f0b265f5",
        "org": "aad89024-5927-4ebd-97e2-3cc605c1da5f",
        "domain": "dev.dev.trustgrid.io",
        "fqdn": "demo-node.demo.dev.trustgrid.io",
        "lastip": "64.17.3.164",
        "last_connect": 1699158287000,
        "name": "demo-node",
        "state": "ACTIVE",
        "cluster": "",
        "tags": {
            "autoupdate": "true"
        },
        "online": true,
        "shadow": {
            "reported": {
                "nic.ens160.duplex": "full",
                "node-core.version": "20231103-171711.d16963a",
                "node.upgrade.state": "COMPLETED",
                "repoConnectivity": "true",
                "dnsResolution": "healthy",
                "nic.ens160.mac": "00:50:56:8e:8a:03",
                "ztna-enabled": "true",
                "nic.ens192.mtu": "1500",
                "profile.name": "default",
                "nic.ens192.speed": "10000",
                "ssh.local": "false",
                "os.distro.id": "ubuntu",
                "nic.ens192.dhcp": "false",
                "netplan.saved": "true",
                "nic.ens192.ip": "10.20.10.50/24",
                "nic.ens192.duplex": "full",
                "nic.ens160.speed": "10000",
                "publishTime": 1699210399513,
                "package.version": "1.5.20231103-1880",
                "os.distro.version": "18.04.3 LTS (Bionic Beaver)",
                "domain.info.lastUpdate": 1699145917,
                "nic.ens192.gateway": "10.20.10.1",
                "os.arch": "amd64",
                "updateTime.enabled": "true",
                "nic.ens160.dns1": "172.16.11.4",
                "nic.ens160.mtu": "1500",
                "nic.ens160.ip": "172.16.22.50/24",
                "nic.ens160.gateway": "172.16.22.1",
                "nic.ens192.mac": "00:50:56:8e:c9:74",
                "version": 1699145622,
                "kvm-enabled": "false",
                "tpm.enabled": "false",
                "node.upgrade.completed.tstamp": 1699034170,
                "startup.error": "true",
                "nic.ens160.dhcp": "false"
            }
        },
        "device": {
            "mac": "00:50:56:8E:AA:28",
            "model": "esx",
            "vendor": "vmware"
        },
        "location": {
            "continent_name": "North America",
            "zip": "80301",
            "calling_code": null,
            "city": "Boulder",
            "ip": "64.17.3.164",
            "latitude": 40.04801940917969,
            "continent_code": "NA",
            "type": "ipv4",
            "country_code": "US",
            "country_flag_emoji_unicode": null,
            "country_name": "United States",
            "is_eu": false,
            "connection": {
                "asn": 27325,
                "isp": "Zcolo"
            },
            "country_flag_emoji": null,
            "location": {
                "Languages": [
                    {
                        "name": "English",
                        "native": "English",
                        "code": "en"
                    }
                ],
                "capital": "Washington D.C.",
                "geoname_id": 5574991
            },
            "region_name": "Colorado",
            "country_flag": null,
            "longitude": -105.20680236816406,
            "region_code": "CO"
        },
        "type": "Node",
        "tgrn": "tgrn:tg::nodes:node/59838ae6-a2b2-4c45-b7be-9378f0b265f5",
        "created_at": 1552940922,
        "config": {
            "cluster": {
                "master": true
            },
            "gateway": {
                "clients": [],
                "monitorHops": true,
                "maxmbps": 1001,
                "cert": "proxy.dev.trustgrid.io",
                "type": "private",
                "connectToPublic": true,
                "udpPort": 8443,
                "enabled": false,
                "master": false,
                "port": 8442,
                "paths": [],
                "udpEnabled": true,
                "host": "12.244.52.245",
                "maxClientWriteMbps": 1000
            },
            "snmp": {
                "port": 161,
                "interface": "ens160",
                "authProtocol": "SHA",
                "enabled": true,
                "privacyProtocol": "DES",
                "engineId": "7779cf92165b42f380fc9c93c",
                "username": "myuser"
            }
        }
    },
    "orgId": "aad89024-5927-4ebd-97e2-3cc605c1da5",
    "receivedTime": 1699147935,
    "subject": "Node",
    "timestamp": 1699147935,
    "_ct": "2023-11-05T01:32:15.471Z",
    "_md": "2023-11-05T01:32:15.471Z"
}

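Nested values can be referenced using a dot (.), for example ctx.node.location.city. Keys that themselves contain a dot, such as package.version, are referenced with bracket notation, for example ctx.node.shadow.reported["package.version"].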

Some common tests include:

  • Check if a node is a gateway: has(ctx.node.config.gateway) && ctx.node.config.gateway.enabled
  • Check if a node has "prod" in the name: ctx.node.name.contains("prod")
  • Check if a node is not clustered: !has(ctx.node.cluster) || ctx.node.cluster == ""
  • Check if a node is in Texas or Colorado: ctx.node.location.region_code == "TX" || ctx.node.location.region_code == "CO"
  • Check if a node is a virtual machine: ctx.node.device.vendor == "vmware"
  • Check if a node is up to date: ctx.node.shadow.reported["package.version"] >= "1.5.20231103-1880"
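
To reason about these tests outside the portal, the sketch below mirrors a few of them in plain Python against a trimmed copy of the sample event. It is not a CEL evaluator, only an approximation of the same logic with hypothetical variable names. It also illustrates a caveat: `>=` on strings compares character by character (as CEL string ordering is understood to do), so the version check is reliable only while the version strings keep the same shape.

```python
# Trimmed copy of the sample event; only the fields used below are kept.
ctx = {
    "node": {
        "name": "demo-node",
        "cluster": "",
        "location": {"region_code": "CO"},
        "device": {"vendor": "vmware"},
        "config": {"gateway": {"enabled": False}},
        "shadow": {"reported": {"package.version": "1.5.20231103-1880"}},
    }
}
node = ctx["node"]

# has(ctx.node.config.gateway) && ctx.node.config.gateway.enabled
is_gateway = "gateway" in node["config"] and node["config"]["gateway"]["enabled"]

# !has(ctx.node.cluster) || ctx.node.cluster == ""
not_clustered = "cluster" not in node or node["cluster"] == ""

# ctx.node.location.region_code == "TX" || ctx.node.location.region_code == "CO"
in_tx_or_co = node["location"]["region_code"] in ("TX", "CO")

# ctx.node.shadow.reported["package.version"] >= "1.5.20231103-1880"
up_to_date = node["shadow"]["reported"]["package.version"] >= "1.5.20231103-1880"

# Caveat: string comparison is lexicographic, so "1.10.0" sorts before "1.5.0".
lexicographic_surprise = "1.10.0" >= "1.5.0"  # False
```

For the sample node, the gateway test is false (the gateway block exists but is disabled), while the cluster, region, and version tests are all true.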

The full CEL definition can be found at GitHub.

You can use this CEL playground to test out expressions.