Canarytrace Listener

What you’ll learn

  • What Canarytrace Listener is
  • How Listener works

Canarytrace Listener is an additional component that picks out incidents and exceeded thresholds from the large volume of data that Canarytrace collects. Canarytrace gives you graphs, dashboards and plenty of collected data, but tunnel vision makes it easy to overlook an incident or a crossed threshold.

How does it work?

  • Listener provides automated monitoring of threshold exceedances and sends alerts according to a set of rules.
  • Listener provides a set of built-in rules with recommended values for metrics such as ResponseTimes, LoadEventEnd, LongTasks and Web Vitals, the size and format of images, and more.
  • Listener evaluates problems with resources that the browser downloads from the server, such as JavaScript or CSS files served without compression.
  • Listener works with a list of rules. Each rule has a threshold, one or more reporters and a score. If the threshold is exceeded, Listener sends an alert through the selected reporters.
  • Each threshold is assigned a score. The score table lets you grade the severity of an incident finely. The score ranges from 0 to 100; the lower the score, the more severe the incident.

Rule example

Send a notification to a Slack channel when at least 10 responses in the last hour have a responseTime of 3000 ms or more.

    - type: range
      title: "Higher response time."
      index: c.performance-entries
      timeRange: now-1h
      field: responseTime
      operator: gte
      value: 3000
      min: 10
      reportLabels:
      - 'name'
      - 'responseTime'
      - 'timestamp'
      reporters:
      - type: slack
        score: 40
        message: "Some of responses has higher response time."

Score table

| Description        | Score  | Color  |
|--------------------|--------|--------|
| needs fix!         | 0-30   | red    |
| needs improvement! | 31-70  | orange |
| good job!          | 71-100 | green  |
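
The score bands can be written as a small lookup. The following Python sketch is only an illustration of the table; the function name `classify_score` is hypothetical and is not part of Canarytrace Listener.

```python
def classify_score(score: int) -> tuple:
    """Map a 0-100 score to the (description, color) band of the score table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 30:
        return ("needs fix!", "red")
    if score <= 70:
        return ("needs improvement!", "orange")
    return ("good job!", "green")


# Scores taken from the rule examples in this document.
for value in (10, 40):
    print(value, classify_score(value))
```

A rule with score 10 therefore lands in the red band, and a rule with score 40 in the orange band.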

Rules

Rules are the cornerstone of Canarytrace Listener. There are three types:

  • Internal rules - built-in health checks for Canarytrace and Elasticsearch.
  • Default rules - built-in rules that monitor metric thresholds and recommendations.
  • Client rules - in preparation; users will be able to define their own rules, thresholds and reporters.

Rule match

  • Searches for an exact value in a label.
  • Condition: in the last hour, index c.report must contain false in the label passed, with at least 2 matches.
  • If the condition is met, the report is sent to the slack and events reporters.
  • The operator can be must or must_not.
  • min is optional.
  • One or more reporters can be set.

    - type: match
      title: "Failed check your page!"
      index: c.report
      timeRange: now-1h
      field: passed
      operator: must
      expected: false
      min: 2
      reportLabels:
      - 'fullTitle'
      - 'timestamp'
      reporters:
      - type: slack
        score: 10
        message: "An error occurred while checking."
      - type: events
        score: 10
        message: "An error occurred while checking."
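
How Listener turns a match rule into an Elasticsearch query is not described here. As a rough sketch, assuming the rule maps onto a bool query and that documents carry a `timestamp` field (as the reportLabels suggest), the translation could look like this in Python. `build_match_query` is a hypothetical helper, not a Listener API.

```python
import json


def build_match_query(rule: dict) -> dict:
    """Build an Elasticsearch query body for a match rule (assumed mapping)."""
    operator = rule["operator"]
    if operator not in ("must", "must_not"):
        raise ValueError(f"unsupported operator: {operator}")
    return {
        "query": {
            "bool": {
                # must / must_not is used directly as the bool clause name.
                operator: [{"term": {rule["field"]: rule["expected"]}}],
                # Restrict the search to the rule's time range, e.g. now-1h.
                "filter": [{"range": {"timestamp": {"gte": rule["timeRange"]}}}],
            }
        }
    }


rule = {
    "type": "match",
    "index": "c.report",
    "timeRange": "now-1h",
    "field": "passed",
    "operator": "must",
    "expected": False,
    "min": 2,
}
query = build_match_query(rule)
print(json.dumps(query, indent=2))
```

Under these assumptions, the rule would report once the number of hits for this query reaches `min`.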

Rule contains

  • Two conditions: the first selects a group of data by field and value; the second searches that group for exact matches of expression.field against expression.values.
  • Condition: in the last hour, documents in index c.response whose label headers.content-type contains javascript do not contain gzip or br in the field headers.content-encoding, with at least 10 matches.
  • If the condition is met, the report is sent to the slack and events reporters.
  • The operator can be must or must_not.
  • min is optional.
  • expression.values can hold one or more values.
  • One or more reporters can be set.

    - type: contains
      title: "Encoding of response with Javascript files must contains gzip or brotli compression."
      index: c.response
      timeRange: now-1h
      field: headers.content-type
      value: 'javascript'
      expression:
        field: 'headers.content-encoding'
        operator: must_not
        values:
        - 'gzip'
        - 'br'
      min: 10
      reportLabels:
      - 'url'
      - 'timestamp'
      reporters:
      - type: slack
        score: 40
        message: "Use Brotli for plain text compression."
      - type: events
        score: 40
        message: "Use Brotli for plain text compression."
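
The two-step condition of a contains rule can be illustrated on in-memory documents. This Python sketch is an assumption about the semantics, not Listener's implementation: the helper names are hypothetical, documents are assumed to be nested dictionaries, the expression keys follow the expression.field and expression.values naming used in the description, and a missing `min` is assumed to mean 1.

```python
def get_field(doc: dict, dotted: str):
    """Look up a dotted path such as 'headers.content-type' in a nested dict."""
    value = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def contains_rule_hits(docs: list, rule: dict) -> list:
    """Return the documents that satisfy both conditions of a contains rule."""
    expr = rule["expression"]
    hits = []
    for doc in docs:
        # First condition: the grouping field must contain the rule value.
        if rule["value"] not in str(get_field(doc, rule["field"]) or ""):
            continue
        # Second condition: compare the expression field with the listed values.
        target = str(get_field(doc, expr["field"]) or "")
        found = any(v in target for v in expr["values"])
        if (expr["operator"] == "must_not" and not found) or (
            expr["operator"] == "must" and found
        ):
            hits.append(doc)
    return hits


rule = {
    "field": "headers.content-type",
    "value": "javascript",
    "expression": {
        "field": "headers.content-encoding",
        "operator": "must_not",
        "values": ["gzip", "br"],
    },
    "min": 1,
}
docs = [
    {"url": "/app.js", "headers": {"content-type": "application/javascript",
                                   "content-encoding": "gzip"}},
    {"url": "/lib.js", "headers": {"content-type": "application/javascript"}},
    {"url": "/site.css", "headers": {"content-type": "text/css"}},
]
hits = contains_rule_hits(docs, rule)
should_report = len(hits) >= rule.get("min", 1)
print([d["url"] for d in hits], should_report)
```

Only `/lib.js` is reported here: it is a JavaScript response and carries neither gzip nor br in its content encoding.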

Rule range

  • Searches for values greater than or less than a threshold.
  • Condition: in the last hour, index c.performance-entries must contain responseTime greater than or equal to 3000, with at least 10 matches.
  • If the condition is met, the report is sent to the slack and events reporters.
  • The operator can be gte or lte.
  • min is optional.
  • One or more reporters can be set.

    - type: range
      title: "Higher response time."
      index: c.performance-entries
      timeRange: now-1h
      field: responseTime
      operator: gte
      value: 3000
      min: 10
      reportLabels:
      - 'name'
      - 'responseTime'
      - 'timestamp'
      reporters:
      - type: slack
        score: 40
        message: "Some of responses has higher response time."
      - type: events
        score: 40
        message: "Some of responses has higher response time."
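
The interplay of operator, value, min and reportLabels in a range rule can be sketched in Python as follows. This is an illustration on in-memory documents under stated assumptions: `range_rule_fires` is a hypothetical name, a missing `min` is assumed to mean 1, and the sample documents are invented for the example.

```python
import operator

# gte and lte are the two operators the range rule documents.
COMPARATORS = {"gte": operator.ge, "lte": operator.le}


def range_rule_fires(docs: list, rule: dict) -> tuple:
    """Return (fires, report) for a range rule evaluated over docs."""
    compare = COMPARATORS[rule["operator"]]
    field = rule["field"]
    hits = [d for d in docs if field in d and compare(d[field], rule["value"])]
    # Only the labels listed in reportLabels are passed on to the reporters.
    report = [{label: d.get(label) for label in rule["reportLabels"]} for d in hits]
    return len(hits) >= rule.get("min", 1), report


rule = {
    "field": "responseTime",
    "operator": "gte",
    "value": 3000,
    "min": 2,
    "reportLabels": ["name", "responseTime"],
}
docs = [
    {"name": "/api/cart", "responseTime": 3000, "size": 512},
    {"name": "/api/search", "responseTime": 3500, "size": 2048},
    {"name": "/", "responseTime": 1200, "size": 128},
]
fires, report = range_rule_fires(docs, rule)
print(fires, report)
```

With `gte`, a response time of exactly 3000 counts as a hit, so two of the three sample documents match and the rule fires.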

Default rules

  • The latest version is 1.6.
  • This table of rules is still under development.

The latest version of Listener contains these built-in rules:

| # | Title | Index | Condition | Min count / hour | Score |
|---|-------|-------|-----------|------------------|-------|
| 1 | Failed check your page! | c.report | test step failed | 2 | 10 |
| 2 | Encoding of response with Javascript files must contains gzip or brotli compression. | c.response | gzip or br missing in headers.content-encoding | 10 | 40 |
| 3 | Encoding of response with CSS files must contains gzip or brotli compression. | c.response | gzip or br missing in headers.content-encoding | 10 | 40 |
| 4 | Higher response time. | c.performance-entries | > 3000 ms | 10 | 40 |
| 5 | WebVitals LCP exceeded. | c.audit | > 2500 ms | 5 | 40 |
| 6 | WebVitals TTI exceeded. | c.audit | > 5000 ms | 5 | 40 |
| 7 | WebVitals CLS exceeded. | c.audit | > 0.1 | 5 | 40 |
| 8 | LoadEventEnd exceeded. | c.performance-entries | > 4000 ms | 5 | 40 |
  • title: name of the rule.
  • index: where Listener looks for incidents.
  • condition: if the condition is met, a report is sent.
  • min count / hour: a report is sent only if at least this many incidents are found within an hour.
  • score: the report gets a label and a color according to its score.

Internal rules

| Internal check | Score | Description |
|----------------|-------|-------------|
| ⚙️ Canarytrace is down | 0 | Canarytrace did not send data to the c.report-* index |
| ⚙️ Elasticsearch health is yellow | 50 | Needs improvement |
| ⚙️ Elasticsearch health is red | 0 | Elasticsearch is probably down |
| ⚙️ Java Heap is higher than 70% | 50 | Close to overloaded and needs improvement |
| ⚙️ Java Heap is higher than 85% | 50 | Elasticsearch nodes are overloaded |
  • canaryIsLive: checks that Canarytrace is running.
  • healthCheck: checks the Elasticsearch status and pending tasks.
  • nodesCheck: checks the Elasticsearch Java heap, max RAM, max disk used and max CPU.

Reporters

  • You can use one or more reporters.

    reporters:
    - type: slack
      score: 10
      message: "An error occurred while checking."
    - type: events
      score: 10
      message: "An error occurred while checking."
    - type: email
      score: 10
      message: "An error occurred while checking."
      recipients:
      - 'rdpanek@canarytrace.com'
      - 'support@company.com'
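
Fanning one alert out to several reporters can be pictured with the following Python sketch. It only builds payloads and sends nothing. The function names and the message layout are assumptions made for illustration; the one external fact relied on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json


def build_slack_payload(title: str, reporter: dict) -> bytes:
    """JSON body for a Slack incoming webhook (message layout is assumed)."""
    text = f"{title} (score {reporter['score']}): {reporter['message']}"
    return json.dumps({"text": text}).encode("utf-8")


def dispatch(title: str, reporters: list) -> list:
    """Return (reporter type, payload) pairs; sending is left out of this sketch."""
    out = []
    for reporter in reporters:
        kind = reporter["type"]
        if kind == "slack":
            out.append((kind, build_slack_payload(title, reporter)))
        elif kind == "email":
            out.append((kind, {"to": reporter["recipients"],
                               "body": reporter["message"]}))
        elif kind == "events":
            # Field names mirror the event document shown under Events reporter.
            out.append((kind, {"type": "events", "title": title,
                               "score": reporter["score"]}))
        else:
            raise ValueError(f"unknown reporter type: {kind}")
    return out


reporters = [
    {"type": "slack", "score": 10, "message": "An error occurred while checking."},
    {"type": "events", "score": 10, "message": "An error occurred while checking."},
]
results = dispatch("Failed check your page!", reporters)
for kind, payload in results:
    print(kind, payload)
```

Keeping payload construction separate from delivery makes each reporter easy to test without network access.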

Slack reporter

Slack report

Email reporter

Email report

Events reporter

Kibana index

{
  "type": "events",
  "rule": "cldr",
  "tags": [
    "webApp",
    "monitoring"
  ],
  "kibana_endpoint": "https://abc.gcp.cloud.es.io:9243",
  "score": 40,
  "title": "WebVitals LCP exceeded.",
  "description": "Point in the page load timeline when the page's main content has likely loaded, https://web.dev/lcp/, total 10 exceeded.",
  "documents": [
    {
      "_index": "c.audit-2021.03.09",
      "_id": "15K_F3gBTSVqKz9Gc_EY",
      "timestamp": "2021-03-09T16:08:45.962Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "cZK-F3gBTSVqKz9G8fFp",
      "timestamp": "2021-03-09T16:08:12.763Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "u5K-F3gBTSVqKz9GUvAc",
      "timestamp": "2021-03-09T16:07:32.057Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "FpK9F3gBTSVqKz9GxvCt",
      "timestamp": "2021-03-09T16:06:56.359Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "YJK9F3gBTSVqKz9GJ--j",
      "timestamp": "2021-03-09T16:06:15.564Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "eJK6F3gBTSVqKz9Gqu7h",
      "timestamp": "2021-03-09T16:03:32.563Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "df-6F3gB48LFUj19I2XP",
      "timestamp": "2021-03-09T16:02:58.065Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "Iv-5F3gB48LFUj19omXh",
      "timestamp": "2021-03-09T16:02:25.057Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "iP-5F3gB48LFUj19I2SC",
      "timestamp": "2021-03-09T16:01:52.453Z"
    },
    {
      "_index": "c.audit-2021.03.09",
      "_id": "o5K4F3gBTSVqKz9Ggu2f",
      "timestamp": "2021-03-09T16:01:11.254Z"
    }
  ],
  "timestamp": "2021-03-09T16:10:11+00:00"
}

{
  "fields": {
    "documents.timestamp": [
      "2021-03-09T16:08:45.962Z",
      "2021-03-09T16:08:12.763Z",
      "2021-03-09T16:07:32.057Z",
      "2021-03-09T16:06:56.359Z",
      "2021-03-09T16:06:15.564Z",
      "2021-03-09T16:03:32.563Z",
      "2021-03-09T16:02:58.065Z",
      "2021-03-09T16:02:25.057Z",
      "2021-03-09T16:01:52.453Z",
      "2021-03-09T16:01:11.254Z"
    ],
    "timestamp": [
      "2021-03-09T16:10:11.000Z"
    ]
  }
}

CLI

  • Canarytrace Listener is configured through environment variables.

Elasticsearch

  • ELASTIC_REQUEST_TIMEOUT default 1000ms
  • ELASTIC_CLUSTER default http://localhost:9200
  • ELASTIC_HTTP_AUTH default no
  • INDEX_PREFIX default is c.
  • QUERY_SIZE default is 20, lower is better for performance
  • KIBANA_ENDPOINT link to kibana

Slack

  • SLACK_WEBHOOK_URL webhook URL of the client's Slack workspace

Email

All parameters are required.

  • EMAIL_SMTP_SERVER
  • EMAIL_SMTP_PORT default 465
  • EMAIL_SMTP_USER
  • EMAIL_SMTP_PASS

Healthcheck

  • CANARY_HEALTHCHECK default allow
  • ELASTIC_HEALTHCHECK default allow
  • ELASTIC_NODES_HEALTHCHECK default allow

Tags

  • TAGS example eshop;monitoring
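
Reading these variables with their documented defaults could look like the Python sketch below. It is an illustration only: `load_settings` is a hypothetical name, numeric values are assumed to be plain integers (for example `1000` rather than `1000ms`), and TAGS is assumed to be a semicolon-separated list, as the example value suggests.

```python
import os


def load_settings(env=None) -> dict:
    """Collect Listener settings from environment variables with documented defaults."""
    env = os.environ if env is None else env
    return {
        "elastic_cluster": env.get("ELASTIC_CLUSTER", "http://localhost:9200"),
        # Assumed to be an integer number of milliseconds.
        "elastic_request_timeout_ms": int(env.get("ELASTIC_REQUEST_TIMEOUT", "1000")),
        "index_prefix": env.get("INDEX_PREFIX", "c."),
        "query_size": int(env.get("QUERY_SIZE", "20")),
        "email_smtp_port": int(env.get("EMAIL_SMTP_PORT", "465")),
        # Assumed format: tags separated by semicolons; empty entries are dropped.
        "tags": [t for t in env.get("TAGS", "").split(";") if t],
    }


settings = load_settings({"TAGS": "eshop;monitoring", "QUERY_SIZE": "10"})
print(settings)
```

Passing a dictionary instead of relying on `os.environ` keeps the function easy to test.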

How to run Canarytrace Listener

  • Canarytrace Listener is a dockerized component that ships with a complete set of Kubernetes objects.
  • The Kubernetes deployment files are not public.
  • namespace.yaml - creates the canarytrace namespace.
  • config-rules.yaml - the collection of rules.
  • secret.yaml - contains all sensitive configuration (Elasticsearch, Slack webhook, SMTP user, etc.).
  • listener.yaml - the main deployment manifest of Canarytrace Listener (a CronJob).