Considerations for monitoring the InfluxData Platform

One of the primary use cases for InfluxData’s TICK stack is infrastructure monitoring, including using the TICK stack to monitor itself or another TICK stack. There are two main approaches to monitoring your TICK stack:

Internal monitoring - A TICK stack that monitors itself.
“Watcher of watchers” approach - A TICK stack monitored by another TICK stack.

Internal monitoring

Not recommended for production environments.

By default, the InfluxData platform is configured to monitor itself. Telegraf collects metrics from the host on which it is running, such as CPU, memory, and disk usage, and stores them in the telegraf database in InfluxDB. InfluxDB also reports performance metrics about itself, including continuous query statistics, internal goroutine statistics, write statistics, and series cardinality, and stores them in the _internal database. For the recommendation about the _internal database, see Disable the _internal database in production clusters below.

Monitoring dashboards are available that visualize the default metrics provided in each of these databases. You can also configure Kapacitor alerts to monitor and alert on each of these metrics.
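As a rough illustration of the self-monitoring arrangement described above, a Telegraf configuration along the following lines would collect host metrics and write them to a local InfluxDB instance. This is a minimal sketch rather than the shipped default configuration: the local URL, the collection interval, and the choice of input plugins are assumptions made for the example.

```toml
# telegraf.conf (illustrative sketch, not the packaged default)

[agent]
  # How often Telegraf gathers metrics (assumed value).
  interval = "10s"

# Write collected metrics to the InfluxDB instance on the same host.
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]  # assumes InfluxDB listens on its default port
  database = "telegraf"

# Host-level metrics: CPU, memory, and disk usage.
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
  # Skip pseudo file systems that only add noise.
  ignore_fs = ["tmpfs", "devtmpfs"]
```

Because both Telegraf and InfluxDB run on the same host here, the data and any alerts built on it become unavailable whenever that host does.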

Pros of internal monitoring

Simple setup

Internal monitoring requires no additional setup or configuration changes. The TICK stack monitors itself out of the box.

Cons of internal monitoring

No hardware separation

When using internal monitoring, if your TICK stack goes offline, your monitor does as well. Any configured alerts will not be sent and you will not be notified of any issues. Because of this, internal monitoring is not recommended for production use cases.

The “watcher of watchers” approach

Recommended for production environments.

A “watcher of watchers” approach for monitoring InfluxDB OSS and InfluxDB cluster nodes offers monitoring of your InfluxDB resources while ensuring that the monitoring statistics are available remotely in case of data loss.

This usually takes the form of an Enterprise cluster being monitored by an OSS TICK stack. It consists of Telegraf agents installed on each node in your primary cluster reporting metrics for their respective hosts to a monitoring TICK stack installed on a separate server or cluster.

For information about setting up an external monitoring TICK stack, see Setup an external monitor.
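The external arrangement can be sketched as a Telegraf configuration installed on each node of the primary cluster. The monitor address `http://monitor.example.com:8086` is a hypothetical placeholder, and the set of input plugins is an assumption chosen for illustration; the `inputs.influxdb` plugin reads the statistics that InfluxDB exposes at its `/debug/vars` endpoint.

```toml
# telegraf.conf on each node of the primary cluster (illustrative sketch)

# Send metrics to the separate monitoring TICK stack, not the local InfluxDB.
[[outputs.influxdb]]
  urls = ["http://monitor.example.com:8086"]  # hypothetical monitor host
  database = "telegraf"

# Host-level metrics for this node.
[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]
[[inputs.system]]

# InfluxDB's own runtime statistics, read from the local instance.
[[inputs.influxdb]]
  urls = ["http://localhost:8086/debug/vars"]
```

With the output pointed at a separate server, the collected metrics remain available on the monitor even if the node that produced them goes offline.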

Monitoring dashboards are available that visualize the default metrics provided by the Telegraf agents. You can also configure Kapacitor alerts to monitor and alert on each of these metrics.

Pros of external monitoring

Hardware separation

With a monitor running separately from your primary TICK stack, issues that occur in the primary stack will not affect the monitor. If your primary TICK stack goes down or has issues, your monitor will be able to detect them and alert you.

Cons of external monitoring

Slightly more setup

There is more setup involved with external monitoring, but the benefits far outweigh the extra time required, especially for production use cases.

Recommendations

Disable the `_internal` database in production clusters

InfluxData does not recommend using the _internal database in a production cluster. It creates unnecessary overhead that can overload a cluster that is already under heavy load. Metrics stored in the _internal database primarily measure workload performance, which should only be tested in non-production environments.

To disable the _internal database, set store-enabled to false under the [monitor] section of your influxdb.conf.

influxdb.conf

# ...
[monitor]

  # ...

  # Whether to record statistics internally.
  store-enabled = false

  #...
