Frequently asked questions for InfluxDB Enterprise
Log errors
- Why am I seeing a 503 Service Unavailable error in my meta node logs?
- Why am I seeing a 409 error in some of my data node logs?
- Why am I seeing hinted handoff queue not empty errors in my data node logs?
- Why am I seeing error writing count stats ...: partial write errors in my data node logs?
- Why am I seeing queue is full errors in my data node logs?
- Why am I seeing unable to determine if "hostname" is a meta node when I try to add a meta node with influxd-ctl join?
Other
Where can I find InfluxDB Enterprise logs?
On systemd operating systems, service logs can be accessed using the journalctl command.
Meta: journalctl -u influxdb-meta
Data: journalctl -u influxdb
Enterprise console: journalctl -u influx-enterprise
The journalctl output can be redirected to print the logs to a text file. With systemd, log retention depends on the system’s journald settings.
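As a minimal sketch of the redirection described above, the helper below writes the journal for one unit to a text file. The function name save_unit_logs and the output file name are hypothetical; the unit names are the ones listed above, and --no-pager is a standard journalctl option.

```shell
# Save the journal for one systemd unit to a text file.
# save_unit_logs is a hypothetical helper name, not part of InfluxDB Enterprise.
save_unit_logs() {
  unit="$1"      # for example: influxdb-meta, influxdb, influx-enterprise
  outfile="$2"   # destination text file
  # --no-pager stops journalctl from starting an interactive pager
  journalctl -u "$unit" --no-pager > "$outfile"
}

# Example (run on a node that has the unit installed):
#   save_unit_logs influxdb-meta influxdb-meta.log
```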
Why am I seeing a 503 Service Unavailable error in my meta node logs?
This is the expected behavior if you haven’t joined the meta node to the cluster. The 503 errors should stop showing up in the logs once you join the meta node to the cluster.
Why am I seeing a 409 error in some of my data node logs?
When you create a Continuous Query (CQ) on your cluster, every data node will ask for the CQ lease. Only one data node can accept the lease. That data node will have a 200 in its logs. All other data nodes will be denied the lease and have a 409 in their logs. This is the expected behavior.
Log output for a data node that is denied the lease:
[meta-http] 2016/09/19 09:08:53 172.31.4.132 - - [19/Sep/2016:09:08:53 +0000] GET /lease?name=continuous_querier&node_id=5 HTTP/1.2 409 105 - InfluxDB Meta Client b00e4943-7e48-11e6-86a6-000000000000 380.542µs
Log output for the data node that accepts the lease:
[meta-http] 2016/09/19 09:08:54 172.31.12.27 - - [19/Sep/2016:09:08:54 +0000] GET /lease?name=continuous_querier&node_id=0 HTTP/1.2 200 105 - InfluxDB Meta Client b05a3861-7e48-11e6-86a7-000000000000 8.87547ms
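To confirm that the 409 responses are only lease denials, it can help to tally lease requests by status code. The sketch below assumes log lines shaped like the two samples above, where the status code is the second field after the request path. The function name lease_status_counts is hypothetical.

```shell
# Count continuous_querier lease responses by HTTP status code.
# Reads log lines on stdin; lease_status_counts is a hypothetical helper name.
lease_status_counts() {
  awk '
    {
      # Locate the request path; the status code is two fields after it
      # (path, protocol, status), as in the sample log lines.
      for (i = 1; i <= NF - 2; i++) {
        if (index($i, "/lease?name=continuous_querier") == 1) {
          count[$(i + 2)]++
        }
      }
    }
    END { for (status in count) print status, count[status] }
  ' | sort
}

# Example: pipe in whichever log contains the [meta-http] lines, e.g.
#   journalctl -u influxdb-meta --no-pager | lease_status_counts
```

Each output line is a status code followed by the number of lease requests that received it. Searching for the path rather than using a fixed field number keeps the sketch working when journalctl adds its own prefix to each line.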
Why am I seeing hinted handoff queue not empty errors in my data node logs?
[write] 2016/10/18 10:35:21 write failed for shard 2382 on node 4: hinted handoff queue not empty
This error is informational only and does not necessarily indicate a problem in the cluster. It indicates that the node handling the write request currently has data in its local hinted handoff queue for the destination node. Coordinating nodes will not attempt direct writes to other nodes until the hinted handoff queue for the destination node has fully drained. New data is instead appended to the hinted handoff queue. This helps data arrive in chronological order for consistency of graphs and alerts and also prevents unnecessary failed connection attempts between the data nodes. Until the hinted handoff queue is empty this message will continue to display in the logs. Monitor the size of the hinted handoff queues with ls -lRh /var/lib/influxdb/hh to ensure that they are decreasing in size.
Note that for some write consistency settings, InfluxDB may return a write error (500) for the write attempt, even if the points are successfully queued in hinted handoff. Some write clients may attempt to resend those points, leading to duplicate points being added to the hinted handoff queue and lengthening the time it takes for the queue to drain. If the queues are not draining, consider temporarily downgrading the write consistency setting, or pausing retries on the write clients until the hinted handoff queues fully drain.
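To check whether the queues are draining, one option is to sample the total size of the hinted handoff directory over time. This is a rough sketch: the function name hh_size_kb is hypothetical, and the default path is the /var/lib/influxdb/hh directory mentioned above, which may differ on your installation.

```shell
# Print the total size of the hinted handoff directory in kilobytes.
# hh_size_kb is a hypothetical helper; the default path is the one used above.
hh_size_kb() {
  dir="${1:-/var/lib/influxdb/hh}"
  # du -sk prints "<kilobytes><TAB><path>"; keep only the number
  du -sk "$dir" | awk '{ print $1 }'
}

# Example: print a timestamped sample every 60 seconds (stop with Ctrl-C).
#   while true; do echo "$(date) $(hh_size_kb) KB"; sleep 60; done
```

A steadily decreasing number suggests the queues are draining. A flat or growing number suggests the destination node is still unreachable or clients are re-sending points.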
Why am I seeing error writing count stats ...: partial write errors in my data node logs?
[stats] 2016/10/18 10:35:21 error writing count stats for FOO_grafana: partial write
The _internal database collects per-node and also cluster-wide information about the InfluxDB Enterprise cluster. The cluster metrics are replicated to other nodes using consistency=all. For a write consistency of all, InfluxDB returns a write error (500) for the write attempt even if the points are successfully queued in hinted handoff. Thus, if there are points still in hinted handoff, the _internal writes will fail the consistency check and log the error, even though the data is in the durable hinted handoff queue and should eventually persist.
Why am I seeing queue is full errors in my data node logs?
This error indicates that the coordinating node that received the write cannot add the incoming write to the hinted handoff queue for the destination node because it would exceed the maximum size of the queue. This error typically indicates a catastrophic condition for the cluster: one data node may have been offline or unable to accept writes for an extended duration.
The controlling configuration settings are in the [hinted-handoff] section of the configuration file. max-size is the total size in bytes per hinted handoff queue. When max-size is exceeded, all new writes for that node are rejected until the queue drops below max-size. max-age is the maximum length of time a point will persist in the queue. Once this limit has been reached, points expire from the queue. The age is calculated from the write time of the point, not the timestamp of the point.
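To see which values are currently in effect on a data node, the sketch below prints the max-size and max-age lines from the [hinted-handoff] section of a configuration file. The function name show_hh_settings is hypothetical, and the default path /etc/influxdb/influxdb.conf is an assumption; pass the actual path if yours differs.

```shell
# Print max-size and max-age from the [hinted-handoff] section of a config file.
# show_hh_settings is a hypothetical helper; the default path is an assumption.
show_hh_settings() {
  conf="${1:-/etc/influxdb/influxdb.conf}"
  awk '
    # On every section header, record whether it is [hinted-handoff]
    /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[hinted-handoff\]/) }
    # Inside that section, print only uncommented max-size and max-age lines
    in_section && /^[ \t]*(max-size|max-age)[ \t]*=/ { print }
  ' "$conf"
}

# Example:
#   show_hh_settings /etc/influxdb/influxdb.conf
```

Commented-out settings are skipped, so empty output means neither value is set explicitly in that file.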
Why am I seeing unable to determine if "hostname" is a meta node when I try to add a meta node with influxd-ctl join?
Meta nodes use the /status endpoint to determine the current state of another meta node. A healthy meta node that is ready to join the cluster will respond with a 200 HTTP response code and a JSON string with the following format (assuming the default ports):
{"nodeType":"meta","leader":"","httpAddr":"<hostname>:8091","raftAddr":"<hostname>:8089","peers":null}
If you are getting an error message while attempting to influxd-ctl join a new meta node, it means that the JSON string returned from the /status endpoint is incorrect. This generally indicates that the meta node configuration file is incomplete or incorrect. Inspect the HTTP response with curl -v "http://<hostname>:8091/status" and make sure that the hostname, the bind-address, and the http-bind-address are correctly populated. Also check the license-key or license-path in the configuration file of the meta nodes. Finally, make sure that you specify the http-bind-address port in the join command, e.g. influxd-ctl join hostname:8091.
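Before running the join, a quick check of the /status response can catch a misconfigured node. The sketch below is a plain substring check on the response body, not a full JSON parser. It assumes the response format shown above; the function name is_meta_status is hypothetical.

```shell
# Succeed (exit 0) if the JSON on stdin reports nodeType "meta".
# is_meta_status is a hypothetical helper; it does a pattern match only.
is_meta_status() {
  grep -Eq '"nodeType"[[:space:]]*:[[:space:]]*"meta"'
}

# Example (replace <hostname> with the node you intend to join):
#   curl -s "http://<hostname>:8091/status" | is_meta_status \
#     && echo "looks like a meta node" \
#     || echo "unexpected /status response"
```

If the check fails, inspect the full response with curl -v and review the configuration settings listed above.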