Etcd Operation

Dec 6, 2022 etcd golang

Disk

The etcd-perf tool reports whether the disk is fast enough to host etcd. It checks whether the 99th percentile of the fsync latency captured during the run is below 10 ms. Run it against the etcd data directory:

podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf

etcd_disk_wal_fsync_duration_seconds_bucket reports the etcd WAL fsync duration as histogram buckets, and etcd_server_leader_changes_seen_total reports leader changes. To rule out a slow disk and confirm that the disk is reasonably fast, the 99th percentile of etcd_disk_wal_fsync_duration_seconds_bucket should be below 10 ms. Query in the metrics UI:

histogram_quantile(0.99, sum by (instance, le) (irate(etcd_disk_wal_fsync_duration_seconds_bucket{job="etcd"}[5m])))

Network

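Because etcd replicates requests among all members, its performance depends heavily on network input/output (I/O) latency. High network latency can make etcd heartbeats take longer than the election timeout. That triggers leader elections, which disrupt the cluster. A key metric to monitor on a deployed OpenShift Container Platform cluster is the 99th percentile of etcd network peer latency on each etcd member; use Prometheus to track it. The query histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m])) reports the round-trip time etcd needs to finish replicating client requests between members. Ensure that it stays below 50 ms.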

Defragmentation

When the disk is slow or the etcd DB grows large, we can defragment the existing etcd DB to reduce its space consumption, as described here. Run the following command in each etcd pod:

1$ etcdctl defrag
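Defragmenting a live member blocks reads and writes on that member while it rebuilds its data, so it is common to go through members one at a time. OpenShift's documented procedure also handles the leader last. The following Go sketch orders the endpoints that way and shells out to `etcdctl --endpoints=<ep> defrag` for each one. The endpoint URLs and the leader value are hypothetical; in practice the leader can be read from the IS LEADER column of `etcdctl endpoint status -w table`. The sketch also assumes etcdctl v3 is on the PATH with its TLS settings supplied through `ETCDCTL_*` environment variables.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// defragOrder returns the endpoints with followers first (in their original
// order) and the leader last. An unknown leader is simply not appended.
func defragOrder(endpoints []string, leader string) []string {
	order := make([]string, 0, len(endpoints))
	hasLeader := false
	for _, ep := range endpoints {
		if ep == leader {
			hasLeader = true
			continue
		}
		order = append(order, ep)
	}
	if hasLeader {
		order = append(order, leader)
	}
	return order
}

func main() {
	// Hypothetical member client URLs and leader.
	endpoints := []string{"https://10.0.0.1:2379", "https://10.0.0.2:2379", "https://10.0.0.3:2379"}
	leader := "https://10.0.0.2:2379"

	for _, ep := range defragOrder(endpoints, leader) {
		// One member at a time, stopping at the first failure.
		cmd := exec.Command("etcdctl", "--endpoints="+ep, "defrag")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "defrag %s failed: %v\n", ep, err)
			os.Exit(1)
		}
	}
}
```

Stopping at the first failure leaves the remaining members untouched, so a problem on one follower does not cascade into blocking the leader.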

To validate, check the endpoint status of the etcd members and confirm that the etcd DB size has decreased, using the same diagnostic approaches listed above. More space should now be available.
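The before/after comparison can be scripted as well. This small Go sketch takes per-member DB sizes in bytes, recorded before and after defragmentation, and reports how much each member shrank. It also flags members whose size did not go down or that have no "after" reading. The member names and sizes are made up. The sizes are assumed to have been converted to bytes from the DB SIZE column of `etcdctl endpoint status -w table`.

```go
package main

import (
	"fmt"
	"sort"
)

// shrinkReport compares per-member DB sizes (bytes) recorded before and after
// defragmentation. It returns the bytes reclaimed per member and, sorted, the
// members whose size did not decrease or that lack an "after" value.
func shrinkReport(before, after map[string]int64) (map[string]int64, []string) {
	reclaimed := make(map[string]int64, len(before))
	var notShrunk []string
	for member, b := range before {
		a, ok := after[member]
		if !ok || a >= b {
			notShrunk = append(notShrunk, member)
			continue
		}
		reclaimed[member] = b - a
	}
	sort.Strings(notShrunk) // deterministic order for reporting
	return reclaimed, notShrunk
}

func main() {
	// Hypothetical member names and sizes (MiB converted to bytes).
	before := map[string]int64{"etcd-a": 180 << 20, "etcd-b": 175 << 20, "etcd-c": 60 << 20}
	after := map[string]int64{"etcd-a": 62 << 20, "etcd-b": 61 << 20, "etcd-c": 60 << 20}

	reclaimed, notShrunk := shrinkReport(before, after)
	members := make([]string, 0, len(reclaimed))
	for m := range reclaimed {
		members = append(members, m)
	}
	sort.Strings(members)
	for _, m := range members {
		fmt.Printf("%s: reclaimed %d MiB\n", m, reclaimed[m]>>20)
	}
	if len(notShrunk) > 0 {
		fmt.Println("no size reduction on:", notShrunk)
	}
}
```

A member that does not shrink is not necessarily a failure. Its DB may simply have had little free space to reclaim, as with the made-up `etcd-c` above.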

https://github.com/openshift/runbooks
Best practices