Etcd Backup

install etcd-backup

etcd-backup provides supplementary backups for an etcd cluster. It takes full backups using etcd snapshots.

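  • Supports local storage and distributed storage
  • Configurable: backup interval, number of backups to keep, and so on
  • Automatically removes expired backups
  • Storage types are extensible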

create client secret

oc create secret generic etcd-backup-client-tls --from-file=etcd-client-ca.crt --from-file=etcd-client.crt --from-file=etcd-client.key --namespace=litsky
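The command above expects three files in the current directory. As a minimal sketch, the helper below (a hypothetical name, not part of etcd-backup) lists whichever of those files are missing, so the secret is not created from an incomplete set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the expected TLS files that are missing from a
# directory. The file names are the ones used in the oc command above.
missing_tls_files() {
  local dir="${1:-.}" f
  for f in etcd-client-ca.crt etcd-client.crt etcd-client.key; do
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Example use (commented out because it needs a cluster login):
# if [ -z "$(missing_tls_files .)" ]; then
#   oc create secret generic etcd-backup-client-tls \
#     --from-file=etcd-client-ca.crt --from-file=etcd-client.crt \
#     --from-file=etcd-client.key --namespace=litsky
# fi
```

An empty result means all three files are present.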

Set up RBAC

Set up basic RBAC rules for the etcd operator:

sh example/rbac/create_role.sh
oc adm policy add-scc-to-user privileged system:serviceaccount:litsky:etcd-operator

create etcd backup

oc create -f ./example/etcd-backup-operator/statefulset/statefulset.yaml

Add a nodeSelector to choose the node the backup pod is scheduled on:

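spec:
  nodeSelector:
    # nodeSelector takes one value per label key, so only one hostname can be set here.
    # To allow infra01, infra02, and infra03, use nodeAffinity with the In operator,
    # or give the three nodes a shared label and select on that label.
    kubernetes.io/hostname: infra01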

create backup crd

By default, backups are stored on the host machine. Distributed cloud storage is also supported (for example, storageType=S3).

oc create -f ./example/etcd-backup-operator/periodic_backup_cr.yaml

CRD configuration options

spec:
  etcdEndpoints: ["192.168.55.124:2379"]   # etcd endpoints
  clientTLSSecret: "tumbler-etcd-client-tls"   # name of the etcd client TLS secret
  storageType: LOCAL   # storage type: LOCAL or S3
  backupPolicy:
    # a value greater than 0 enables periodic backup
    backupIntervalInSecond: 125   # backup interval, in seconds
    maxBackups: 4       # maximum number of backups to keep
    timeoutInSecond: 600  # timeout for a single backup, in seconds
  local:
    path: /data/ # storage location
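With backupIntervalInSecond set to 125 and maxBackups set to 4, roughly the last 4 × 125 = 500 seconds of history are kept, assuming every backup succeeds. The sketch below illustrates that kind of retention rule for a local directory. It is not the operator's actual implementation; prune_backups is a hypothetical helper that assumes one file per backup and uses file modification time to decide which backups are oldest.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a retention rule: keep the newest $2 files in
# directory $1 and delete the rest. Assumes backup file names contain no
# newlines (true for names like _v18026710_2019-04-01-07:47:35).
prune_backups() {
  local dir="$1" max="$2" name
  # ls -t lists newest first; tail skips the first $max entries.
  ls -1t -- "$dir" | tail -n +"$((max + 1))" | while IFS= read -r name; do
    rm -f -- "$dir/$name"
  done
}

# Example use against the path from the configuration above:
# prune_backups /data 4
```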

restore etcd data

Cluster recovery

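Stop etcd on every node, delete the data under etcd_data, copy the backup file to every node, and run the restore on each node (192.168.55.65:2379 is used as the example here).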

# ${ETCD_HOME:?} aborts the command if ETCD_HOME is unset, instead of deleting /etcd_data
rm -rf "/${ETCD_HOME:?}/etcd_data"
ETCDCTL_API=3 etcdctl snapshot restore ./_v18026710_2019-04-01-07\:47\:35 --data-dir=/$ETCD_HOME/etcd_data --skip-hash-check=true --name=master1 --endpoints="https://192.168.55.65:2379" --initial-advertise-peer-urls="https://192.168.55.65:2380"  --initial-cluster="master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380,master3=https://192.168.55.55:2380"

# Verify the restored etcd data (here: list the Kubernetes node keys); run this after etcd has been started
ETCDCTL_API=3 etcdctl get /kubernetes.io/minions --prefix --keys-only
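The restore has to be repeated on every node, and only --name and --initial-advertise-peer-urls change from node to node. As a rough sketch, the hypothetical helper below prints the restore command for a given member by looking up its peer URL in the cluster list from the example. It only prints the command and does not run etcdctl; the snapshot path and data directory are arguments supplied by the caller.

```shell
#!/usr/bin/env bash
# Cluster layout copied from the example above.
INITIAL_CLUSTER="master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380,master3=https://192.168.55.55:2380"

# Hypothetical helper: print the restore command for one member.
# $1 = member name, $2 = snapshot file, $3 = data directory
restore_cmd() {
  local name="$1" snapshot="$2" data_dir="$3" peer_url
  # Split the name=url list on commas and pick the URL for this member.
  peer_url=$(printf '%s\n' "$INITIAL_CLUSTER" | tr ',' '\n' |
    awk -F= -v n="$name" '$1 == n {print $2}')
  if [ -z "$peer_url" ]; then
    echo "unknown member: $name" >&2
    return 1
  fi
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --data-dir=%s --skip-hash-check=true --name=%s --initial-advertise-peer-urls=%s --initial-cluster=%s\n' \
    "$snapshot" "$data_dir" "$name" "$peer_url" "$INITIAL_CLUSTER"
}

# Example: restore_cmd master2 ./backup.db /var/lib/etcd
```

Printing the command first makes it easy to review the per-node values before running anything against an empty data directory.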

After the restore finishes, run the following on each etcd host:

systemctl start etcd

Single-node failure recovery (the same steps apply when adding a node to the cluster)

# Remove the failed member
[13:23:33 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member list
1063b12b4fd4e7b5, started, master1, https://192.168.55.65:2380, https://192.168.55.65:2379
43e7c14849bf13ab, started, master3, https://192.168.55.55:2380, https://192.168.55.55:2379
8eb116cd5f377a0e, started, master2, https://192.168.55.124:2380, https://192.168.55.124:2379
[13:24:24 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member remove 1063b12b4fd4e7b5
Member 1063b12b4fd4e7b5 removed from cluster 5a26d8a168e75ef5
# Add the failed node back to the cluster
[13:24:27 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member add  master1 --peer-urls=https://192.168.55.65:2380
Member 60a74fd86702bb93 added to cluster 5a26d8a168e75ef5

ETCD_NAME="master1"
ETCD_INITIAL_CLUSTER="master3=https://192.168.55.55:2380,master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
# Delete the data on the failed node (${ETCD_DATA:?} aborts if ETCD_DATA is unset)
rm -rf "${ETCD_DATA:?}"/*
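The member ID passed to member remove comes from the member list output. As a small sketch, the hypothetical helper below extracts the ID for a given member name. It assumes the default comma-separated output format shown above (ID, status, name, peer URLs, client URLs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list` output (default format)
# on stdin and print the ID of the member whose name is $1.
member_id_by_name() {
  awk -F', ' -v n="$1" '$3 == n {print $1}'
}

# Example use, with the TLS and endpoint flags from the transcript above
# collected in a hypothetical ETCDCTL_FLAGS array:
# id=$(ETCDCTL_API=3 etcdctl "${ETCDCTL_FLAGS[@]}" member list | member_id_by_name master1)
# ETCDCTL_API=3 etcdctl "${ETCDCTL_FLAGS[@]}" member remove "$id"
```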

In the failed node's etcd configuration, set ETCD_INITIAL_CLUSTER_STATE=existing.

# Start etcd
systemctl start etcd

Verification

[13:32:06 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member list --write-out=table
+------------------+---------+---------+-----------------------------+-----------------------------+
|        ID        | STATUS  |  NAME   |         PEER ADDRS          |        CLIENT ADDRS         |
+------------------+---------+---------+-----------------------------+-----------------------------+
| 43e7c14849bf13ab | started | master3 |  https://192.168.55.55:2380 |  https://192.168.55.55:2379 |
| 60a74fd86702bb93 | started | master1 |  https://192.168.55.65:2380 |  https://192.168.55.65:2379 |
| 8eb116cd5f377a0e | started | master2 | https://192.168.55.124:2380 | https://192.168.55.124:2379 |
+------------------+---------+---------+-----------------------------+-----------------------------+

[13:32:36 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 endpoint status --write-out=table
+---------------------+------------------+---------+---------+-----------+-----------+------------+
|      ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.55.124:2379 | 8eb116cd5f377a0e |  3.2.22 |  334 MB |      true |    121572 |   82894906 |
|  192.168.55.55:2379 | 43e7c14849bf13ab |  3.2.22 |  334 MB |     false |    121572 |   82894906 |
|  192.168.55.65:2379 | 60a74fd86702bb93 |  3.2.22 |  334 MB |     false |    121572 |   82894906 |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
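A healthy cluster reports exactly one leader in the endpoint status table. As a hedged sketch, the hypothetical check below reads that table from stdin and succeeds only when one row has IS LEADER set to true. It assumes the column layout shown above, where IS LEADER is the fifth column; other etcd versions may print additional columns.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `endpoint status --write-out=table` output on
# stdin and return 0 only if exactly one endpoint reports IS LEADER = true.
has_single_leader() {
  awk -F'|' '
    # Data rows start with "|"; skip the header row.
    /^\|/ && $2 !~ /ENDPOINT/ {
      gsub(/ /, "", $6)          # field 6 is the IS LEADER column
      if ($6 == "true") leaders++
    }
    END { exit (leaders == 1 ? 0 : 1) }
  '
}

# Example use, with the TLS and endpoint flags from the transcript above
# collected in a hypothetical ETCDCTL_FLAGS array:
# ETCDCTL_API=3 etcdctl "${ETCDCTL_FLAGS[@]}" endpoint status --write-out=table | has_single_leader
```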

Cleanup etcd-backup

oc delete -f ./example/etcd-backup-operator/periodic_backup_cr.yaml
oc delete -f ./example/etcd-backup-operator/statefulset/statefulset.yaml
oc delete -f ./example/etcd-backup-operator/statefulset/secret.yaml
oc delete crd etcdbackups.etcd.database.coreos.com