Etcd Backup
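Install etcd-backup
etcd-backup provides supplementary backups for an etcd cluster. It takes full backups using etcd snapshots.
- Supports local storage and distributed storage
- Configurable: the backup interval, the number of retained backups, and similar settings can be adjusted
- Automatically purges expired backup data
- Storage types are extensible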
Create client secret
oc create secret generic etcd-backup-client-tls --from-file=etcd-client-ca.crt --from-file=etcd-client.crt --from-file=etcd-client.key --namespace=litsky
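The command above expects the three certificate files in the current directory. As a hedged sketch (the helper name `missing_tls_files` is hypothetical and not part of etcd-backup), a small shell check can list any file that is missing before `oc create secret` is run:

```shell
#!/bin/sh
# Hypothetical helper: print the name of each expected TLS file that is
# missing from the given directory. Prints nothing if all are present.
missing_tls_files() {
    dir=$1
    for f in etcd-client-ca.crt etcd-client.crt etcd-client.key; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

Running `missing_tls_files .` prints nothing when all three files are in place.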
Set up RBAC
Set up basic [RBAC rules][rbac-rules] for etcd operator:
sh example/rbac/create_role.sh
oc adm policy add-scc-to-user privileged system:serviceaccount:litsky:etcd-operator
Create etcd backup
oc create -f ./example/etcd-backup-operator/statefulset/statefulset.yaml
Add a nodeSelector to pin the pods to a specific node:
spec:
  nodeSelector:
    # A YAML mapping cannot repeat a key, so set one hostname here (infra01, infra02, or infra03)
    kubernetes.io/hostname: infra01
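Create backup crd
By default, backup data is stored on the host machine. Distributed cloud storage is also supported (for example, storageType=S3).
oc create -f ./example/etcd-backup-operator/periodic_backup_cr.yaml
CRD configuration options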
spec:
  etcdEndpoints: ["192.168.55.124:2379"]  # etcd endpoints
  clientTLSSecret: "tumbler-etcd-client-tls"  # name of the etcd client secret
  storageType: LOCAL  # storage type: LOCAL or S3
  backupPolicy:
    # a value greater than 0 enables periodic backup
    backupIntervalInSecond: 125  # backup interval, in seconds
    maxBackups: 4  # maximum number of backups to retain
    timeoutInSecond: 600  # timeout for a single backup, in seconds
  local:
    path: /data/  # storage location
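To illustrate what maxBackups means for the local storage type, here is a minimal sketch of retention pruning. It is not the operator's actual implementation: `prune_backups` is a hypothetical helper that assumes every file in the directory is a backup, that modification time reflects backup order, and that file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 files in directory $1.
prune_backups() {
    dir=$1
    max=$2
    # List newest first, skip the first $max entries, delete the rest.
    ls -1t "$dir" | tail -n +"$((max + 1))" | while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

With the sample values above, `prune_backups /data/ 4` would keep the four most recent files and delete anything older.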
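Restore etcd data
Cluster restore
Stop all nodes, delete the data under etcd_data, copy the backup to every node, and run the restore on each of them (192.168.55.65:2379 is used as the example here).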
# etcdctl must use the v3 API here (export ETCDCTL_API=3 on etcd 3.3 and earlier)
rm -rf /$ETCD_HOME/etcd_data
etcdctl snapshot restore ./_v18026710_2019-04-01-07\:47\:35 --data-dir=/$ETCD_HOME/etcd_data --skip-hash-check=true --name=master1 --endpoints="https://192.168.55.65:2379" --initial-advertise-peer-urls="https://192.168.55.65:2380" --initial-cluster="master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380,master3=https://192.168.55.55:2380"

# Verify the restored data (for example, list the Kubernetes node keys)
etcdctl get /kubernetes.io/minions --prefix --keys-only
After the restore completes, run the following on each etcd host:
systemctl start etcd
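Because the restore has to be repeated on every node with that node's own name and peer URL, the per-node command can be derived from the cluster string. The following is a sketch under stated assumptions: `peer_url_for` and `restore_cmd` are hypothetical helpers, the cluster layout is copied from the example above, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Cluster layout taken from the example above.
INITIAL_CLUSTER="master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380,master3=https://192.168.55.55:2380"

# Print the peer URL registered for member name $1 in $INITIAL_CLUSTER.
peer_url_for() {
    printf '%s\n' "$INITIAL_CLUSTER" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Print (do not run) the restore command for member $1,
# snapshot file $2, and data directory $3.
restore_cmd() {
    name=$1
    snapshot=$2
    data_dir=$3
    printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --data-dir=%s --name=%s --initial-advertise-peer-urls=%s --initial-cluster=%s\n' \
        "$snapshot" "$data_dir" "$name" "$(peer_url_for "$name")" "$INITIAL_CLUSTER"
}
```

For example, `restore_cmd master2 ./snapshot.db /var/lib/etcd` (both paths are placeholders) prints the command to run on master2. Review the printed command before executing it on the node.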
Single-node failure recovery (the same steps apply when adding a node to scale out the cluster)
# Remove the failed member
[13:23:33 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member list
1063b12b4fd4e7b5, started, master1, https://192.168.55.65:2380, https://192.168.55.65:2379
43e7c14849bf13ab, started, master3, https://192.168.55.55:2380, https://192.168.55.55:2379
8eb116cd5f377a0e, started, master2, https://192.168.55.124:2380, https://192.168.55.124:2379
[13:24:24 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member remove 1063b12b4fd4e7b5
Member 1063b12b4fd4e7b5 removed from cluster 5a26d8a168e75ef5
# Re-add the failed node to the cluster
[13:24:27 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member add master1 --peer-urls=https://192.168.55.65:2380
Member 60a74fd86702bb93 added to cluster 5a26d8a168e75ef5

ETCD_NAME="master1"
ETCD_INITIAL_CLUSTER="master3=https://192.168.55.55:2380,master1=https://192.168.55.65:2380,master2=https://192.168.55.124:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
# Delete the failed node's data
rm -rf $ETCD_DATA/*
Set ETCD_INITIAL_CLUSTER_STATE=existing in the failed node's configuration.
# Start etcd
systemctl start etcd
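`member remove` takes the member ID rather than the member name. A minimal sketch for looking the ID up from the default comma-separated `member list` output shown above; `member_id_by_name` is a hypothetical helper, not an etcdctl feature:

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` output (default
# comma-separated format) on stdin and print the ID of member $1.
member_id_by_name() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

It is meant to be used at the end of a pipeline, for example `etcdctl ... member list | member_id_by_name master1`, with the same TLS and endpoint flags as in the transcript.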
Verification
[13:32:06 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 member list --write-out=table
+------------------+---------+---------+-----------------------------+-----------------------------+
|        ID        | STATUS  |  NAME   |         PEER ADDRS          |        CLIENT ADDRS         |
+------------------+---------+---------+-----------------------------+-----------------------------+
| 43e7c14849bf13ab | started | master3 | https://192.168.55.55:2380  | https://192.168.55.55:2379  |
| 60a74fd86702bb93 | started | master1 | https://192.168.55.65:2380  | https://192.168.55.65:2379  |
| 8eb116cd5f377a0e | started | master2 | https://192.168.55.124:2380 | https://192.168.55.124:2379 |
+------------------+---------+---------+-----------------------------+-----------------------------+

[13:32:36 root@master1 opt]$ ETCDCTL_API=3 etcdctl --cert="/etc/origin/master/master.etcd-client.crt" --key="/etc/origin/master/master.etcd-client.key" --cacert="/etc/origin/master/master.etcd-ca.crt" --endpoints=192.168.55.124:2379,192.168.55.55:2379,192.168.55.65:2379 endpoint status --write-out=table
+---------------------+------------------+---------+---------+-----------+-----------+------------+
|      ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.55.124:2379 | 8eb116cd5f377a0e | 3.2.22  | 334 MB  | true      |    121572 |   82894906 |
| 192.168.55.55:2379  | 43e7c14849bf13ab | 3.2.22  | 334 MB  | false     |    121572 |   82894906 |
| 192.168.55.65:2379  | 60a74fd86702bb93 | 3.2.22  | 334 MB  | false     |    121572 |   82894906 |
+---------------------+------------------+---------+---------+-----------+-----------+------------+

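A healthy cluster reports exactly one leader in the `endpoint status` table. As a rough sketch of automating that check, the hypothetical helper `count_leaders` below counts the rows whose IS LEADER column is `true`. It assumes the column order of the table output shown above, which may differ in other etcd versions.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl endpoint status --write-out=table`
# output on stdin and print how many endpoints report IS LEADER = true.
count_leaders() {
    # Splitting on "|" makes IS LEADER the sixth field ($1 is empty).
    # Border lines contain no "|" and are skipped by the NF > 1 guard.
    awk -F'|' 'NF > 1 { gsub(/ /, "", $6); if ($6 == "true") n++ } END { print n + 0 }'
}
```

Piping the `endpoint status --write-out=table` output into `count_leaders` should print `1` for the sample table in this section.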
Cleanup etcd-backup
oc delete -f ./example/etcd-backup-operator/periodic_backup_cr.yaml
oc delete -f ./example/etcd-backup-operator/statefulset/statefulset.yaml
oc delete -f ./example/etcd-backup-operator/statefulset/secret.yaml
oc delete crd etcdbackups.etcd.database.coreos.com