Hello,
We run a Ceph Nautilus 14.2.8 cluster.
After a big crash in which we lost some disks, we had a PG down in an erasure-coded 3+2 pool. To try to fix it we followed this guide:
https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
As the PG was reported with 0 objects, we first marked one shard as complete with ceph-objectstore-tool and restarted the OSD.
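For reference, the sequence was roughly the following (the OSD id, data path and shard below are placeholders, not necessarily our exact values):

  systemctl stop ceph-osd@103
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-103 \
      --pgid 30.3s0 --op mark-complete
  systemctl start ceph-osd@103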
The PG thus went active, but it then reported lost (unfound) objects!
As we consider the data on this PG lost anyway, we tried to get rid of the unfound objects with ceph pg 30.3 mark_unfound_lost delete.
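For reference, the sequence we mean is something like this (the list_unfound call just shows how one can inspect the unfound objects first; the delete is the command we actually ran):

  ceph pg 30.3 list_unfound
  ceph pg 30.3 mark_unfound_lost delete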
This produced log entries like the following (about 3 lines per hour):
2020-05-12 14:45:05.251830 osd.103 (osd.103) 886 : cluster [ERR] 30.3s0 Unexpected Error: recovery ending with 41: {
  30:c000e27d:::rbd_data.34.c963b6314efb84.0000000000000100:head=435293'2 flags = delete,
  30:c01f1248:::rbd_data.34.7f0c0d1df22f45.0000000000000325:head=435293'3 flags = delete,
  30:c05e82b2:::rbd_data.34.674d063bdc66d2.0000000000000015:head=435293'4 flags = delete,
  30:c0b2d8e7:::rbd_data.34.6bc88749c741cb.00000000000007d0:head=435293'5 flags = delete,
  30:c0c3e20e:::rbd_data.34.674d063bdc66d2.00000000000000fb:head=435293'6 flags = delete,
  30:c0c89740:::rbd_data.34.a7f2202210bb39.0000000000000bbc:head=435293'7 flags = delete,
  30:c0e59ffa:::rbd_data.34.7f0c0d1df22f45.00000000000002fb:head=435293'8 flags = delete,
  30:c0e72bf4:::rbd_data.34.7f0c0d1df22f45.00000000000000fa:head=435293'9 flags = delete,
  30:c10ab507:::rbd_data.34.80695c646d9535.0000000000000327:head=435293'10 flags = delete,
  30:c219e412:::rbd_data.34.a7f2202210bb39.0000000000000fa0:head=435293'11 flags = delete,
  30:c29aeba3:::rbd_data.34.8038585a0eb9f6.0000000000000eb2:head=435293'12 flags = delete,
  30:c29fae09:::rbd_data.34.674d063bdc66d2.000000000000148a:head=435293'13 flags = delete,
  30:c2b77a99:::rbd_data.34.7f0c0d1df22f45.000000000000031d:head=435293'14 flags = delete,
  30:c2c8598f:::rbd_data.34.674d063bdc66d2.00000000000002f5:head=435293'15 flags = delete,
  30:c2dd39fe:::rbd_data.34.6494fb1b0f88bf.000000000000030b:head=435293'16 flags = delete,
  30:c2f6ce39:::rbd_data.34.806ab864459ae5.0000000000000109:head=435293'17 flags = delete,
  30:c2f8a62f:::rbd_data.34.ed0c58ebdc770f.000000000000002a:head=435293'18 flags = delete,
  30:c306cd86:::rbd_data.34.ed0c58ebdc770f.0000000000000205:head=435293'19 flags = delete,
  30:c30f5230:::rbd_data.34.7f0c0d1df22f45.00000000000002f5:head=435293'20 flags = delete,
  30:c32b81df:::rbd_data.34.c79f6d1f78a707.0000000000000100:head=435293'21 flags = delete,
  30:c3374080:::rbd_data.34.7f217e33dd742c.00000000000007d0:head=435293'22 flags = delete,
  30:c3cdbeb5:::rbd_data.34.674dcefe97f606.0000000000000109:head=435293'23 flags = delete,
  30:c3cdd149:::rbd_data.34.674dcefe97f606.0000000000000019:head=435293'24 flags = delete,
  30:c40946c0:::rbd_data.34.ded8d21a9d3d8f.00000000000002a8:head=435293'25 flags = delete,
  30:c42ed4fd:::rbd_data.34.a6985314ad8dad.0000000000000200:head=435293'26 flags = delete,
  30:c483a99b:::rbd_data.34.ed0c58ebdc770f.0000000000000a00:head=435293'27 flags = delete,
  30:c49f09d6:::rbd_data.34.7e1c1abf436885.0000000000000bb8:head=435293'28 flags = delete,
  30:c515a4e8:::rbd_data.34.ed0c58ebdc770f.0000000000000106:head=435293'29 flags = delete,
  30:c5181a8e:::rbd_data.34.9385d45172fa0f.000000000000020c:head=435293'30 flags = delete,
  30:c531de44:::rbd_data.34.6bc88749c741cb.0000000000000102:head=435293'31 flags = delete,
  30:c5427518:::rbd_data.34.806ab864459ae5.00000000000006db:head=435293'32 flags = delete,
  30:c5693b53:::rbd_data.34.6494fb1b0f88bf.000000000000148a:head=435293'33 flags = delete,
  30:c5804bc9:::rbd_data.34.ed0cb8730e020c.0000000000000105:head=435293'34 flags = delete,
  30:c598117e:::rbd_data.34.7f0811fbac0b9d.0000000000000327:head=435293'35 flags = delete,
  30:c5a64fbd:::rbd_data.34.c963b6314efb84.0000000000000010:head=435293'36 flags = delete,
  30:c5f9e0e5:::rbd_data.34.ed0c58ebdc770f.0000000000000f01:head=435293'37 flags = delete,
  30:c5ffe1d8:::rbd_data.34.6bc88749c741cb.0000000000000abe:head=435293'38 flags = delete,
  30:c6ecfaa1:::rbd_data.34.9385d45172fa0f.0000000000000002:head=435293'39 flags = delete,
  30:c755550f:::rbd_data.34.6494fb1b0f88bf.0000000000000106:head=435293'40 flags = delete,
  30:c7a730f4:::rbd_data.34.7f217e33dd742c.00000000000006e1:head=435293'41 flags = delete,
  30:c7aa79f7:::rbd_data.34.674dcefe97f606.0000000000000108:head=435293'42 flags = delete}
But yesterday it started to flood the logs (about 9 GB of logs per day!) with lines like:
2020-05-14 10:36:03.851258 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.851333 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
2020-05-14 10:36:03.851382 osd.29 [ERR] Error -2 reading object 30:c543da6f:::rbd_data.34.80695c646d9535.0000000000000dce:head
2020-05-14 10:36:03.859900 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.859979 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
We think the best option would probably be to delete this PG completely. Is that possible without totally breaking the pool? If so, how?
Would we need to recreate the PG manually, or will Ceph do it automatically?
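To be explicit about what we have in mind, it would be something along these lines; we have NOT run any of this and are not sure it is a correct or safe approach (commands as we understand them from the documentation, OSD id and shard are placeholders):

  # on each OSD that still holds a shard of PG 30.3, with that OSD stopped:
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --pgid 30.3s<shard> --op remove --force
  # then ask the cluster to recreate the PG as empty:
  ceph osd force-create-pg 30.3 --yes-i-really-mean-it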
Thanks for your help.
F.