Remove or recreate damaged PG in erasure coding pool

Hello,
We run a Ceph Nautilus 14.2.8 cluster.
After a major crash in which we lost some disks, one PG in an erasure-coded 3+2 pool went down. To fix it, we followed this guide: https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
As the PG was reported as having 0 objects, we first marked a shard as complete with ceph-objectstore-tool and restarted the OSD.
The PG then went active but reported lost objects!
As we consider the data on this PG lost, we tried to get rid of the lost objects with ceph pg 30.3 mark_unfound_lost delete.

This produced log entries like the following (about 3 lines/hour):

2020-05-12 14:45:05.251830 osd.103 (osd.103) 886 : cluster [ERR] 30.3s0 Unexpected Error: recovery ending with 41: {30:c000e27d:::rbd_data.34.c963b6314efb84.0000000000000100:head=435293'2 flags = delete,30:c01f1248:::rbd_data.34.7f0c0d1df22f45.0000000000000325:head=435293'3 flags = delete,30:c05e82b2:::rbd_data.34.674d063bdc66d2.0000000000000015:head=435293'4 flags = delete,30:c0b2d8e7:::rbd_data.34.6bc88749c741cb.00000000000007d0:head=435293'5 flags = delete,30:c0c3e20e:::rbd_data.34.674d063bdc66d2.00000000000000fb:head=435293'6 flags = delete,30:c0c89740:::rbd_data.34.a7f2202210bb39.0000000000000bbc:head=435293'7 flags = delete,30:c0e59ffa:::rbd_data.34.7f0c0d1df22f45.00000000000002fb:head=435293'8 flags = delete,30:c0e72bf4:::rbd_data.34.7f0c0d1df22f45.00000000000000fa:head=435293'9 flags = delete,30:c10ab507:::rbd_data.34.80695c646d9535.0000000000000327:head=435293'10 flags = delete,30:c219e412:::rbd_data.34.a7f2202210bb39.0000000000000fa0:head=435293'11 flags = delete,30:c29aeba3:::rbd_data.34.8038585a0eb9f6.0000000000000eb2:head=435293'12 flags = delete,30:c29fae09:::rbd_data.34.674d063bdc66d2.000000000000148a:head=435293'13 flags = delete,30:c2b77a99:::rbd_data.34.7f0c0d1df22f45.000000000000031d:head=435293'14 flags = delete,30:c2c8598f:::rbd_data.34.674d063bdc66d2.00000000000002f5:head=435293'15 flags = delete,30:c2dd39fe:::rbd_data.34.6494fb1b0f88bf.000000000000030b:head=435293'16 flags = delete,30:c2f6ce39:::rbd_data.34.806ab864459ae5.0000000000000109:head=435293'17 flags = delete,30:c2f8a62f:::rbd_data.34.ed0c58ebdc770f.000000000000002a:head=435293'18 flags = delete,30:c306cd86:::rbd_data.34.ed0c58ebdc770f.0000000000000205:head=435293'19 flags = delete,30:c30f5230:::rbd_data.34.7f0c0d1df22f45.00000000000002f5:head=435293'20 flags = delete,30:c32b81df:::rbd_data.34.c79f6d1f78a707.0000000000000100:head=435293'21 flags = delete,30:c3374080:::rbd_data.34.7f217e33dd742c.00000000000007d0:head=435293'22 flags = delete,30:c3cdbeb5:::rbd_data.34.674dcefe97f606.0000000000000109:head=435293'23 flags = delete,30:c3cdd149:::rbd_data.34.674dcefe97f606.0000000000000019:head=435293'24 flags = delete,30:c40946c0:::rbd_data.34.ded8d21a9d3d8f.00000000000002a8:head=435293'25 flags = delete,30:c42ed4fd:::rbd_data.34.a6985314ad8dad.0000000000000200:head=435293'26 flags = delete,30:c483a99b:::rbd_data.34.ed0c58ebdc770f.0000000000000a00:head=435293'27 flags = delete,30:c49f09d6:::rbd_data.34.7e1c1abf436885.0000000000000bb8:head=435293'28 flags = delete,30:c515a4e8:::rbd_data.34.ed0c58ebdc770f.0000000000000106:head=435293'29 flags = delete,30:c5181a8e:::rbd_data.34.9385d45172fa0f.000000000000020c:head=435293'30 flags = delete,30:c531de44:::rbd_data.34.6bc88749c741cb.0000000000000102:head=435293'31 flags = delete,30:c5427518:::rbd_data.34.806ab864459ae5.00000000000006db:head=435293'32 flags = delete,30:c5693b53:::rbd_data.34.6494fb1b0f88bf.000000000000148a:head=435293'33 flags = delete,30:c5804bc9:::rbd_data.34.ed0cb8730e020c.0000000000000105:head=435293'34 flags = delete,30:c598117e:::rbd_data.34.7f0811fbac0b9d.0000000000000327:head=435293'35 flags = delete,30:c5a64fbd:::rbd_data.34.c963b6314efb84.0000000000000010:head=435293'36 flags = delete,30:c5f9e0e5:::rbd_data.34.ed0c58ebdc770f.0000000000000f01:head=435293'37 flags = delete,30:c5ffe1d8:::rbd_data.34.6bc88749c741cb.0000000000000abe:head=435293'38 flags = delete,30:c6ecfaa1:::rbd_data.34.9385d45172fa0f.0000000000000002:head=435293'39 flags = delete,30:c755550f:::rbd_data.34.6494fb1b0f88bf.0000000000000106:head=435293'40 flags = delete,30:c7a730f4:::rbd_data.34.7f217e33dd742c.00000000000006e1:head=435293'41 flags = delete,30:c7aa79f7:::rbd_data.34.674dcefe97f606.0000000000000108:head=435293'42 flags = delete}
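
To see which objects such an entry names, here is a minimal Python sketch. It assumes the entry layout shown above ("recovery ending with N: {object=version flags = flag,...}"). The function names are hypothetical helpers, not part of any Ceph tool, and the sample string is abbreviated to the first two of the 41 entries.

```python
import re
from collections import Counter

# Assumed layout, taken from the log entry above:
#   ... recovery ending with <N>: {<pool>:<hash>:::<name>:head=<version> flags = <flag>,...}
HEADER_RE = re.compile(r"recovery ending with (\d+):")
ENTRY_RE = re.compile(
    r"(?P<obj>\d+:[0-9a-f]+:::(?P<name>[^:]+):head)"
    r"=(?P<version>\d+'\d+) flags = (?P<flags>\w+)"
)


def parse_recovery_error(line):
    """Return (declared_count, entries) for one 'recovery ending with' entry."""
    header = HEADER_RE.search(line)
    declared = int(header.group(1)) if header else None
    entries = [m.groupdict() for m in ENTRY_RE.finditer(line)]
    return declared, entries


def count_by_prefix(entries):
    """Group objects by name prefix, i.e. everything before the last dot."""
    return Counter(e["name"].rsplit(".", 1)[0] for e in entries)


# Abbreviated sample: only the first two of the 41 entries are included.
sample = (
    "2020-05-12 14:45:05.251830 osd.103 (osd.103) 886 : cluster [ERR] 30.3s0 "
    "Unexpected Error: recovery ending with 41: {"
    "30:c000e27d:::rbd_data.34.c963b6314efb84.0000000000000100:head=435293'2 flags = delete,"
    "30:c01f1248:::rbd_data.34.7f0c0d1df22f45.0000000000000325:head=435293'3 flags = delete}"
)

declared, entries = parse_recovery_error(sample)
prefixes = count_by_prefix(entries)
```

Comparing `declared` with `len(entries)` on a full entry is a quick check that no object was lost to line wrapping when the log was copied.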

But yesterday it started to flood the logs (~9 GB of logs/day!) with lines like:

2020-05-14 10:36:03.851258 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.851333 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
2020-05-14 10:36:03.851382 osd.29 [ERR] Error -2 reading object 30:c543da6f:::rbd_data.34.80695c646d9535.0000000000000dce:head
2020-05-14 10:36:03.859900 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.859979 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
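
With that much log volume, it helps to know how many distinct objects are actually involved. The Python sketch below is one rough way to count them. It assumes the line layout shown above and that the negative number is a standard errno value (2 is ENOENT, "No such file or directory"). The `summarize` helper is hypothetical.

```python
import errno
import re
from collections import Counter

# Assumed layout, taken from the log lines above:
#   <date> <time> osd.<N> [ERR] Error <code> reading object <object id>
READ_ERR_RE = re.compile(
    r"(?P<osd>osd\.\d+) \[ERR\] Error (?P<code>-?\d+) reading object (?P<obj>\S+)"
)


def summarize(text):
    """Count read errors per (osd, errno name, object) in a chunk of log text."""
    counts = Counter()
    for m in READ_ERR_RE.finditer(text):
        code = abs(int(m.group("code")))
        # Fall back to the raw number if the code is not a known errno.
        name = errno.errorcode.get(code, str(code))
        counts[(m.group("osd"), name, m.group("obj"))] += 1
    return counts


sample = """\
2020-05-14 10:36:03.851258 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.851333 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
2020-05-14 10:36:03.851382 osd.29 [ERR] Error -2 reading object 30:c543da6f:::rbd_data.34.80695c646d9535.0000000000000dce:head
2020-05-14 10:36:03.859900 osd.29 [ERR] Error -2 reading object 30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.859979 osd.29 [ERR] Error -2 reading object 30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
"""

counts = summarize(sample)
for (osd, err, obj), n in counts.most_common():
    print(f"{n:6d}  {osd}  {err}  {obj}")
```

On the five lines quoted above, this shows the same three objects being retried within a few milliseconds, which suggests the flood comes from a small set of objects rather than from the whole PG.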

We think the best option would probably be to delete this PG completely. Is that possible without totally breaking the pool? If so, how?
Do we need to recreate the PG manually, or will Ceph do it automatically?
Thanks for your help.

F.

