So I've totally disabled cache-tiering and overlay. Now OSDs 68 & 69
are fine, no longer blocked. But OSD 32 is still blocked, and PG 37.9c
is still marked incomplete, with:

    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2018-09-21 18:56:01.222970",
            "comment": "not enough complete instances of this PG"
        },

But I don't see any blocked requests in the OSD 32 logs. Should I
increase one of the "debug_xx" flags?
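Something like this, maybe (just a sketch; the subsystems and levels
are guesses on my part, and "ceph daemon" has to be run on the host
where osd.32 lives):

    # List ops currently stuck on the primary OSD, via the admin socket:
    ceph daemon osd.32 dump_blocked_ops
    ceph daemon osd.32 dump_ops_in_flight

    # Temporarily raise OSD and messenger debug levels at runtime:
    ceph tell osd.32 injectargs '--debug_osd 10 --debug_ms 1'

    # ...watch the logs, then restore the defaults:
    ceph tell osd.32 injectargs '--debug_osd 1/5 --debug_ms 0/5'

If there really are stuck requests, I'd expect dump_blocked_ops to at
least show what osd.32 thinks they are waiting on.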
On Friday, 21 September 2018 at 16:51 +0200, Maks Kowalik wrote:
> According to the query output you pasted, shards 1 and 2 are broken.
> But, on the other hand, an EC profile of (4+2) should make it
> possible to recover from 2 shards lost simultaneously...
>
> On Fri, 21 Sep 2018 at 16:29, Olivier Bonvalet <ceph.list@xxxxxxxxx>
> wrote:
> > Well, on disk I can find those parts:
> >
> > - cs0 on OSDs 29 and 30
> > - cs1 on OSDs 18 and 19
> > - cs2 on OSD 13
> > - cs3 on OSD 66
> > - cs4 on OSD 0
> > - cs5 on OSD 75
> >
> > And I can read those files too.
> >
> > And all those OSDs are UP and IN.
> >
> > On Friday, 21 September 2018 at 13:10 +0000, Eugen Block wrote:
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > > cache-flush-evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.00000000000f2596".
> > >
> > > This is the object that's stuck in the cache tier (according to
> > > your output in https://pastebin.com/zrwu5X0w). Can you verify
> > > whether that block device is in use and healthy, or whether it
> > > is corrupt?
> > >
> > > Quoting Maks Kowalik <maks_kowalik@xxxxxxxxx>:
> > >
> > > > Could you please paste the output of "pg 37.9c query"?
> > > >
> > > > On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet
> > > > <ceph.list@xxxxxxxxx> wrote:
> > > >
> > > > > In fact, one object (only one) seems to be blocked on the
> > > > > cache tier (writeback).
> > > > >
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > > cache-flush-evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.00000000000f2596".
> > > > >
> > > > > So I reduced the cache tier (a lot) to 200MB; "rados -p
> > > > > cache-bkp-foo ls" now shows only 3 objects:
> > > > >
> > > > > rbd_directory
> > > > > rbd_data.f66c92ae8944a.00000000000f2596
> > > > > rbd_header.f66c92ae8944a
> > > > >
> > > > > And "cache-flush-evict-all" still hangs.
> > > > >
> > > > > I also switched the cache tier to "readproxy" to avoid using
> > > > > this cache. But it's still blocked.
> > > > >
> > > > > On Friday, 21 September 2018 at 02:14 +0200, Olivier
> > > > > Bonvalet wrote:
> > > > > > Hello,
> > > > > >
> > > > > > on a Luminous cluster, I have an incomplete PG and I can't
> > > > > > find how to fix it.
> > > > > >
> > > > > > It's an EC pool (4+2):
> > > > > >
> > > > > >     pg 37.9c is incomplete, acting [32,50,59,1,0,75]
> > > > > >     (reducing pool bkp-sb-raid6 min_size from 4 may help;
> > > > > >     search ceph.com/docs for 'incomplete')
> > > > > >
> > > > > > Of course, we can't reduce min_size below 4.
> > > > > >
> > > > > > And the full state: https://pastebin.com/zrwu5X0w
> > > > > >
> > > > > > So I/O is blocked and we can't access the damaged data.
> > > > > > The OSDs block too:
> > > > > >
> > > > > >     osds 32,68,69 have stuck requests > 4194.3 sec
> > > > > >
> > > > > > OSD 32 is the primary of this PG, and OSDs 68 and 69 are
> > > > > > for cache tiering.
> > > > > >
> > > > > > Any idea how I can fix that?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Olivier

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com