On Mon, 4 Feb 2019, Philippe Van Hecke wrote:
> So i restarted the osd but he stop after some time. But this is an effect on the cluster and cluster is on a partial recovery process.
>
> please find here log file of osd 49 after this restart
> https://filesender.belnet.be/?s=download&token=8c9c39f2-36f6-43f7-bebb-175679d27a22

It's the same PG 11.182 hitting the same assert when it tries to recover
to that OSD. I think the problem will go away once there has been some
write traffic, but it may be tricky to prevent it from doing any recovery
until then.

I just noticed you pasted the wrong 'pg ls' result before:

> > result of ceph pg ls | grep 11.118
> >
> > 11.118 9788 0 0 0 0 40817837568 1584 1584 active+clean 2019-02-01 12:48:41.343228 70238'19811673 70493:34596887 [121,24] 121 [121,24] 121 69295'19811665 2019-02-01 12:48:41.343144 66131'19810044 2019-01-30 11:44:36.006505

What does 11.182 look like?

We can try something slightly different. From before it looked like your
only 'incomplete' pg was 11.ac (ceph pg ls incomplete), and the needed
state is either on osd.49 or osd.63. On osd.49, do ceph-objectstore-tool
--op export on that pg, and then find an otherwise healthy OSD (that
doesn't have 11.ac), stop it, and ceph-objectstore-tool --op import it
there. When you start it up, 11.ac will hopefully peer and recover. (Or,
alternatively, osd.63 may have the needed state.)

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
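For reference, the export/import steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data paths follow the default /var/lib/ceph/osd/ceph-<id> layout, the export file location is arbitrary, and osd.7 is a hypothetical stand-in for "an otherwise healthy OSD that doesn't have 11.ac". Both OSDs must be stopped while ceph-objectstore-tool runs against them, and a filestore OSD may additionally need --journal-path. The helper functions only print the commands unless DRY_RUN=0 is set, so the sketch can be read through before anything touches the cluster.

```shell
#!/bin/sh
# Sketch only: paths, the export file name, and the target OSD id (7)
# are assumptions, not values taken from the thread.

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Export one PG from a stopped OSD into a file.
export_pg() {
    osd_id=$1; pgid=$2; file=$3
    run ceph-objectstore-tool \
        --data-path "/var/lib/ceph/osd/ceph-${osd_id}" \
        --pgid "$pgid" --op export --file "$file"
}

# Import a previously exported PG into another stopped OSD.
import_pg() {
    osd_id=$1; file=$2
    run ceph-objectstore-tool \
        --data-path "/var/lib/ceph/osd/ceph-${osd_id}" \
        --op import --file "$file"
}

# Source side: osd.49 (must not be running), PG 11.ac.
export_pg 49 11.ac /root/11.ac.export

# Target side: hypothetical healthy osd.7 that does not hold 11.ac.
# Stop it first, import, then start it again so 11.ac can peer.
import_pg 7 /root/11.ac.export
```

If osd.49 does not yield a usable export, the same export step can be pointed at osd.63 instead, as the message suggests.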