Looks like the journal SSD is broken.

If it's still readable but not writable, you can run

    ceph-osd --id ... --flush-journal

and replace the disk after doing so. You can then point the symlinks in /var/lib/ceph/osd/ceph-*/journal to the new journal and run

    ceph-osd --id ... --mkjournal

If the journal is no longer readable: the safe option is to completely re-create the OSDs after replacing the journal disk. (The unsafe way is to just skip the --flush-journal step; not recommended.)

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Sep 30, 2019 at 3:51 AM 展荣臻(信泰) <zhanrzh_xt@xxxxxxxxxxxxxx> wrote:
>
> > > Hi all,
> > > we use OpenStack + Ceph (Hammer) in production.
> >
> > Hammer is soooooo 2015.
> >
> > > There are 22 OSDs on a host and 11 OSDs share one SSD for the OSD journal.
> >
> > I can’t imagine a scenario in which this strategy makes sense; the documentation and books are quite clear on why this is a bad idea. Assuming that your OSDs are HDDs and the journal devices are SATA SSDs, the journals are going to be a bottleneck, and you’re going to wear through them quickly. If you have a read-mostly workload, colocating them would be safer.
>
> Oh, I am wrong, we use SAS SSDs.
>
> > I also suspect that something is amiss with your CRUSH topology that is preventing recovery, and/or you actually have multiple overlapping failures.
>
> My crushmap is at https://github.com/rongzhen-zhan/myfile/blob/master/crushmap
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
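
In case it helps anyone following along, here is a rough sketch of the flush/replace/mkjournal sequence Paul describes for the still-readable case. The OSD id (3) and the new journal partition (/dev/sdX1) are placeholders for your own values, and the stop/start commands depend on your init system (Hammer-era clusters are often still on sysvinit or upstart rather than systemd):

    # stop the OSD so nothing writes to the journal while you work on it
    systemctl stop ceph-osd@3        # or the equivalent for your init system

    # write any pending journal entries back to the OSD while the old SSD is still readable
    ceph-osd --id 3 --flush-journal

    # after physically replacing and partitioning the SSD, repoint the journal symlink
    ln -sf /dev/sdX1 /var/lib/ceph/osd/ceph-3/journal

    # initialize the new journal and bring the OSD back up
    ceph-osd --id 3 --mkjournal
    systemctl start ceph-osd@3

Repeat per OSD that had its journal on the failed device; as noted above, if the old journal can no longer be read at all, re-create the affected OSDs instead of skipping the flush step.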