Looks like the journal SSD is broken.

If it's still readable but not writable, you can run

    ceph-osd --id ... --flush-journal

and replace the disk after doing so. You can then point the symlinks in /var/lib/ceph/osd/ceph-*/journal to the new journal and run

    ceph-osd --id ... --mkjournal

If the journal is no longer readable: the safe option is to completely re-create the OSDs after replacing the journal disk. (The unsafe way is to just skip the --flush-journal step; not recommended.)

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Sep 30, 2019 at 3:51 AM 展荣臻(信泰) <zhanrzh_xt@xxxxxxxxxxxxxx> wrote:
>
> > > Hi all,
> > > we use OpenStack + Ceph (Hammer) in production.
> >
> > Hammer is soooooo 2015.
> >
> > > There are 22 OSDs on a host and 11 OSDs share one SSD for the OSD journal.
> >
> > I can’t imagine a scenario in which this strategy makes sense; the documentation and books are quite clear on why this is a bad idea. Assuming that your OSDs are HDDs and the journal devices are SATA SSDs, the journals are going to be a bottleneck, and you’re going to wear through them quickly. If you have a read-mostly workload, colocating them would be safer.
>
> Oh, I am wrong, we use SAS SSDs.
>
> > I also suspect that something is amiss with your CRUSH topology that is preventing recovery, and/or you actually have multiple overlapping failures.
>
> My crushmap is at https://github.com/rongzhen-zhan/myfile/blob/master/crushmap
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
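
In case it helps anyone following along, here is a rough sketch of the flush/replace/mkjournal sequence Paul describes for the still-readable case. The OSD id (3) and the new journal partition (/dev/sdX1) are placeholders for your own values, and the stop/start commands depend on your init system (Hammer-era clusters are often still on sysvinit or upstart rather than systemd):

    # stop the OSD so nothing writes to the journal while you work on it
    systemctl stop ceph-osd@3        # or the equivalent for your init system

    # write any pending journal entries back to the OSD while the old SSD is still readable
    ceph-osd --id 3 --flush-journal

    # after physically replacing and partitioning the SSD, repoint the journal symlink
    ln -sf /dev/sdX1 /var/lib/ceph/osd/ceph-3/journal

    # initialize the new journal and bring the OSD back up
    ceph-osd --id 3 --mkjournal
    systemctl start ceph-osd@3

Repeat per OSD that had its journal on the failed device; as noted above, if the old journal can no longer be read at all, re-create the affected OSDs instead of skipping the flush step.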