Re: Luminous cluster in very bad state need some assistance.

Philippe Van Hecke <Philippe.VanHecke@xxxxxxxxx> · Mon, 4 Feb 2019 06:35:03 +0000

Result of ceph pg ls | grep 11.118:

11.118     9788                  0        0         0       0 40817837568 1584     1584                             active+clean 2019-02-01 12:48:41.343228  70238'19811673  70493:34596887  [121,24]        121  [121,24]            121  69295'19811665 2019-02-01 12:48:41.343144  66131'19810044 2019-01-30 11:44:36.006505

cp done.

So can I now run the ceph-objectstore-tool --op remove command?

________________________________________
From: Sage Weil <sage@xxxxxxxxxxxx>
Sent: 04 February 2019 07:26
To: Philippe Van Hecke
Cc: ceph-users@xxxxxxxxxxxxxx; Belnet Services
Subject: Re:  Luminous cluster in very bad state need some assistance.

On Mon, 4 Feb 2019, Philippe Van Hecke wrote:
> Hi Sage,
>
> I tried to run the following:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-49/ --journal /var/lib/ceph/osd/ceph-49/journal --pgid 11.182 --op export-remove --debug --file /tmp/export-pg/18.182 2>ceph-objectstore-tool-export-remove.txt
> but this raises an exception.
>
> The output is in the file ceph-objectstore-tool-export-remove.txt, available here: https://filesender.belnet.be/?s=download&token=e2b1fdbc-0739-423f-9d97-0bd258843a33

In that case, cp --preserve=all
/var/lib/ceph/osd/ceph-49/current/11.182_head to a safe location and then
use the ceph-objectstore-tool --op remove command.  But first confirm that
'ceph pg ls' shows the PG as active.
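
As a rough sketch of the sequence described above (check the PG state, copy the PG directory, then remove it), something like the following could be used. The PG id and OSD path are taken from this thread; backup_dir, pg_state, print_remove_steps and plan_removal are hypothetical names introduced here, and the assumption that the state is the 10th column of 'ceph pg ls' only matches the Luminous output pasted in this thread. The functions only print the commands so they can be reviewed before anything is run.

```shell
# Values from the thread, except backup_dir, which is a hypothetical
# "safe location" chosen for illustration.
pgid=11.182
osd_dir=/var/lib/ceph/osd/ceph-49
backup_dir=/root/pg-backup

# Read `ceph pg ls` output on stdin and print the state of PG $1.
# Assumes the state is the 10th whitespace-separated column, as in the
# output pasted in this thread; other releases may differ.
# The "" concatenation forces a string comparison, so 11.1 != 11.10.
pg_state() {
    awk -v pg="$1" '($1 "") == (pg "") { print $10 }'
}

# Print (do not run) the backup and removal commands.
# -r is added because the PG directory has to be copied recursively.
# The OSD must be stopped while ceph-objectstore-tool works on its store.
# Note: some releases reportedly refuse --op remove without --force.
print_remove_steps() {
    echo "cp -r --preserve=all $osd_dir/current/${pgid}_head $backup_dir/"
    echo "ceph-objectstore-tool --data-path $osd_dir --journal-path $osd_dir/journal --pgid $pgid --op remove"
}

# Read `ceph pg ls` output on stdin; print the steps only if the PG state
# begins with "active", otherwise report the state and fail.
plan_removal() {
    state=$(pg_state "$pgid")
    case "$state" in
        active*) print_remove_steps ;;
        *) echo "PG $pgid state is '${state:-unknown}', not active; stopping" >&2
           return 1 ;;
    esac
}

# Usage on a live cluster:
#   ceph pg ls | plan_removal
```

Printing the commands instead of executing them is a deliberate choice here: removing a PG copy is destructive, so the operator gets to read the exact command lines first.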

sage

> Kr
>
> Philippe.
>
> ________________________________________
> From: Sage Weil <sage@xxxxxxxxxxxx>
> Sent: 04 February 2019 06:59
> To: Philippe Van Hecke
> Cc: ceph-users@xxxxxxxxxxxxxx; Belnet Services
> Subject: Re:  Luminous cluster in very bad state need some assistance.
>
> On Mon, 4 Feb 2019, Philippe Van Hecke wrote:
> > Hi Sage, first of all thanks for your help.
> >
> > Please find the OSD log with debug info for osd.49 here: https://filesender.belnet.be/?s=download&token=dea0edda-5b6a-4284-9ea1-c1fdf88b65e9
> > Indeed, if all the buggy OSDs can restart, that may solve the issue.
> > I am also glad that you confirm my understanding that, in the worst case, removing the pool can also resolve the problem: I would lose data but end up with a working cluster.
>
> If PGs are damaged, removing the pool would be part of getting to
> HEALTH_OK, but you'd probably also need to remove any problematic PGs that
> are preventing the OSD starting.
>
> But keep in mind that (1) I see 3 PGs that don't peer spread across pools
> 11 and 12; not sure which one you are considering deleting.  Also (2) if
> one pool isn't fully available it generally won't be a problem for other
> pools, as long as the osds start.  And doing ceph-objectstore-tool
> export-remove is a pretty safe way to move any problem PGs out of the way
> to get your OSDs starting--just make sure you hold onto that backup/export
> because you may need it later!
>
> > PS: I don't want to open a debate about top/bottom posting, but I would like to know the preference of this list :-)
>
> No preference :)
>
> sage
>
>
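
The export-remove advice quoted above, including keeping the export for later, could be wrapped roughly as follows. This is only a sketch: TOOL, export_remove_pg and import_pg are hypothetical names, the OSD has to be stopped while the tool runs, and the import step is an assumption about how the saved export would be used later, not something spelled out in this thread.

```shell
# Hypothetical wrapper; TOOL can be overridden (e.g. TOOL=echo) for a dry run.
TOOL=${TOOL:-ceph-objectstore-tool}

# Export PG $2 from the stopped OSD at $1 into file $3 and remove it from
# that OSD in one step. As an extra sanity check, fail afterwards if the
# export file is missing or empty, so the problem is noticed right away.
export_remove_pg() {
    "$TOOL" --data-path "$1" --journal-path "$1/journal" \
        --pgid "$2" --op export-remove --file "$3" || return 1
    [ -s "$3" ]
}

# Import a previously exported PG file $2 into the stopped OSD at $1.
import_pg() {
    "$TOOL" --data-path "$1" --journal-path "$1/journal" \
        --op import --file "$2"
}

# Usage (paths as in the thread; keep the export file somewhere safe):
#   export_remove_pg /var/lib/ceph/osd/ceph-49 11.182 /tmp/export-pg/11.182
```

Keeping the export file on storage that survives a reboot, rather than under /tmp, would be a sensible precaution given that it may be the only remaining copy of the PG.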