With regards to this export/import process, I've been exporting a PG from an
OSD for more than 24 hours now. The entire OSD only has 8.6GB of data, and
3GB of that is in omap. The export for this particular PG is only 108MB in
size so far, after more than 24 hours. How can a fragmented database on an
SSD capable of 13,000 IOPS be this slow?

-- Adam

On Fri, Jun 3, 2016 at 11:11 AM, Brandon Morris, PMP
<brandon.morris.pmp@xxxxxxxxx> wrote:
> Nice catch. That was a copy-paste error, sorry.
>
> It should have read:
>
> 3. Flush the journal and export the primary version of the PG. This took
>    1 minute on a well-behaved PG and 4 hours on the misbehaving PG, i.e.:
>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>    --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c
>    --op export --file /root/32.10c.b.export
>
> 4. Import the PG into a new / temporary OSD that is also offline, i.e.:
>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>    --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c
>    --op import --file /root/32.10c.b.export
>
>
> On Thu, Jun 2, 2016 at 5:10 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
>> <brandon.morris.pmp@xxxxxxxxx> wrote:
>>
>> > The only way that I was able to get back to HEALTH_OK was to
>> > export/import. ***** Please note, any time you use
>> > ceph-objectstore-tool you risk data loss if not done carefully. Never
>> > remove a PG until you have a known good export. *****
>> >
>> > Here are the steps I used:
>> >
>> > 1. Set the noout and nobackfill flags.
>> > 2. Stop the OSDs that have the erroring PG.
>> > 3. Flush the journal and export the primary version of the PG. This
>> >    took 1 minute on a well-behaved PG and 4 hours on the misbehaving
>> >    PG, i.e.:
>> >    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>> >    --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c
>> >    --op export --file /root/32.10c.b.export
>> >
>> > 4. Import the PG into a new / temporary OSD that is also offline, i.e.:
>> >    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>> >    --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c
>> >    --op export --file /root/32.10c.b.export
>>
>> This should be an import op, and presumably to a different data path
>> and journal path, more like the following?
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101
>> --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c
>> --op import --file /root/32.10c.b.export
>>
>> Just trying to clarify for anyone that comes across this thread in the
>> future.
>>
>> Cheers,
>> Brad
>>
>> >
>> > 5. Remove the PG from all other OSDs (16, 143, 214, and 448 in your
>> >    case, it looks like).
>> > 6. Start the cluster OSDs.
>> > 7. Start the temporary OSDs and ensure 32.10c backfills correctly to
>> >    the 3 OSDs it is supposed to be on.
>> >
>> > This is similar to the recovery process described in this post from
>> > 04/09/2015:
>> > http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>> > Hopefully it works in your case too and you can get the cluster back
>> > to a state where you can make the CephFS directories smaller.
>> >
>> > - Brandon
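
For anyone who finds this thread later and wants the whole sequence in one
place, here is a rough sketch of the steps above, using the same PG (32.10c),
source OSD (osd.16) and temporary OSD (osd.100) as the examples in the thread.
The systemctl unit names are an assumption (substitute whatever your init
system uses), osd.143 merely stands in for "each OSD that previously held the
PG", and, as Brandon stresses, never remove a PG anywhere until you have a
known-good export.

  # 1. Stop data movement while working on the PG
  ceph osd set noout
  ceph osd set nobackfill

  # 2. Stop every OSD that holds the erroring PG (repeat per OSD;
  #    the unit name is an assumption)
  systemctl stop ceph-osd@16

  # 3. Flush the journal, then export the primary copy of the PG
  ceph-osd -i 16 --flush-journal
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal \
      --pgid 32.10c --op export --file /root/32.10c.b.export

  # 4. Import the PG into the offline temporary OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
      --journal-path /var/lib/ceph/osd/ceph-100/journal \
      --pgid 32.10c --op import --file /root/32.10c.b.export

  # 5. Remove the PG from every OSD that previously held it
  #    (16, 143, 214 and 448 in the example above; repeat per OSD)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 \
      --journal-path /var/lib/ceph/osd/ceph-143/journal \
      --pgid 32.10c --op remove

  # 6./7. Bring the cluster OSDs and the temporary OSD back up and
  #       watch 32.10c backfill to the three OSDs it belongs on
  systemctl start ceph-osd@16
  systemctl start ceph-osd@100
  ceph -w

  # Once the PG is active+clean again, clear the flags
  ceph osd unset nobackfill
  ceph osd unset noout

On Adam's speed question: watching the export file grow, e.g. with
watch -n 60 ls -lh /root/32.10c.b.export, at least tells you whether the
export is crawling through a large omap or has stalled outright; a PG whose
objects carry a lot of omap keys can export far more slowly than its byte
size alone would suggest.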