Would it be beneficial for anyone to have an archive copy of an OSD that
took more than 4 days to export? All but an hour of that time was spent
exporting 1 pg (that ended up being 197MB). I can even send along the
extracted pg for analysis...

--
Adam

On Fri, Jun 3, 2016 at 2:39 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> With regards to this export/import process, I've been exporting a pg
> from an osd for more than 24 hours now. The entire OSD only has 8.6GB
> of data. 3GB of that is in omap. The export for this particular PG is
> only 108MB in size right now, after more than 24 hours. How is it
> possible that a fragmented database on an SSD capable of 13,000 iops
> can be this slow?
>
> --
> Adam
>
> On Fri, Jun 3, 2016 at 11:11 AM, Brandon Morris, PMP
> <brandon.morris.pmp@xxxxxxxxx> wrote:
>> Nice catch. That was a copy-paste error. Sorry.
>>
>> It should have read:
>>
>> 3. Flush the journal and export the primary version of the PG. This took 1
>> minute on a well-behaved PG and 4 hours on the misbehaving PG,
>> i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>> --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export
>> --file /root/32.10c.b.export
>>
>> 4. Import the PG into a new / temporary OSD that is also offline,
>> i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>> --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op import
>> --file /root/32.10c.b.export
>>
>>
>> On Thu, Jun 2, 2016 at 5:10 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
>>> <brandon.morris.pmp@xxxxxxxxx> wrote:
>>>
>>> > The only way that I was able to get back to HEALTH_OK was to
>>> > export/import. ***** Please note, any time you use
>>> > ceph-objectstore-tool you risk data loss if not done carefully. Never
>>> > remove a PG until you have a known good export. *****
>>> >
>>> > Here are the steps I used:
>>> >
>>> > 1. Set the noout and nobackfill flags.
>>> > 2. Stop the OSDs that have the erroring PG.
>>> > 3. Flush the journal and export the primary version of the PG. This
>>> > took 1 minute on a well-behaved PG and 4 hours on the misbehaving PG,
>>> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>>> > --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export
>>> > --file /root/32.10c.b.export
>>> >
>>> > 4. Import the PG into a new / temporary OSD that is also offline,
>>> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>>> > --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op export
>>> > --file /root/32.10c.b.export
>>>
>>> This should be an import op, and presumably to a different data path
>>> and journal path, more like the following?
>>>
>>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101
>>> --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c --op
>>> import --file /root/32.10c.b.export
>>>
>>> Just trying to clarify for anyone that comes across this thread in the
>>> future.
>>>
>>> Cheers,
>>> Brad
>>>
>>> >
>>> > 5. Remove the PG from all other OSDs (16, 143, 214, and 448 in your
>>> > case, it looks like).
>>> > 6. Start the cluster OSDs.
>>> > 7. Start the temporary OSDs and ensure 32.10c backfills correctly to
>>> > the 3 OSDs it is supposed to be on.
>>> >
>>> > This is similar to the recovery process described in this post from
>>> > 04/09/2015:
>>> > http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>>> > Hopefully it works in your case too and you can get the cluster back to a
>>> > state where you can make the CephFS directories smaller.
>>> >
>>> > - Brandon
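
For anyone who lands on this thread later, here is the procedure above
collapsed into one checklist. Treat it as a rough sketch rather than a
recipe: the OSD ids (16 and 100), the pgid 32.10c and the export file are
taken from the examples in this thread, the stop/start commands depend on
your distro and init system, and on newer releases the remove op may also
require --force. Never run the remove step until the export is known good.

# keep the cluster from rebalancing while OSDs are down
ceph osd set noout
ceph osd set nobackfill

# stop every OSD that holds the bad PG (sysvinit: service ceph stop osd.16)
systemctl stop ceph-osd@16

# flush the journal before touching the store with ceph-objectstore-tool
ceph-osd -i 16 --flush-journal

# export the primary copy of the PG (can take hours if it carries a lot of omap)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
  --journal-path /var/lib/ceph/osd/ceph-16/journal \
  --pgid 32.10c --op export --file /root/32.10c.b.export

# import it into a spare / temporary OSD that is also offline
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
  --journal-path /var/lib/ceph/osd/ceph-100/journal \
  --pgid 32.10c --op import --file /root/32.10c.b.export

# only with a verified export in hand: remove the PG from the other
# replicas, repeating per OSD that still holds a stale copy
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
  --journal-path /var/lib/ceph/osd/ceph-16/journal \
  --pgid 32.10c --op remove

# bring the OSDs (including the temporary one) back and let the PG backfill
systemctl start ceph-osd@16
systemctl start ceph-osd@100
ceph osd unset nobackfill
ceph osd unset noout
ceph pg 32.10c query   # watch for the PG to go active+clean

The temporary OSD only has to stay in long enough for 32.10c to backfill to
the three OSDs it is actually mapped to.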
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
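
On the question of why the export crawled: with FileStore, the omap keys for
every object on the OSD live in a single shared leveldb under the data
directory rather than inside the PG's directory, so a PG whose export ends
up around 200MB can still force the tool to read through a large, fragmented
key space. A few ways to gauge how much omap an OSD or a single object is
carrying; the FileStore path, pool name and object name below are
placeholders, not values from this thread.

# size of the FileStore omap database for one OSD (typical layout)
du -sh /var/lib/ceph/osd/ceph-16/current/omap

# number of omap keys on one object, e.g. a CephFS metadata pool directory
# object, which stores one key per dentry (names are placeholders)
rados -p cephfs_metadata listomapkeys 10000000000.00000000 | wc -l

# per-OSD counters from the admin socket (the OSD must be running)
ceph daemon osd.16 perf dump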