Is there any way we could have a "leveldb_defrag_on_mount" option for the
OSDs, similar to the existing "leveldb_compact_on_mount" option? Also, I've
got at least one user who is creating and deleting thousands of files at a
time in some of their directories (keeping 1-2% of them). Could that cause
the fragmentation we think is the issue? (For reference, I've put a condensed
version of Brandon's export/import sequence and a ceph.conf sketch of the
compact-on-mount option at the bottom of this mail.)

--
Adam

On Thu, Jun 2, 2016 at 10:32 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> I'm still exporting PGs out of some of the downed OSDs, but things are
> definitely looking promising.
>
> Marginally related to this thread, since these seem to be most of the
> hanging objects when exporting PGs: what are inodes in the 600 range
> used for within the metadata pool? I know the 200 range is used for
> journaling. 8 of the 13 OSDs I've got left down are currently trying
> to export objects in the 600 range. Are these just MDS journal objects
> from an MDS severely behind on trimming?
>
> --
> Adam
>
> On Thu, Jun 2, 2016 at 6:10 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
>> <brandon.morris.pmp@xxxxxxxxx> wrote:
>>
>>> The only way that I was able to get back to HEALTH_OK was to
>>> export/import. ***** Please note: any time you use
>>> ceph-objectstore-tool you risk data loss if it is not done carefully.
>>> Never remove a PG until you have a known-good export. *****
>>>
>>> Here are the steps I used:
>>>
>>> 1. Set the noout and nobackfill flags.
>>> 2. Stop the OSDs that have the erroring PG.
>>> 3. Flush the journal and export the primary version of the PG. This
>>>    took 1 minute on a well-behaved PG and 4 hours on the misbehaving
>>>    PG, e.g.:
>>>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export --file /root/32.10c.b.export
>>>
>>> 4. Import the PG into a new/temporary OSD that is also offline, e.g.:
>>>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op export --file /root/32.10c.b.export
>>
>> This should be an import op, and presumably to a different data path
>> and journal path, more like the following?
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101 --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c --op import --file /root/32.10c.b.export
>>
>> Just trying to clarify for anyone who comes across this thread in the future.
>>
>> Cheers,
>> Brad
>>
>>>
>>> 5. Remove the PG from all other OSDs (16, 143, 214, and 448 in your
>>>    case, it looks like).
>>> 6. Start the cluster OSDs.
>>> 7. Start the temporary OSD and ensure 32.10c backfills correctly to
>>>    the 3 OSDs it is supposed to be on.
>>>
>>> This is similar to the recovery process described in this post from
>>> 04/09/2015:
>>> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>>> Hopefully it works in your case too, and you can get the cluster back
>>> to a state where you can make the CephFS directories smaller.
>>>
>>> - Brandon
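
Pulling Brandon's steps and Brad's correction together, here is a condensed
sketch of the sequence. It is only a sketch: the OSD ids, the pgid 32.10c,
the export file path, and the temporary OSD ceph-101 are taken from the
examples above and would need to match your own cluster, and the start/stop
commands assume a Jewel-era filestore host.

    # 1-2. Stop data movement, then stop the OSDs holding the bad PG.
    ceph osd set noout
    ceph osd set nobackfill
    service ceph stop osd.16          # or: systemctl stop ceph-osd@16

    # 3. Flush the journal and export the primary copy of the PG.
    ceph-osd -i 16 --flush-journal
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
        --journal-path /var/lib/ceph/osd/ceph-16/journal \
        --pgid 32.10c --op export --file /root/32.10c.b.export

    # 4. Import into an offline temporary OSD (Brad's corrected command).
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101 \
        --journal-path /var/lib/ceph/osd/ceph-101/journal \
        --pgid 32.10c --op import --file /root/32.10c.b.export

    # 5. Only once the export is known good: remove the PG from the other
    #    OSDs (repeat for each of 16, 143, 214, 448 while they are stopped).
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 \
        --journal-path /var/lib/ceph/osd/ceph-143/journal \
        --pgid 32.10c --op remove

    # 6-7. Start the cluster OSDs and the temporary OSD, let 32.10c
    #      backfill to the OSDs it belongs on, then clear the flags.
    service ceph start osd.16         # or: systemctl start ceph-osd@16
    ceph osd unset nobackfill
    ceph osd unset noout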
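
And on the question at the top of this mail: I'm not aware of any existing
defrag-on-mount option; the compact-on-mount behaviour referenced above is
just a boolean the OSD reads from ceph.conf. A minimal sketch, assuming the
option name as written above:

    [osd]
        # Trigger a leveldb (omap) compaction when the OSD opens its
        # store at start/mount time.
        leveldb_compact_on_mount = true

Putting it under [osd] applies to every OSD on the host; it can also be set
in a specific [osd.N] section if you only want it on particular OSDs.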