Is there any way we could have a "leveldb_defrag_on_mount" option for the
OSDs, similar to the existing "leveldb_compact_on_mount" option? Also, I've
got at least one user who is creating and deleting thousands of files at a
time in some of their directories (keeping 1-2% of them). Could that cause
the fragmentation we think is the issue? (For reference, I've put a condensed
version of Brandon's export/import sequence and a ceph.conf sketch of the
compact-on-mount option at the bottom of this mail.)

--
Adam

On Thu, Jun 2, 2016 at 10:32 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> I'm still exporting PGs out of some of the downed OSDs, but things are
> definitely looking promising.
>
> Marginally related to this thread, since these seem to be most of the
> hanging objects when exporting PGs: what are inodes in the 600 range
> used for within the metadata pool? I know the 200 range is used for
> journaling. 8 of the 13 OSDs I've got left down are currently trying
> to export objects in the 600 range. Are these just MDS journal objects
> from an MDS severely behind on trimming?
>
> --
> Adam
>
> On Thu, Jun 2, 2016 at 6:10 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
>> <brandon.morris.pmp@xxxxxxxxx> wrote:
>>
>>> The only way that I was able to get back to HEALTH_OK was to
>>> export/import. ***** Please note: any time you use
>>> ceph-objectstore-tool you risk data loss if it is not done carefully.
>>> Never remove a PG until you have a known-good export. *****
>>>
>>> Here are the steps I used:
>>>
>>> 1. Set the noout and nobackfill flags.
>>> 2. Stop the OSDs that have the erroring PG.
>>> 3. Flush the journal and export the primary version of the PG. This
>>>    took 1 minute on a well-behaved PG and 4 hours on the misbehaving
>>>    PG, e.g.:
>>>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export --file /root/32.10c.b.export
>>>
>>> 4. Import the PG into a new/temporary OSD that is also offline, e.g.:
>>>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op export --file /root/32.10c.b.export
>>
>> This should be an import op, and presumably to a different data path
>> and journal path, more like the following?
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101 --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c --op import --file /root/32.10c.b.export
>>
>> Just trying to clarify for anyone who comes across this thread in the future.
>>
>> Cheers,
>> Brad
>>
>>>
>>> 5. Remove the PG from all other OSDs (16, 143, 214, and 448 in your
>>>    case, it looks like).
>>> 6. Start the cluster OSDs.
>>> 7. Start the temporary OSD and ensure 32.10c backfills correctly to
>>>    the 3 OSDs it is supposed to be on.
>>>
>>> This is similar to the recovery process described in this post from
>>> 04/09/2015:
>>> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>>> Hopefully it works in your case too, and you can get the cluster back
>>> to a state where you can make the CephFS directories smaller.
>>>
>>> - Brandon
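
Pulling Brandon's steps and Brad's correction together, here is a condensed
sketch of the sequence. It is only a sketch: the OSD ids, the pgid 32.10c,
the export file path, and the temporary OSD ceph-101 are taken from the
examples above and would need to match your own cluster, and the start/stop
commands assume a Jewel-era filestore host.

    # 1-2. Stop data movement, then stop the OSDs holding the bad PG.
    ceph osd set noout
    ceph osd set nobackfill
    service ceph stop osd.16          # or: systemctl stop ceph-osd@16

    # 3. Flush the journal and export the primary copy of the PG.
    ceph-osd -i 16 --flush-journal
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
        --journal-path /var/lib/ceph/osd/ceph-16/journal \
        --pgid 32.10c --op export --file /root/32.10c.b.export

    # 4. Import into an offline temporary OSD (Brad's corrected command).
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101 \
        --journal-path /var/lib/ceph/osd/ceph-101/journal \
        --pgid 32.10c --op import --file /root/32.10c.b.export

    # 5. Only once the export is known good: remove the PG from the other
    #    OSDs (repeat for each of 16, 143, 214, 448 while they are stopped).
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 \
        --journal-path /var/lib/ceph/osd/ceph-143/journal \
        --pgid 32.10c --op remove

    # 6-7. Start the cluster OSDs and the temporary OSD, let 32.10c
    #      backfill to the OSDs it belongs on, then clear the flags.
    service ceph start osd.16         # or: systemctl start ceph-osd@16
    ceph osd unset nobackfill
    ceph osd unset noout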
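
And on the question at the top of this mail: I'm not aware of any existing
defrag-on-mount option; the compact-on-mount behaviour referenced above is
just a boolean the OSD reads from ceph.conf. A minimal sketch, assuming the
option name as written above:

    [osd]
        # Trigger a leveldb (omap) compaction when the OSD opens its
        # store at start/mount time.
        leveldb_compact_on_mount = true

Putting it under [osd] applies to every OSD on the host; it can also be set
in a specific [osd.N] section if you only want it on particular OSDs.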