Would it be beneficial for anyone to have an archive copy of an OSD that
took more than 4 days to export? All but an hour of that time was spent
exporting 1 pg (that ended up being 197MB). I can even send along the
extracted pg for analysis...

--
Adam

On Fri, Jun 3, 2016 at 2:39 PM, Adam Tygart <mozes@xxxxxxx> wrote:
> With regards to this export/import process, I've been exporting a pg
> from an osd for more than 24 hours now. The entire OSD only has 8.6GB
> of data. 3GB of that is in omap. The export for this particular PG is
> only 108MB in size right now, after more than 24 hours. How is it
> possible that a fragmented database on an SSD capable of 13,000 iops
> can be this slow?
>
> --
> Adam
>
> On Fri, Jun 3, 2016 at 11:11 AM, Brandon Morris, PMP
> <brandon.morris.pmp@xxxxxxxxx> wrote:
>> Nice catch. That was a copy-paste error. Sorry.
>>
>> It should have read:
>>
>> 3. Flush the journal and export the primary version of the PG. This took 1
>> minute on a well-behaved PG and 4 hours on the misbehaving PG,
>> i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>> --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export
>> --file /root/32.10c.b.export
>>
>> 4. Import the PG into a new / temporary OSD that is also offline,
>> i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>> --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op import
>> --file /root/32.10c.b.export
>>
>>
>> On Thu, Jun 2, 2016 at 5:10 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
>>> <brandon.morris.pmp@xxxxxxxxx> wrote:
>>>
>>> > The only way that I was able to get back to HEALTH_OK was to
>>> > export/import. ***** Please note, any time you use
>>> > ceph-objectstore-tool you risk data loss if not done carefully. Never
>>> > remove a PG until you have a known good export. *****
>>> >
>>> > Here are the steps I used:
>>> >
>>> > 1. Set the noout and nobackfill flags.
>>> > 2. Stop the OSDs that have the erroring PG.
>>> > 3. Flush the journal and export the primary version of the PG. This
>>> > took 1 minute on a well-behaved PG and 4 hours on the misbehaving PG,
>>> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
>>> > --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export
>>> > --file /root/32.10c.b.export
>>> >
>>> > 4. Import the PG into a new / temporary OSD that is also offline,
>>> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
>>> > --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op export
>>> > --file /root/32.10c.b.export
>>>
>>> This should be an import op, and presumably to a different data path
>>> and journal path, more like the following?
>>>
>>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101
>>> --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c --op
>>> import --file /root/32.10c.b.export
>>>
>>> Just trying to clarify for anyone that comes across this thread in the
>>> future.
>>>
>>> Cheers,
>>> Brad
>>>
>>> >
>>> > 5. Remove the PG from all other OSDs (16, 143, 214, and 448 in your
>>> > case, it looks like).
>>> > 6. Start the cluster OSDs.
>>> > 7. Start the temporary OSDs and ensure 32.10c backfills correctly to
>>> > the 3 OSDs it is supposed to be on.
>>> >
>>> > This is similar to the recovery process described in this post from
>>> > 04/09/2015:
>>> > http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>>> > Hopefully it works in your case too and you can get the cluster back to a
>>> > state where you can make the CephFS directories smaller.
>>> >
>>> > - Brandon
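
For anyone who lands on this thread later, here is the procedure above
collapsed into one checklist. Treat it as a rough sketch rather than a
recipe: the OSD ids (16 and 100), the pgid 32.10c and the export file are
taken from the examples in this thread, the stop/start commands depend on
your distro and init system, and on newer releases the remove op may also
require --force. Never run the remove step until the export is known good.

# keep the cluster from rebalancing while OSDs are down
ceph osd set noout
ceph osd set nobackfill

# stop every OSD that holds the bad PG (sysvinit: service ceph stop osd.16)
systemctl stop ceph-osd@16

# flush the journal before touching the store with ceph-objectstore-tool
ceph-osd -i 16 --flush-journal

# export the primary copy of the PG (can take hours if it carries a lot of omap)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
  --journal-path /var/lib/ceph/osd/ceph-16/journal \
  --pgid 32.10c --op export --file /root/32.10c.b.export

# import it into a spare / temporary OSD that is also offline
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
  --journal-path /var/lib/ceph/osd/ceph-100/journal \
  --pgid 32.10c --op import --file /root/32.10c.b.export

# only with a verified export in hand: remove the PG from the other
# replicas, repeating per OSD that still holds a stale copy
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
  --journal-path /var/lib/ceph/osd/ceph-16/journal \
  --pgid 32.10c --op remove

# bring the OSDs (including the temporary one) back and let the PG backfill
systemctl start ceph-osd@16
systemctl start ceph-osd@100
ceph osd unset nobackfill
ceph osd unset noout
ceph pg 32.10c query   # watch for the PG to go active+clean

The temporary OSD only has to stay in long enough for 32.10c to backfill to
the three OSDs it is actually mapped to.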
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
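
On the question of why the export crawled: with FileStore, the omap keys for
every object on the OSD live in a single shared leveldb under the data
directory rather than inside the PG's directory, so a PG whose export ends
up around 200MB can still force the tool to read through a large, fragmented
key space. A few ways to gauge how much omap an OSD or a single object is
carrying; the FileStore path, pool name and object name below are
placeholders, not values from this thread.

# size of the FileStore omap database for one OSD (typical layout)
du -sh /var/lib/ceph/osd/ceph-16/current/omap

# number of omap keys on one object, e.g. a CephFS metadata pool directory
# object, which stores one key per dentry (names are placeholders)
rados -p cephfs_metadata listomapkeys 10000000000.00000000 | wc -l

# per-OSD counters from the admin socket (the OSD must be running)
ceph daemon osd.16 perf dump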