Re: CephFS recovery from missing metadata objects questions

On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> Op 7 december 2016 om 16:38 schreef John Spray <jspray@xxxxxxxxxx>:
>>
>>
>> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> > (I think John knows the answer, but sending to ceph-users for archival purposes)
>> >
>> > Hi John,
>> >
>> > A Ceph cluster lost a PG containing CephFS metadata and is currently going through the CephFS disaster recovery procedure described here: http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>>
>> I wonder if this has any relation to your thread about size=2 pools ;-)
>
> Yes, it does!
>
>>
>> > This data pool has 1.4B objects and currently has 16 concurrent scan_extents scans running:
>> >
>> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 0 --worker_m 16 cephfs_metadata
>> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 1 --worker_m 16 cephfs_metadata
>> > ..
>> > ..
>> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 15 --worker_m 16 cephfs_metadata
>> >
>> > According to the source in DataScan.cc:
>> > * worker_n: Worker number
>> > * worker_m: Worker count
>> >
>> > So with the commands above I have 16 workers running, correct? For the scan_inodes I want to scale out to 32 workers to speed up the process even more.
>> >
>> > Just to double-check before I send a new PR to update the docs, this is the right way to run the tool, correct?
>>
>> It looks like you're targeting cephfs_metadata instead of your data pool.
>>
>> scan_extents and scan_inodes operate on data pools, even if your goal
>> is to rebuild your metadata pool (the argument is what you are
>> scanning, not what you are writing to).
>
> That was a typo on my part when typing this e-mail. It is scanning the *data* pool at the moment.
>
> Can you confirm that the worker_n and worker_m arguments are the correct ones?

Yep, they look right to me.
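
If it helps, here is a minimal sketch of launching all 32 scan_inodes
workers from a single shell (the pool name "cephfs_data" and the
32-way split are assumptions based on your plan above -- substitute
your real data pool):

    # Launch 32 scan_inodes workers, one shard of the data pool each.
    for i in $(seq 0 31); do
        cephfs-data-scan scan_inodes --worker_n $i --worker_m 32 cephfs_data &
    done
    # Wait for every worker to finish its shard before moving on.
    wait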

>>
>> There is also a "scan_frags" command that operates on a metadata pool.
>
> Didn't know that. In this case the metadata pool is missing objects due to that lost PG.
>
> I think running scan_extents and scan_inodes on the *data* pool is the correct way to rebuild the metadata pool if it is missing objects, right?

In general you'd use both scan_frags (to re-link directories that
were orphaned because an ancestor dirfrag was in the lost PG) and
then scan_extents+scan_inodes (to re-link files that were orphaned
because their immediate parent dirfrag was in the lost PG).

However, scan_extents+scan_inodes generally does the lion's share of
the work: anything scan_frags would have caught would probably also
have appeared somewhere in a backtrace path and been linked in by
scan_inodes anyway, so you should probably just skip scan_frags in
this instance.
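
So the overall order would be roughly the following (a minimal
single-worker sketch, again assuming a data pool called "cephfs_data";
the parallel form above applies to both passes):

    # Pass 1: scan raw data objects and accumulate size/mtime info
    # onto each file's first object.  Must fully complete before pass 2.
    cephfs-data-scan scan_extents cephfs_data

    # Pass 2: re-create/re-link inodes in the metadata pool from the
    # backtrace xattrs found on the first objects.
    cephfs-data-scan scan_inodes cephfs_data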

BTW, you've probably already realised this, but be *very* cautious
about using the recovered filesystem: our testing of these tools is
mostly verifying that after recovery we can see and read the files
(i.e. well enough to extract them somewhere else), not that the
filesystem is necessarily working well for writes etc. after being
recovered.  If it's possible, it's always better to recover your
files to a separate location and then rebuild your filesystem with
fresh pools -- that way you're not risking that there is anything
strange left behind by the recovery process.
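
A rough sketch of what that copy-out could look like (the monitor
address, mount point, credentials and destination path are all
placeholders):

    # Mount the recovered filesystem read-only...
    mount -t ceph mon1:6789:/ /mnt/recovered \
        -o name=admin,secretfile=/etc/ceph/admin.secret,ro

    # ...then copy everything out to separate storage before
    # rebuilding the filesystem on fresh pools.
    rsync -a /mnt/recovered/ /srv/recovered-copy/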

John

> Wido
>
>>
>> John
>>
>> > If not, before sending the PR and starting scan_inodes on this cluster, what is the correct way to invoke the tool?
>> >
>> > Thanks!
>> >
>> > Wido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


