Re: cephfs snapshot format upgrade

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 11 Apr 2018 16:11:04 -0700

On Tue, Apr 10, 2018 at 8:50 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> On Wed, Apr 11, 2018 at 3:34 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Tue, Apr 10, 2018 at 5:54 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> On Tue, Apr 10, 2018 at 1:44 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>> Hello
>>>>
>>>> To simplify snapshot handling in multiple-active MDS setups, we changed
>>>> the format of the snaprealm in the mimic dev branch.
>>>> https://github.com/ceph/ceph/pull/16779.
>>>>
>>>> The new version of the MDS can handle old-format snaprealms in a
>>>> single-active setup. It can also convert an old-format snaprealm to the
>>>> new format when the snaprealm is modified. The problem is that the new
>>>> MDS cannot properly handle old-format snaprealms in a multiple-active
>>>> setup: it may crash when it encounters one. For an existing filesystem
>>>> with snapshots, upgrading the MDS to mimic seems to pose no problem at
>>>> first glance. But if the user later enables multiple active MDSs, the
>>>> MDSs may crash continuously, and there is no easy way to switch back to
>>>> a single active MDS.
>>>>
>>>> I don't have a clear idea of how to handle this situation. I can think
>>>> of a few options.
>>>>
>>>> 1. Forbid multiple active MDSs until all old snapshots are deleted or
>>>> all snaprealms are converted to the new format. Format conversion
>>>> requires traversing the whole filesystem tree, which is not easy to
>>>> implement.
>>>
>>> This has been a general problem with metadata format changes: we can
>>> never know if all the metadata in a filesystem has been brought up to
>>> a particular version.  Scrubbing (where scrub does the updates) should
>>> be the answer, but we don't have the mechanism for recording/ensuring
>>> the scrub has really happened.
>>>
>>> Maybe we need the MDS to be able to report a complete whole-filesystem
>>> scrub to the monitor, and record a field like
>>> "latest_scrubbed_version" in FSMap, so that we can be sure that all
>>> the filesystem metadata has been brought up to a certain version
>>> before enabling certain features?  So we'd then have a
>>> "latest_scrubbed_version >= mimic" test before enabling multiple
>>> active daemons.
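A minimal sketch of the gate John proposes, assuming the monitor would refuse to raise max_mds above 1 until a complete whole-filesystem scrub has been reported at mimic or later. The FSMap and latest_scrubbed_version names come from the proposal above; the release table, can_enable_multi_active() and set_max_mds() are hypothetical illustrations, not Ceph code.

```python
# Hypothetical sketch of a "latest_scrubbed_version >= mimic" gate.
# None of these functions exist in Ceph; they only illustrate the idea.
from dataclasses import dataclass
from typing import Optional

# Major version numbers of Ceph releases (luminous = 12.x, mimic = 13.x).
RELEASES = {"jewel": 10, "kraken": 11, "luminous": 12, "mimic": 13}


@dataclass
class FSMap:
    max_mds: int = 1
    # Release whose metadata format a complete whole-filesystem scrub has
    # verified; None means no complete scrub has ever been reported.
    latest_scrubbed_version: Optional[str] = None


def can_enable_multi_active(fsmap: FSMap) -> bool:
    """True only if a full scrub has brought all metadata up to mimic."""
    v = fsmap.latest_scrubbed_version
    return v is not None and RELEASES[v] >= RELEASES["mimic"]


def set_max_mds(fsmap: FSMap, n: int) -> None:
    """Refuse to enable multiple active MDSs until the gate passes."""
    if n > 1 and not can_enable_multi_active(fsmap):
        raise PermissionError(
            "multiple active MDS requires a complete scrub at mimic or later")
    fsmap.max_mds = n
```

With latest_scrubbed_version="luminous", set_max_mds(fs, 2) raises, while dropping back to a single active MDS is always allowed. The check is cheap for the monitor; all of the real cost sits in producing a trustworthy "complete scrub" report.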
>>
>> Don't we have a (recursive!) last_scrub_[stamp|version] on all
>> directories? There's not (yet) a mechanism for associating that with
>> specific data versions like you describe here, but for a one-time
>> upgrade with unsupported features I don't think we need anything too
>> sophisticated.
>> -Greg
>>
> No, we don't. Besides, normal recursive stats (which record the latest
> update) do not work for this case. We need a recursive stat that tracks
> the oldest update across all directories.
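To illustrate the distinction Zheng draws, here is a toy tree where each directory carries a hypothetical snaprealm format number (1 = old, 2 = new). This is a sketch under that assumption, not the MDS's actual rstat code: a "latest" stat (max over the subtree) hides a single stale realm, while an "oldest" stat (min over the subtree) exposes it.

```python
# Hypothetical sketch: "latest" (max) vs "oldest" (min) recursive stats.
# Dir and realm_format are illustrative names, not MDS data structures.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dir:
    name: str
    realm_format: int  # hypothetical: 1 = old format, 2 = new format
    children: List["Dir"] = field(default_factory=list)


def rlatest(d: Dir) -> int:
    """Newest value anywhere in the subtree (max), like normal rstats."""
    return max([d.realm_format] + [rlatest(c) for c in d.children])


def roldest(d: Dir) -> int:
    """Oldest value anywhere in the subtree (min), what an upgrade check needs."""
    return min([d.realm_format] + [roldest(c) for c in d.children])


root = Dir("/", 2, [Dir("a", 2), Dir("b", 2, [Dir("old", 1)])])
print(rlatest(root), roldest(root))  # the max says 2; only the min reveals the 1
```

One likely reason this is harder than it looks: a max can be updated incrementally from the new value alone, but when the last old-format child is converted a min can rise, so the parent has to rescan all of its children to find the new minimum.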

Well, inode_t has last_scrub_version and last_scrub_stamp members.
They're part of encoding version 13. My recollection is that a scrub
on a directory is only considered complete when all of its descendants
have scrubbed, but maybe I'm misremembering and we'll happily do one
at a time.
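If a directory's scrub really is only considered complete once all of its descendants have scrubbed, the stamp behaves like a post-order traversal, and it could double as the full-tree conversion pass from option 1. A minimal sketch under that assumption; Dir, scrub() and the format constants are hypothetical, and it ignores everything that makes a real scrub hard (concurrent modification, restarts, rate limiting).

```python
# Hypothetical post-order scrub: a directory's last_scrub_version is stamped
# only after every descendant has been scrubbed, so a stamp on the root
# implies the whole tree was visited. Illustrative names only.
from dataclasses import dataclass, field
from typing import List, Optional

OLD_FORMAT, NEW_FORMAT = 1, 2


@dataclass
class Dir:
    name: str
    realm_format: int = NEW_FORMAT
    last_scrub_version: Optional[int] = None
    children: List["Dir"] = field(default_factory=list)


def scrub(d: Dir, scrub_version: int) -> int:
    """Scrub the subtree rooted at d; return how many realms were converted."""
    converted = 0
    for child in d.children:
        converted += scrub(child, scrub_version)
    if d.realm_format == OLD_FORMAT:
        d.realm_format = NEW_FORMAT  # upgrade the realm in place during scrub
        converted += 1
    # Stamp this directory only after all descendants are done (post-order).
    d.last_scrub_version = scrub_version
    return converted
```

Under this model, the root's last_scrub_version is exactly the kind of "oldest update" signal discussed above, because it cannot be set until the slowest subtree finishes.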

I think I saw elsewhere that work on handling the upgrades is now in
progress, though, so I presume some other solution came to hand.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com