Hey Yan,

On 11-09-13 15:12, Yan, Zheng wrote:
> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
>> Hey Gregory,
>>
>> I wiped and re-created the MDS-cluster I just mailed about, starting out
>> by making sure CephFS is not mounted anywhere, stopping all MDSs,
>> completely cleaning the "data" and "metadata"-pools using "rados
>> --pool=<pool> cleanup <prefix>", then creating a new cluster using `ceph
>> mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again.
>> Directly afterwards, I saw this:
>>
>> # rados --pool=metadata ls
>> 1.00000000
>> 2.00000000
>> 200.00000000
>> 200.00000001
>> 600.00000000
>> 601.00000000
>> 602.00000000
>> 603.00000000
>> 605.00000000
>> 606.00000000
>> 608.00000000
>> 609.00000000
>> mds0_inotable
>> mds0_sessionmap
>>
>> Note the missing objects, right from the start. I was able to mount the
>> CephFS at this point, but after unmounting it and restarting the
>> MDS-cluster, it failed to come up, with the same symptoms as before. I
>> didn't place any files on CephFS at any point between newfs and failure.
>> Naturally, I tried initializing it again, but now, even after more than
>> 5 tries, the "mds*"-objects simply no longer show up in the
>> "metadata"-pool at all. In fact, it remains empty. I can mount CephFS
>> after the first start of the MDS-cluster after a newfs, but on restart,
>> it fails because of the missing objects. Am I doing anything wrong
>> while initializing the cluster, maybe? Is cleaning the pools and doing
>> the newfs enough? I did the same on the other cluster yesterday and it
>> seems to have all objects.
>>
>
> Thank you for the detailed information.
>
> The cause of the missing objects is that the MDS IDs for the old FS and
> the new FS are the same (the incarnations are the same). When the OSD
> receives MDS requests for the newly created FS, it silently drops them,
> because it thinks they are duplicates. You can get around the bug by
> creating new pools for the newfs.
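The dropping behaviour Yan describes can be sketched as a toy model. This is plain Python with invented names, not Ceph's actual implementation (real OSDs track recent request IDs per placement group); it only illustrates the mechanism: a write whose (rank, incarnation, tid) matches one already seen is treated as a resend and silently ignored.

```python
# Toy model of duplicate-request detection on an OSD -- NOT Ceph code.
# All names are invented for illustration.

class ToyOsd:
    def __init__(self):
        self.objects = {}         # object name -> contents
        self.seen_reqids = set()  # (mds_rank, incarnation, tid)

    def handle_write(self, mds_rank, incarnation, tid, name, data):
        reqid = (mds_rank, incarnation, tid)
        if reqid in self.seen_reqids:
            # Same rank, same incarnation, same tid: looks like a
            # duplicate of a request already applied, so drop it.
            return "dropped"
        self.seen_reqids.add(reqid)
        self.objects[name] = data
        return "applied"

osd = ToyOsd()

# mds.0 of the old FS (incarnation 1) writes its tables.
osd.handle_write(0, 1, 1, "mds0_inotable", "old fs")

# The pools are wiped: the objects disappear, but the OSD's record of
# request IDs does not.
osd.objects.clear()

# After "newfs", the new mds.0 comes up with the SAME incarnation and
# re-uses the same tids, so its writes collide with old request IDs.
print(osd.handle_write(0, 1, 1, "mds0_inotable", "new fs"))  # dropped
print("mds0_inotable" in osd.objects)                        # False

# With a fresh incarnation (or new pools, hence a clean request-ID
# history on those PGs), the same write goes through.
print(osd.handle_write(0, 2, 1, "mds0_inotable", "new fs"))  # applied
```

This also matches the symptom: the "missing" objects were never written at all, because every write from the re-incarnated MDS looked like a replay of an already-applied request.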
Thanks for this very useful info, I think this solves the mystery! Could I get around it any other way? I'd rather not have to re-create the pools and switch to new pool IDs every time I have to do this. Does the OSD store this info in its metadata, or might restarting the OSDs be enough?

I'm quite sure that I re-created MDS-clusters on the same pools many times without all the objects going missing. This was usually as part of tests, where I also restarted other cluster components, like OSDs. That could explain why only some objects went missing: if some OSDs had been restarted and processed the requests, while others dropped them, it would appear as if some, but not all, objects were missing. The problem then goes unnoticed until the active MDS in the MDS-cluster is restarted, at which point things fail to come up and the missing objects are finally noticed.

IMHO, this is a bug. Why would the OSD ignore these requests if the objects the MDS tries to write don't even exist at that time?


   Regards,

      Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com