Hey Yan,

Just confirming that creating fresh pools and doing the newfs on those
fixed the problem, while restarting the OSDs didn't, thanks again!  If
you come up with a permanent fix, let me know and I'll test it for you.

   Regards,

      Oliver

On Wed, 2013-09-11 at 22:48 +0800, Yan, Zheng wrote:
> On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> > Hey Yan,
> >
> > On 11-09-13 15:12, Yan, Zheng wrote:
> >> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >>> Hey Gregory,
> >>>
> >>> I wiped and re-created the MDS-cluster I just mailed about, starting out
> >>> by making sure CephFS is not mounted anywhere, stopping all MDSs,
> >>> completely cleaning the "data" and "metadata"-pools using "rados
> >>> --pool=<pool> cleanup <prefix>", then creating a new cluster using `ceph
> >>> mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again.
> >>> Directly afterwards, I saw this:
> >>>
> >>> # rados --pool=metadata ls
> >>> 1.00000000
> >>> 2.00000000
> >>> 200.00000000
> >>> 200.00000001
> >>> 600.00000000
> >>> 601.00000000
> >>> 602.00000000
> >>> 603.00000000
> >>> 605.00000000
> >>> 606.00000000
> >>> 608.00000000
> >>> 609.00000000
> >>> mds0_inotable
> >>> mds0_sessionmap
> >>>
> >>> Note the missing objects, right from the start.  I was able to mount the
> >>> CephFS at this point, but after unmounting it and restarting the
> >>> MDS-cluster, it failed to come up, with the same symptoms as before.  I
> >>> didn't place any files on CephFS at any point between the newfs and the
> >>> failure.  Naturally, I tried initializing it again, but now, even after
> >>> more than 5 tries, the "mds*"-objects simply no longer show up in the
> >>> "metadata"-pool at all.  In fact, it remains empty.  I can mount CephFS
> >>> after the first start of the MDS-cluster after a newfs, but on restart,
> >>> it fails because of the missing objects.  Am I doing anything wrong
> >>> while initializing the cluster, maybe?  Is cleaning the pools and doing
> >>> the newfs enough?  I did the same on the other cluster yesterday and it
> >>> seems to have all objects.
> >>>
> >>
> >> Thank you for the detailed information.
> >>
> >> The cause of the missing objects is that the MDS IDs for the old FS and
> >> the new FS are the same (the incarnations are the same).  When the OSD
> >> receives MDS requests for the newly created FS, it silently drops the
> >> requests, because it thinks they are duplicates.  You can get around
> >> the bug by creating new pools for the newfs.
> >
> > Thanks for this very useful info, I think this solves the mystery!
> > Could I get around it any other way?  I'd rather not have to re-create
> > the pools and switch to new pool IDs every time I have to do this.
> > Does the OSD store this info in its meta-data, or might restarting the
> > OSDs be enough?  I'm quite sure that I have re-created MDS-clusters on
> > the same pools many times, without all the objects going missing.  This
> > was usually as part of tests, where I also restarted other
> > cluster-components, like OSDs.  This could explain why only some files
> > went missing.  If some OSDs were restarted and processed the requests,
> > while others dropped them, it would appear as if some, but not all,
> > objects are missing.  The problem then persists until the active MDS
> > in the MDS-cluster is restarted, after which the missing objects get
> > noticed, because things fail to restart.  IMHO, this is a bug.  Why
> > would the OSD ignore these requests, if the objects the MDS tries to
> > write don't even exist at that time?
>
> Yes, it's a bug.  Fixing it should be easy.
>
> The OSD uses information in the PG log to check for duplicated requests,
> so restarting the OSDs does not work.  Another way to get around the bug
> is to generate lots of writes to the data/metadata pools and make sure
> each PG trims the old entries from its log.
>
> Regards
> Yan, Zheng
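
For reference, a minimal sketch of the fresh-pools workaround that Yan
describes and Oliver confirmed above.  The pool names, PG counts and
example pool IDs are hypothetical; substitute whatever `ceph osd lspools'
reports on your cluster.  The argument order follows the `ceph mds newfs
1 0 --yes-i-really-mean-it' used earlier in the thread, i.e. metadata
pool ID first, then data pool ID:

  # Stop all MDSs and unmount CephFS everywhere first.

  # Create brand-new pools, so the new FS gets new pool IDs and the OSDs
  # won't mistake the new MDS writes for duplicates of the old requests.
  ceph osd pool create metadata2 64
  ceph osd pool create data2 64

  # Look up the numeric IDs of the new pools.
  ceph osd lspools

  # Run newfs against the new pool IDs (4 and 5 are just example IDs
  # taken from the lspools output above).
  ceph mds newfs 4 5 --yes-i-really-mean-it

  # Start the MDSs again and check that the expected objects appear.
  rados --pool=metadata2 ls

The alternative Yan mentions, generating enough writes to the old pools
that every PG trims its log, avoids switching pool IDs, but it is harder
to verify, so fresh pools are the more predictable route.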