Hey Yan,

Just confirming that creating fresh pools and doing the newfs on those
fixed the problem, while restarting the OSDs didn't, thanks again!  If
you come up with a permanent fix, let me know and I'll test it for you.

   Regards,

      Oliver

On Wed, 2013-09-11 at 22:48 +0800, Yan, Zheng wrote:
> On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> > Hey Yan,
> >
> > On 11-09-13 15:12, Yan, Zheng wrote:
> >> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> >>> Hey Gregory,
> >>>
> >>> I wiped and re-created the MDS-cluster I just mailed about, starting out
> >>> by making sure CephFS is not mounted anywhere, stopping all MDSs,
> >>> completely cleaning the "data" and "metadata"-pools using "rados
> >>> --pool=<pool> cleanup <prefix>", then creating a new cluster using `ceph
> >>> mds newfs 1 0 --yes-i-really-mean-it' and starting all MDSs again.
> >>> Directly afterwards, I saw this:
> >>>
> >>> # rados --pool=metadata ls
> >>> 1.00000000
> >>> 2.00000000
> >>> 200.00000000
> >>> 200.00000001
> >>> 600.00000000
> >>> 601.00000000
> >>> 602.00000000
> >>> 603.00000000
> >>> 605.00000000
> >>> 606.00000000
> >>> 608.00000000
> >>> 609.00000000
> >>> mds0_inotable
> >>> mds0_sessionmap
> >>>
> >>> Note the missing objects, right from the start.  I was able to mount the
> >>> CephFS at this point, but after unmounting it and restarting the
> >>> MDS-cluster, it failed to come up, with the same symptoms as before.  I
> >>> didn't place any files on CephFS at any point between the newfs and the
> >>> failure.  Naturally, I tried initializing it again, but now, even after
> >>> more than 5 tries, the "mds*"-objects simply no longer show up in the
> >>> "metadata"-pool at all.  In fact, it remains empty.  I can mount CephFS
> >>> after the first start of the MDS-cluster after a newfs, but on restart,
> >>> it fails because of the missing objects.  Am I doing anything wrong
> >>> while initializing the cluster, maybe?  Is cleaning the pools and doing
> >>> the newfs enough?  I did the same on the other cluster yesterday and it
> >>> seems to have all objects.
> >>>
> >>
> >> Thank you for the detailed information.
> >>
> >> The cause of the missing objects is that the MDS IDs for the old FS and
> >> the new FS are the same (the incarnations are the same).  When the OSD
> >> receives MDS requests for the newly created FS, it silently drops the
> >> requests, because it thinks they are duplicates.  You can get around
> >> the bug by creating new pools for the newfs.
> >
> > Thanks for this very useful info, I think this solves the mystery!
> > Could I get around it any other way?  I'd rather not have to re-create
> > the pools and switch to new pool IDs every time I have to do this.
> > Does the OSD store this info in its meta-data, or might restarting the
> > OSDs be enough?  I'm quite sure that I have re-created MDS-clusters on
> > the same pools many times, without all the objects going missing.  This
> > was usually as part of tests, where I also restarted other
> > cluster-components, like OSDs.  This could explain why only some files
> > went missing.  If some OSDs were restarted and processed the requests,
> > while others dropped them, it would appear as if some, but not all,
> > objects are missing.  The problem then persists until the active MDS
> > in the MDS-cluster is restarted, after which the missing objects get
> > noticed, because things fail to restart.  IMHO, this is a bug.  Why
> > would the OSD ignore these requests, if the objects the MDS tries to
> > write don't even exist at that time?
>
> Yes, it's a bug.  Fixing it should be easy.
>
> The OSD uses information in the PG log to check for duplicated requests,
> so restarting the OSDs does not work.  Another way to get around the bug
> is to generate lots of writes to the data/metadata pools and make sure
> each PG trims the old entries from its log.
>
> Regards
> Yan, Zheng
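
For reference, a minimal sketch of the fresh-pools workaround that Yan
describes and Oliver confirmed above.  The pool names, PG counts and
example pool IDs are hypothetical; substitute whatever `ceph osd lspools'
reports on your cluster.  The argument order follows the `ceph mds newfs
1 0 --yes-i-really-mean-it' used earlier in the thread, i.e. metadata
pool ID first, then data pool ID:

  # Stop all MDSs and unmount CephFS everywhere first.

  # Create brand-new pools, so the new FS gets new pool IDs and the OSDs
  # won't mistake the new MDS writes for duplicates of the old requests.
  ceph osd pool create metadata2 64
  ceph osd pool create data2 64

  # Look up the numeric IDs of the new pools.
  ceph osd lspools

  # Run newfs against the new pool IDs (4 and 5 are just example IDs
  # taken from the lspools output above).
  ceph mds newfs 4 5 --yes-i-really-mean-it

  # Start the MDSs again and check that the expected objects appear.
  rados --pool=metadata2 ls

The alternative Yan mentions, generating enough writes to the old pools
that every PG trims its log, avoids switching pool IDs, but it is harder
to verify, so fresh pools are the more predictable route.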