On Wed, Mar 29, 2017 at 12:59 AM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> That worked for us!
>
> Thank you very much for throwing that together in such a short time.
>
> How can I buy you a beer? Bitcoin?

No problem, I appreciate the testing.

John

> On Mar 28, 2017 4:13 PM, "John Spray" <jspray@xxxxxxxxxx> wrote:
>>
>> On Tue, Mar 28, 2017 at 8:44 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> > Thanks John. Since we're on 10.2.5, the mds package has a dependency on
>> > 10.2.6.
>> >
>> > Do you feel it is safe to perform a cluster upgrade to 10.2.6 in this
>> > state?
>>
>> Yes, shouldn't be an issue to upgrade the whole system to 10.2.6 while
>> you're at it. Just make a mental note that "10.2.6-1.gdf5ca2d" is
>> a different 10.2.6 than the official release.
>>
>> I forget how picky the dependencies are; if they demand the *exact*
>> same version (including the trailing -1.gdf5ca2d), then I would just
>> use the candidate fix version for all the packages on the node where
>> you're running the MDS.
>>
>> John
>>
>> > [root@mds0 ceph-admin]# rpm -Uvh ceph-mds-10.2.6-1.gdf5ca2d.el7.x86_64.rpm
>> > error: Failed dependencies:
>> >     ceph-base = 1:10.2.6-1.gdf5ca2d.el7 is needed by ceph-mds-1:10.2.6-1.gdf5ca2d.el7.x86_64
>> >     ceph-mds = 1:10.2.5-0.el7 is needed by (installed) ceph-1:10.2.5-0.el7.x86_64
>> >
>> >
>> > On Tue, Mar 28, 2017 at 2:37 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> >>
>> >> On Tue, Mar 28, 2017 at 7:12 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> >> > Thank you very much. I've located the directory whose layout is
>> >> > against that pool. I've dug around to attempt to create a pool with
>> >> > the same ID as the deleted one, but for fairly obvious reasons, that
>> >> > option doesn't seem to exist.
>> >>
>> >> So there's a candidate fix on a branch called wip-19401-jewel; you can
>> >> see builds here:
>> >>
>> >> https://shaman.ceph.com/repos/ceph/wip-19401-jewel/df5ca2d8e3f930ddae5708c50c6495c03b3dc078/
>> >> -- click through to one of those and do "repo url" to get to some
>> >> built artifacts.
>> >>
>> >> Hopefully you're running one of CentOS 7, Ubuntu Xenial or Ubuntu
>> >> Trusty, and therefore one of those builds will work for you (use the
>> >> "default" variants rather than the "notcmalloc" variants) -- you
>> >> should only need to pick out the ceph-mds package rather than
>> >> upgrading everything.
>> >>
>> >> Cheers,
>> >> John
>> >>
>> >> > On Tue, Mar 28, 2017 at 1:08 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> >> >>
>> >> >> On Tue, Mar 28, 2017 at 6:45 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> >> >> > If I follow the recommendations of this doc, do you suspect we
>> >> >> > will recover?
>> >> >> >
>> >> >> > http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
>> >> >>
>> >> >> You might, but it's overkill and introduces its own risks -- your
>> >> >> metadata isn't really corrupt, you're just hitting a bug in the
>> >> >> running code where it's overreacting. I'm writing a patch now.
>> >> >>
>> >> >> John
>> >> >>
>> >> >> > On Tue, Mar 28, 2017 at 12:37 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >> I did do that. We were experimenting with an EC-backed pool on the
>> >> >> >> fs. It was stuck in an incomplete+creating state overnight for only
>> >> >> >> 128 pgs, so I deleted the pool this morning. At the time of deletion,
>> >> >> >> the only issue was the stuck 128 pgs.
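
As a side note for anyone hitting the same bug: a rough sketch of how one might
confirm which pool a directory's layout points at, and repoint new files at a
pool that still exists. The mount point /mnt/cephfs, the example path, and the
pool name cephfs_data below are placeholders, not values from this cluster:

    ceph fs ls                    # data pools the filesystem still references
    ceph osd lspools              # confirm the removed pool's id really is gone
    # on a client with the filesystem mounted (kernel or ceph-fuse); only
    # directories with an explicit layout report the ceph.dir.layout xattr
    getfattr -n ceph.dir.layout /mnt/cephfs/path/to/dir
    # repoint the directory at a surviving data pool (affects new files only)
    setfattr -n ceph.dir.layout.pool -v cephfs_data /mnt/cephfs/path/to/dir
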
>> >> >> >>
>> >> >> >> On Tue, Mar 28, 2017 at 12:29 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> >> >> >>>
>> >> >> >>> Did you at some point add a new data pool to the filesystem, and
>> >> >> >>> then remove the pool? With a little investigation I've found that
>> >> >> >>> the MDS currently doesn't handle that properly:
>> >> >> >>> http://tracker.ceph.com/issues/19401
>> >> >> >>>
>> >> >> >>> John
>> >> >> >>>
>> >> >> >>> On Tue, Mar 28, 2017 at 6:11 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> >> >> >>> > On Tue, Mar 28, 2017 at 5:54 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> >> >> >>> >> Running Jewel 10.2.5 on my production CephFS cluster and came
>> >> >> >>> >> into this ceph status:
>> >> >> >>> >>
>> >> >> >>> >> [ceph-admin@mds1 brady]$ ceph status
>> >> >> >>> >>     cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
>> >> >> >>> >>      health HEALTH_WARN
>> >> >> >>> >>             mds0: Behind on trimming (2718/30)
>> >> >> >>> >>             mds0: MDS in read-only mode
>> >> >> >>> >>      monmap e17: 5 mons at
>> >> >> >>> >> {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
>> >> >> >>> >>             election epoch 378, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
>> >> >> >>> >>       fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
>> >> >> >>> >>      osdmap e172126: 235 osds: 235 up, 235 in
>> >> >> >>> >>             flags sortbitwise,require_jewel_osds
>> >> >> >>> >>       pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
>> >> >> >>> >>             874 TB used, 407 TB / 1282 TB avail
>> >> >> >>> >>                 5670 active+clean
>> >> >> >>> >>                   13 active+clean+scrubbing+deep
>> >> >> >>> >>                   13 active+clean+scrubbing
>> >> >> >>> >>   client io 760 B/s rd, 0 op/s rd, 0 op/s wr
>> >> >> >>> >>
>> >> >> >>> >> I've tried rebooting both MDS servers. I've started a rolling
>> >> >> >>> >> reboot across all of my OSD nodes, but each node takes about 10
>> >> >> >>> >> minutes to fully rejoin, so it's going to take a while. Any
>> >> >> >>> >> recommendations other than rebooting?
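
As an aside: rather than rolling reboots of the whole OSD fleet, one could
bounce just the MDS daemon and pull the reason for the read-only flip out of
its log. The daemon name (mds0) comes from the status output above; the log
path is the default and may differ on this cluster:

    # fail the active rank over to the standby, or restart the daemon on its host
    ceph mds fail 0
    systemctl restart ceph-mds@mds0
    # the write error that forced read-only mode shows up in the MDS log
    # (see the excerpt John quotes just below)
    grep "failed to store backtrace" /var/log/ceph/ceph-mds.mds0.log
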
>> >> >> >>> >
>> >> >> >>> > As it says in the log, your MDSs are going read-only because of
>> >> >> >>> > errors writing to the OSDs:
>> >> >> >>> >
>> >> >> >>> > 2017-03-28 08:04:12.379747 7f25ed0af700 -1 log_channel(cluster) log
>> >> >> >>> > [ERR] : failed to store backtrace on ino 10003a398a6 object, pool 20,
>> >> >> >>> > errno -2
>> >> >> >>> >
>> >> >> >>> > These messages are also scary and indicate that something has gone
>> >> >> >>> > seriously wrong, either with the storage of the metadata or
>> >> >> >>> > internally with the MDS:
>> >> >> >>> >
>> >> >> >>> > 2017-03-28 08:04:12.251543 7f25ef2b5700 -1 log_channel(cluster) log
>> >> >> >>> > [ERR] : bad/negative dir size on 608 f(v9 m2017-03-28
>> >> >> >>> > 07:56:45.803267 -223=-221+-2)
>> >> >> >>> > 2017-03-28 08:04:12.251564 7f25ef2b5700 -1 log_channel(cluster) log
>> >> >> >>> > [ERR] : unmatched fragstat on 608, inode has f(v10 m2017-03-28
>> >> >> >>> > 07:56:45.803267 -223=-221+-2), dirfrags have f(v0 m2017-03-28
>> >> >> >>> > 07:56:45.803267)
>> >> >> >>> >
>> >> >> >>> > The case that I know of that causes ENOENT on object writes is when
>> >> >> >>> > the pool no longer exists. You can set "debug objecter = 10" on the
>> >> >> >>> > MDS and look for a message like "check_op_pool_dne tid <something>
>> >> >> >>> > concluding pool <pool> dne".
>> >> >> >>> >
>> >> >> >>> > Otherwise, go look at the OSD logs from the timestamp where the
>> >> >> >>> > failed write is happening to see if there's anything there.
>> >> >> >>> >
>> >> >> >>> > John
>> >> >> >>> >
>> >> >> >>> >> Attached are my mds logs during the failure.
>> >> >> >>> >>
>> >> >> >>> >> Any ideas?
>> >> >> >>> >>
>> >> >> >>> >> _______________________________________________
>> >> >> >>> >> ceph-users mailing list
>> >> >> >>> >> ceph-users@xxxxxxxxxxxxxx
>> >> >> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
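
For completeness, a rough sketch of the "debug objecter = 10" step John
describes above. The rank (0) and daemon name (mds0) come from the status
output earlier in the thread; the log path is the default and may differ:

    # raise objecter logging on the running MDS
    ceph tell mds.0 injectargs '--debug_objecter 10'
    # or, on the host running the daemon, via the admin socket
    ceph daemon mds.mds0 config set debug_objecter 10
    # a line like "check_op_pool_dne tid ... concluding pool 20 dne" confirms
    # the failed writes are aimed at the deleted pool
    grep check_op_pool_dne /var/log/ceph/ceph-mds.mds0.log
    # turn the extra logging back off afterwards
    ceph tell mds.0 injectargs '--debug_objecter 0'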