Here is a patch for v9.2.0. After installing the modified version of ceph-mon, run "ceph mds addfailed 1".
Attachment:
mds_addfailed.patch
Description: Binary data
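For reference, a rough sketch of how to apply it (untested; it assumes a v9.2.0 source checkout and the autotools build that infernalis uses — adjust paths, build flags, and the deployment step for your environment):

    # fetch the v9.2.0 (infernalis) source, including submodules
    git clone --recursive --branch v9.2.0 https://github.com/ceph/ceph.git
    cd ceph

    # apply the attached patch (the path is wherever you saved the attachment)
    patch -p1 < ../mds_addfailed.patch

    # build ceph-mon with the autotools build
    ./install-deps.sh
    ./autogen.sh
    ./configure
    make -C src ceph-mon

    # deploy src/ceph-mon to each monitor in place of the packaged binary,
    # restart the monitors, then re-add the removed rank:
    ceph mds addfailed 1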
> On Jan 15, 2016, at 08:20, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>
> okay, that sounds really good.
>
> Would it help if you had access to our cluster?
>
> On Thu, Jan 14, 2016 at 4:19 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>
> > On Jan 15, 2016, at 08:16, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> > >
> > > Did I just lose all of my data?
> > >
> > > If we were able to export the journal, could we create a brand new mds out of that and retrieve our data?
> >
> > No. it's easy to fix, but you need to re-compile ceph-mon from source code. I'm writing the patch.
> >
> > > On Thu, Jan 14, 2016 at 4:15 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> >
> > On Jan 15, 2016, at 08:01, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> > >> Hey Zheng,
> > >>
> > >> I've been in the #ceph irc channel all day about this.
> > >>
> > >> We did that, we set max_mds back to 1, but, instead of stopping mds 1, we did a "ceph mds rmfailed 1". Running ceph mds stop 1 produces:
> > >>
> > >> # ceph mds stop 1
> > >> Error EEXIST: mds.1 not active (???)
> > >>
> > >> Our mds is in a state of resolve, and will not come back.
> > >>
> > >> We then tried to roll back the mds map to the epoch just before we set max_mds to 2, but that command crashes all but one of our monitors and never completes.
> > >>
> > >> We do not know what to do at this point. If there was a way to get the mds back up just so we could back it up, we'd be okay with rebuilding. We just need the data back.
> > >
> > > It's not clear to me how much you've screwed up your monitor cluster. If that's still alive, you should just need to set max mds to 2, turn on an mds daemon, and let it resolve. Then you can follow the steps Zheng outlined for reducing the number of nodes cleanly. (That assumes that your MDS state is healthy and that the reason for your mounts hanging was a problem elsewhere, like with directory fragmentation confusing NFS.)
> > >
> > > If your monitor cluster is actually in trouble (ie, the crashing problem made it to disk), that's a whole other thing now. But I suspect/hope it didn't and you just need to shut down the client trying to do the setmap and then turn the monitors all back on. Meanwhile, please post a bug at tracker.ceph.com with the actual monitor commands you ran and as much of the backtrace/log as you can; we don't want to have commands which break the system! ;)
> > > -Greg
> >
> > the problem is that he ran 'ceph mds rmfailed 1' and there is no command to undo this. I think we need a command "ceph mds addfailed <rank>".
> >
> > Regards
> > Yan, Zheng
> >
> > >> Mike C
> > >>
> > >> On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >>>
> > >>> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> > >>>> Thank you for the reply Zheng
> > >>>>
> > >>>> We tried setting mds bal frag to true, but the end result was less than desirable. All nfs and smb clients could no longer browse the share; they would hang on a directory with anything more than a few hundred files.
> > >>>>
> > >>>> We then tried to back out the active/active mds change; no luck, stopping one of the mds's (mds 1) prevented us from mounting the cephfs filesystem.
> > >>>>
> > >>>> So we failed and removed the secondary MDS, and now our primary mds is stuck in a "resolve" state:
> > >>>>
> > >>>> # ceph -s
> > >>>>     cluster cabd1728-2eca-4e18-a581-b4885364e5a4
> > >>>>      health HEALTH_WARN
> > >>>>             clock skew detected on mon.lts-mon
> > >>>>             mds cluster is degraded
> > >>>>             Monitor clock skew detected
> > >>>>      monmap e1: 4 mons at {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
> > >>>>             election epoch 1282, quorum 0,1,2,3 lts-osd3,lts-osd1,lts-osd2,lts-mon
> > >>>>      mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
> > >>>>      osdmap e10183: 102 osds: 101 up, 101 in
> > >>>>       pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494 kobjects
> > >>>>             96188 GB used, 273 TB / 367 TB avail
> > >>>>                 4188 active+clean
> > >>>>                    4 active+clean+scrubbing+deep
> > >>>>
> > >>>> Now we are really down for the count. We cannot get our MDS back up in an active state and none of our data is accessible.
> > >>>
> > >>> you can't remove an active mds this way, you need to:
> > >>>
> > >>> 1. make sure all active mds are running
> > >>> 2. run 'ceph mds set max_mds 1'
> > >>> 3. run 'ceph mds stop 1'
> > >>>
> > >>> step 3 changes the second mds's state to stopping. Wait a while, and the second mds will go to the standby state. Occasionally, the second MDS can get stuck in the stopping state. If that happens, restart all MDSes, then repeat step 3.
> > >>>
> > >>> Regards
> > >>> Yan, Zheng
> > >>>
> > >>>> On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >>>>>
> > >>>>> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> > >>>>>> Hey Greg,
> > >>>>>>
> > >>>>>> The inconsistent view is only over nfs/smb on top of our /ceph mount.
> > >>>>>>
> > >>>>>> When I look directly on the /ceph mount (which is using the cephfs kernel module), everything looks fine.
> > >>>>>>
> > >>>>>> It is possible that this issue just went unnoticed, and it only being an infernalis problem is just a red herring. With that, it is oddly coincidental that we just started seeing issues.
> > >>>>>
> > >>>>> This seems like seekdir bugs in the kernel client; could you try a 4.0+ kernel?
> > >>>>>
> > >>>>> Besides, did you enable "mds bal frag" for ceph-mds?
> > >>>>>
> > >>>>> Regards
> > >>>>> Yan, Zheng
> > >>>>>
> > >>>>>> On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> > >>>>>>>> Hello.
> > >>>>>>>>
> > >>>>>>>> Since we upgraded to Infernalis, we have noticed a severe problem with cephfs when we have it shared over Samba and NFS.
> > >>>>>>>>
> > >>>>>>>> Directory listings are showing an inconsistent view of the files:
> > >>>>>>>>
> > >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
> > >>>>>>>> 100
> > >>>>>>>> $ sudo umount /lts-mon
> > >>>>>>>> $ sudo mount /lts-mon
> > >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
> > >>>>>>>> 3507
> > >>>>>>>>
> > >>>>>>>> The only workaround I have found is un-mounting and re-mounting the nfs share; that seems to clear it up. Same with samba. I'd post it here, but it's thousands of lines. I can add additional details on request.
> > >>>>>>>>
> > >>>>>>>> This happened after our upgrade to infernalis. Is it possible the MDS is in an inconsistent state?
> > >>>>>>>
> > >>>>>>> So this didn't happen to you until after you upgraded? Are you seeing missing files when looking at cephfs directly, or only over the NFS/Samba re-exports? Are you also sharing Samba by re-exporting the kernel cephfs mount?
> > >>>>>>>
> > >>>>>>> Zheng, any ideas about kernel issues which might cause this or be more visible under infernalis?
> > >>>>>>> -Greg
> > >>>>>>>
> > >>>>>>>> We have cephfs mounted on a server using the built-in cephfs kernel module:
> > >>>>>>>>
> > >>>>>>>> lts-mon:6789:/ /ceph ceph name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
> > >>>>>>>>
> > >>>>>>>> We are running all of our ceph nodes on ubuntu 14.04 LTS. Samba is up to date, 4.1.6, and we export nfsv3 to linux and freebsd systems. All seem to exhibit the same behavior.
> > >>>>>>>>
> > >>>>>>>> system info:
> > >>>>>>>>
> > >>>>>>>> # uname -a
> > >>>>>>>> Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> > >>>>>>>> root@lts-osd1:~# lsb_release -a
> > >>>>>>>> No LSB modules are available.
> > >>>>>>>> Distributor ID: Ubuntu
> > >>>>>>>> Description:    Ubuntu 14.04.3 LTS
> > >>>>>>>> Release:        14.04
> > >>>>>>>> Codename:       trusty
> > >>>>>>>>
> > >>>>>>>> package info:
> > >>>>>>>>
> > >>>>>>>> # dpkg -l | grep ceph
> > >>>>>>>> ii  ceph            9.2.0-1trusty  amd64  distributed storage and file system
> > >>>>>>>> ii  ceph-common     9.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> > >>>>>>>> ii  ceph-fs-common  9.2.0-1trusty  amd64  common utilities to mount and interact with a ceph file system
> > >>>>>>>> ii  ceph-mds        9.2.0-1trusty  amd64  metadata server for the ceph distributed file system
> > >>>>>>>> ii  libcephfs1      9.2.0-1trusty  amd64  Ceph distributed file system client library
> > >>>>>>>> ii  python-ceph     9.2.0-1trusty  amd64  Meta-package for python libraries for the Ceph libraries
> > >>>>>>>> ii  python-cephfs   9.2.0-1trusty  amd64  Python libraries for the Ceph libcephfs library
> > >>>>>>>>
> > >>>>>>>> What is interesting is that a directory or file will not show up in a listing; however, if we directly access the file, it shows up:
> > >>>>>>>>
> > >>>>>>>> # ls -al | grep SCHOOL
> > >>>>>>>> # ls -alnd SCHOOL667055
> > >>>>>>>> drwxrwsr-x 1 21695 21183 2962751438 Jan 13 09:33 SCHOOL667055
> > >>>>>>>>
> > >>>>>>>> Any tips are appreciated!
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Mike C
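One more note on the journal question above: before running the patched ceph-mon or attempting any further recovery, it is probably worth taking a backup of the MDS journal first. A minimal sketch using cephfs-journal-tool (shipped with 9.2.0; this assumes the default rank-0 journal and enough local disk space for the export):

    # dump the rank-0 mds journal to a local file as a backup
    cephfs-journal-tool journal export backup.bin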
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com