Re: cephfs - inconsistent nfs and samba directory listings

Okay, that sounds really good.

Would it help if you had access to our cluster?

On Thu, Jan 14, 2016 at 4:19 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:

> On Jan 15, 2016, at 08:16, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>
> Did I just lose all of my data?
>
> If we were able to export the journal, could we create a brand new mds out of that and retrieve our data?

No, it's easy to fix, but you need to re-compile ceph-mon from source code. I'm writing the patch.




>
> On Thu, Jan 14, 2016 at 4:15 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>
> > On Jan 15, 2016, at 08:01, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> >> Hey Zheng,
> >>
> >> I've been in the #ceph irc channel all day about this.
> >>
> >> We did that; we set max_mds back to 1, but instead of stopping mds 1, we
> >> ran "ceph mds rmfailed 1". Running "ceph mds stop 1" produces:
> >>
> >> # ceph mds stop 1
> >> Error EEXIST: mds.1 not active (???)
> >>
> >>
> >> Our mds is in a state of resolve, and will not come back.
> >>
> >> We then tried to roll back the mds map to the epoch just before we set
> >> max_mds to 2, but that command crashes all but one of our monitors and never
> >> completes
> >>
> >> We do not know what to do at this point. If there is a way to get the mds
> >> back up just so we can back it up, we're okay with rebuilding. We just
> >> need the data back.
> >
> > It's not clear to me how much you've screwed up your monitor cluster.
> > If that's still alive, you should just need to set max mds to 2, turn
> > on an mds daemon, and let it resolve. Then you can follow the steps
> > Zheng outlined for reducing the number of nodes cleanly.
> > (That assumes that your MDS state is healthy and that the reason for
> > your mounts hanging was a problem elsewhere, like with directory
> > fragmentation confusing NFS.)
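> >
> > Roughly, that recovery sequence would look like the following; the mds id
> > is a placeholder and the start command assumes upstart on Ubuntu 14.04,
> > so adjust for your setup:
> >
> >   ceph mds set max_mds 2
> >   sudo start ceph-mds id=<your-mds-id>   # or: sudo service ceph start mds.<your-mds-id>
> >   ceph mds stat                          # repeat until both ranks leave resolve and go active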
> >
> > If your monitor cluster is actually in trouble (ie, the crashing
> > problem made it to disk), that's a whole other thing now. But I
> > suspect/hope it didn't and you just need to shut down the client
> > trying to do the setmap and then turn the monitors all back on.
> > Meanwhile, please post a bug at tracker.ceph.com with the actual
> > monitor commands you ran and as much of the backtrace/log as you can;
> > we don't want to have commands which break the system! ;)
> > -Greg
>
> The problem is that he ran 'ceph mds rmfailed 1' and there is no command to undo it. I think we need a command 'ceph mds addfailed <rank>'.
>
> Regards
> Yan, Zheng
>
>
> >
> >>
> >> Mike C
> >>
> >>
> >>
> >> On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >>>
> >>> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> >>>> Thank you for the reply Zheng
> >>>>
> >>>> We tried set mds bal frag to true, but the end result was less than
> >>>> desirable. All nfs and smb clients could no longer browse the share,
> >>>> they
> >>>> would hang on a directory with anything more than a few hundred files.
> >>>>
> >>>> We then tried to back out the active/active mds change, with no luck;
> >>>> stopping one of the mds's (mds 1) prevented us from mounting the cephfs
> >>>> filesystem.
> >>>>
> >>>> So we failed and removed the secondary MDS, and now our primary mds is
> >>>> stuck in a "resolve" state:
> >>>>
> >>>> # ceph -s
> >>>>    cluster cabd1728-2eca-4e18-a581-b4885364e5a4
> >>>>     health HEALTH_WARN
> >>>>            clock skew detected on mon.lts-mon
> >>>>            mds cluster is degraded
> >>>>            Monitor clock skew detected
> >>>>     monmap e1: 4 mons at
> >>>>
> >>>> {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
> >>>>            election epoch 1282, quorum 0,1,2,3
> >>>> lts-osd3,lts-osd1,lts-osd2,lts-mon
> >>>>     mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
> >>>>     osdmap e10183: 102 osds: 101 up, 101 in
> >>>>      pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494 kobjects
> >>>>            96188 GB used, 273 TB / 367 TB avail
> >>>>                4188 active+clean
> >>>>                   4 active+clean+scrubbing+deep
> >>>>
> >>>> Now we are really down for the count. We cannot get our MDS back up in
> >>>> an
> >>>> active state and none of our data is accessible.
> >>>
> >>> You can't remove an active mds this way; you need to:
> >>>
> >>> 1. make sure all active mds are running
> >>> 2. run 'ceph mds set max_mds 1'
> >>> 3. run 'ceph mds stop 1'
> >>>
> >>> Step 3 changes the second mds's state to stopping. Wait a while and the
> >>> second mds will go to the standby state. Occasionally, the second MDS can
> >>> get stuck in the stopping state. If that happens, restart all MDS
> >>> daemons, then repeat step 3.
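> >>>
> >>> To watch the transition, something like this should do (both are
> >>> standard status commands):
> >>>
> >>>   ceph mds stat                   # rank 1 should show up:stopping, then drop off
> >>>   ceph mds dump | grep -i stop    # shows the stopping rank in the mdsmap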
> >>>
> >>> Regards
> >>> Yan, Zheng
> >>>
> >>>
> >>>
> >>>>
> >>>>
> >>>> On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson <mike@xxxxxxxxxxxx>
> >>>>> wrote:
> >>>>>> Hey Greg,
> >>>>>>
> >>>>>> The inconsistent view is only over nfs/smb on top of our /ceph mount.
> >>>>>>
> >>>>>> When I look directly on the /ceph mount (which is using the cephfs
> >>>>>> kernel
> >>>>>> module), everything looks fine
> >>>>>>
> >>>>>> It is possible that this issue simply went unnoticed before, and that
> >>>>>> it being an infernalis problem is a red herring. That said, it is
> >>>>>> oddly coincidental that we only just started seeing issues.
> >>>>>
> >>>>> This seems like a seekdir bug in the kernel client; could you try a
> >>>>> 4.0+ kernel?
> >>>>>
> >>>>> Also, do you have "mds bal frag" enabled for ceph-mds?
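> >>>>>
> >>>>> For reference, checking the kernel and enabling the option would look
> >>>>> roughly like this (section name and restart command depend on your
> >>>>> setup; this assumes the usual [mds] section and upstart on trusty):
> >>>>>
> >>>>>   uname -r          # client kernel, should report 4.0 or newer
> >>>>>
> >>>>>   # in ceph.conf on the MDS hosts:
> >>>>>   [mds]
> >>>>>   mds bal frag = true
> >>>>>
> >>>>>   sudo restart ceph-mds id=<your-mds-id>   # pick up the new setting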
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Yan, Zheng
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum <gfarnum@xxxxxxxxxx>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson <mike@xxxxxxxxxxxx>
> >>>>>>> wrote:
> >>>>>>>> Hello.
> >>>>>>>>
> >>>>>>>> Since we upgraded to Infernalis, we have noticed a severe problem
> >>>>>>>> with cephfs when we have it shared over Samba and NFS.
> >>>>>>>>
> >>>>>>>> Directory listings are showing an inconsistent view of the files:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
> >>>>>>>>     100
> >>>>>>>> $ sudo umount /lts-mon
> >>>>>>>> $ sudo mount /lts-mon
> >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
> >>>>>>>>    3507
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The only workaround I have found is un-mounting and re-mounting the
> >>>>>>>> nfs share; that seems to clear it up.
> >>>>>>>> The same goes for samba. I'd post the output here, but it's
> >>>>>>>> thousands of lines. I can add additional details on request.
> >>>>>>>>
> >>>>>>>> This happened after our upgrade to infernalis. Is it possible the
> >>>>>>>> MDS
> >>>>>>>> is
> >>>>>>>> in
> >>>>>>>> an inconsistent state?
> >>>>>>>
> >>>>>>> So this didn't happen to you until after you upgraded? Are you
> >>>>>>> seeing
> >>>>>>> missing files when looking at cephfs directly, or only over the
> >>>>>>> NFS/Samba re-exports? Are you also sharing Samba by re-exporting the
> >>>>>>> kernel cephfs mount?
> >>>>>>>
> >>>>>>> Zheng, any ideas about kernel issues which might cause this or be
> >>>>>>> more
> >>>>>>> visible under infernalis?
> >>>>>>> -Greg
> >>>>>>>
> >>>>>>>>
> >>>>>>>> We have cephfs mounted on a server using the built-in cephfs kernel
> >>>>>>>> module:
> >>>>>>>>
> >>>>>>>> lts-mon:6789:/ /ceph ceph
> >>>>>>>> name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
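> >>>>>>>>
> >>>>>>>> For what it's worth, the equivalent one-off mount with the same
> >>>>>>>> options would be roughly:
> >>>>>>>>
> >>>>>>>>   sudo mount -t ceph lts-mon:6789:/ /ceph \
> >>>>>>>>       -o name=admin,secretfile=/etc/ceph/admin.secret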
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> We are running all of our ceph nodes on ubuntu 14.04 LTS. Samba is
> >>>>>>>> up
> >>>>>>>> to
> >>>>>>>> date, 4.1.6, and we export nfsv3 to linux and freebsd systems. All
> >>>>>>>> seem
> >>>>>>>> to
> >>>>>>>> exhibit the same behavior.
> >>>>>>>>
> >>>>>>>> system info:
> >>>>>>>>
> >>>>>>>> # uname -a
> >>>>>>>> Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14
> >>>>>>>> 21:42:59
> >>>>>>>> UTC
> >>>>>>>> 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>>>> root@lts-osd1:~# lsb
> >>>>>>>> lsblk        lsb_release
> >>>>>>>> root@lts-osd1:~# lsb_release -a
> >>>>>>>> No LSB modules are available.
> >>>>>>>> Distributor ID: Ubuntu
> >>>>>>>> Description: Ubuntu 14.04.3 LTS
> >>>>>>>> Release: 14.04
> >>>>>>>> Codename: trusty
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> package info:
> >>>>>>>>
> >>>>>>>> # dpkg -l|grep ceph
> >>>>>>>> ii  ceph                                 9.2.0-1trusty
> >>>>>>>> amd64        distributed storage and file system
> >>>>>>>> ii  ceph-common                          9.2.0-1trusty
> >>>>>>>> amd64        common utilities to mount and interact with a ceph
> >>>>>>>> storage
> >>>>>>>> cluster
> >>>>>>>> ii  ceph-fs-common                       9.2.0-1trusty
> >>>>>>>> amd64        common utilities to mount and interact with a ceph
> >>>>>>>> file
> >>>>>>>> system
> >>>>>>>> ii  ceph-mds                             9.2.0-1trusty
> >>>>>>>> amd64        metadata server for the ceph distributed file system
> >>>>>>>> ii  libcephfs1                           9.2.0-1trusty
> >>>>>>>> amd64        Ceph distributed file system client library
> >>>>>>>> ii  python-ceph                          9.2.0-1trusty
> >>>>>>>> amd64        Meta-package for python libraries for the Ceph
> >>>>>>>> libraries
> >>>>>>>> ii  python-cephfs                        9.2.0-1trusty
> >>>>>>>> amd64        Python libraries for the Ceph libcephfs library
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> What is interesting is that a directory or file will not show up in
> >>>>>>>> a listing; however, if we directly access it, it does show up:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # ls -al |grep SCHOOL
> >>>>>>>> # ls -alnd SCHOOL667055
> >>>>>>>> drwxrwsr-x  1 21695  21183  2962751438 Jan 13 09:33 SCHOOL667055
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Any tips are appreciated!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Mike C
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> ceph-users mailing list
> >>>>>>>> ceph-users@xxxxxxxxxxxxxx
> >>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> ceph-users mailing list
> >>>>>> ceph-users@xxxxxxxxxxxxxx
> >>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
