Re: cephfs - inconsistent nfs and samba directory listings

On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> Hey Zheng,
>
> I've been in the #ceph irc channel all day about this.
>
> We did that: we set max_mds back to 1, but instead of stopping mds 1, we
> did a "ceph mds rmfailed 1". Running ceph mds stop 1 now produces:
>
> # ceph mds stop 1
> Error EEXIST: mds.1 not active (???)
>
>
> Our mds is in a state of resolve, and will not come back.
>
> We then tried to roll back the mds map to the epoch just before we set
> max_mds to 2, but that command crashed all but one of our monitors and
> never completed.
>
> We do not know what to do at this point. If there were a way to get the
> mds back up just so we could back it up, we would be okay with rebuilding.
> We just need the data back.

It's not clear to me how much you've screwed up your monitor cluster.
If that's still alive, you should just need to set max mds to 2, turn
on an mds daemon, and let it resolve. Then you can follow the steps
Zheng outlined for reducing the number of nodes cleanly.
(That assumes that your MDS state is healthy and that the reason for
your mounts hanging was a problem elsewhere, like with directory
fragmentation confusing NFS.)
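
If it helps, the rough command sequence I have in mind is something like
the following (the second mds hostname is just an example, and this assumes
the stock upstart jobs on trusty):

ceph mds set max_mds 2             # allow two ranks again so rank 1 can rejoin
sudo start ceph-mds id=lts-osd1    # bring a second ceph-mds daemon back up
ceph mds stat                      # wait until both ranks report up:active
ceph mds set max_mds 1             # then shrink back down per Zheng's steps
ceph mds stop 1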

If your monitor cluster is actually in trouble (i.e., the crashing
problem made it to disk), that's a whole other thing now. But I
suspect/hope it didn't and you just need to shut down the client
trying to do the setmap and then turn the monitors all back on.
Meanwhile, please post a bug at tracker.ceph.com with the actual
monitor commands you ran and as much of the backtrace/log as you can;
we don't want to have commands which break the system! ;)
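
For the "turn the monitors all back on" part, the stock upstart jobs on
trusty should be enough; the hostname below is just one from your ceph -s
output, so repeat on each monitor host that went down:

sudo start ceph-mon id=lts-osd1    # restart the crashed monitor daemon
ceph quorum_status                 # check that all four mons rejoin the quorum
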
-Greg

>
> Mike C
>
>
>
> On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>> > Thank you for the reply Zheng
>> >
>> > We tried setting mds bal frag to true, but the end result was less than
>> > desirable. All NFS and SMB clients could no longer browse the share;
>> > they would hang on any directory with more than a few hundred files.
>> >
>> > We then tried to back out the active/active mds change, with no luck;
>> > stopping one of the mds's (mds 1) prevented us from mounting the cephfs
>> > filesystem.
>> >
>> > So we failed and removed the secondary MDS, and now our primary mds is
>> > stuck in a "resolve" state:
>> >
>> > # ceph -s
>> >     cluster cabd1728-2eca-4e18-a581-b4885364e5a4
>> >      health HEALTH_WARN
>> >             clock skew detected on mon.lts-mon
>> >             mds cluster is degraded
>> >             Monitor clock skew detected
>> >      monmap e1: 4 mons at
>> >
>> > {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
>> >             election epoch 1282, quorum 0,1,2,3
>> > lts-osd3,lts-osd1,lts-osd2,lts-mon
>> >      mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
>> >      osdmap e10183: 102 osds: 101 up, 101 in
>> >       pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494 kobjects
>> >             96188 GB used, 273 TB / 367 TB avail
>> >                 4188 active+clean
>> >                    4 active+clean+scrubbing+deep
>> >
>> > Now we are really down for the count. We cannot get our MDS back up in
>> > an
>> > active state and none of our data is accessible.
>>
>> You can't remove an active mds this way; you need to:
>>
>> 1. make sure all active mds are running
>> 2. run 'ceph mds set max_mds 1'
>> 3. run 'ceph mds stop 1'
>>
>> Step 3 changes the second mds's state to stopping. Wait a while and the
>> second mds will go to the standby state. Occasionally, the second MDS can
>> get stuck in the stopping state. If that happens, restart all MDS daemons,
>> then repeat step 3.
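>>
>> As a quick sketch of the same steps (assuming the standard ceph CLI):
>>
>> ceph mds set max_mds 1    # cap the filesystem at one active rank
>> ceph mds stop 1           # ask rank 1 to export its state and stop
>> ceph mds stat             # watch rank 1 go stopping -> standby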
>>
>> Regards
>> Yan, Zheng
>>
>>
>>
>> >
>> >
>> > On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> >>
>> >> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson <mike@xxxxxxxxxxxx>
>> >> wrote:
>> >> > Hey Greg,
>> >> >
>> >> > The inconsistent view is only over nfs/smb on top of our /ceph mount.
>> >> >
>> >> > When I look directly on the /ceph mount (which is using the cephfs
>> >> > kernel
>> >> > module), everything looks fine
>> >> >
>> >> > It is possible that this issue just went unnoticed before, and its
>> >> > being an infernalis problem is just a red herring. That said, it is
>> >> > oddly coincidental that we just started seeing issues.
>> >>
>> >> This seems like a seekdir bug in the kernel client; could you try a
>> >> 4.0+ kernel?
>> >>
>> >> Also, did you enable "mds bal frag" for ceph-mds?
>> >>
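>> >> (If you are not sure, you can check the current value on the running
>> >> daemon and set it in ceph.conf; the mds name below is taken from your
>> >> ceph -s output:
>> >>
>> >> ceph daemon mds.lts-mon config get mds_bal_frag
>> >>
>> >> [mds]
>> >>     mds bal frag = true
>> >>
>> >> then restart ceph-mds for the ceph.conf change to take effect.)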
>> >>
>> >> Regards
>> >> Yan, Zheng
>> >>
>> >>
>> >>
>> >> >
>> >> > On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson <mike@xxxxxxxxxxxx>
>> >> >> wrote:
>> >> >> > Hello.
>> >> >> >
>> >> >> > Since we upgraded to Infernalis, we have noticed a severe problem
>> >> >> > with cephfs when we have it shared over Samba and NFS.
>> >> >> >
>> >> >> > Directory listings are showing an inconsistent view of the files:
>> >> >> >
>> >> >> >
>> >> >> > $ ls /lts-mon/BD/xmlExport/ | wc -l
>> >> >> >      100
>> >> >> > $ sudo umount /lts-mon
>> >> >> > $ sudo mount /lts-mon
>> >> >> > $ ls /lts-mon/BD/xmlExport/ | wc -l
>> >> >> >     3507
>> >> >> >
>> >> >> >
>> >> >> > The only workaround I have found is un-mounting and re-mounting the
>> >> >> > NFS share; that seems to clear it up. Same with Samba. I'd post it
>> >> >> > here, but it's thousands of lines; I can add additional details on
>> >> >> > request.
>> >> >> >
>> >> >> > This happened after our upgrade to infernalis. Is it possible the
>> >> >> > MDS
>> >> >> > is
>> >> >> > in
>> >> >> > an inconsistent state?
>> >> >>
>> >> >> So this didn't happen to you until after you upgraded? Are you
>> >> >> seeing
>> >> >> missing files when looking at cephfs directly, or only over the
>> >> >> NFS/Samba re-exports? Are you also sharing Samba by re-exporting the
>> >> >> kernel cephfs mount?
>> >> >>
>> >> >> Zheng, any ideas about kernel issues which might cause this or be
>> >> >> more
>> >> >> visible under infernalis?
>> >> >> -Greg
>> >> >>
>> >> >> >
>> >> >> > We have cephfs mounted on a server using the built in cephfs
>> >> >> > kernel
>> >> >> > module:
>> >> >> >
>> >> >> > lts-mon:6789:/ /ceph ceph
>> >> >> > name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
>> >> >> >
>> >> >> >
>> >> >> > We are running all of our ceph nodes on Ubuntu 14.04 LTS. Samba is up
>> >> >> > to date (4.1.6), and we export NFSv3 to Linux and FreeBSD systems. All
>> >> >> > seem to exhibit the same behavior.
>> >> >> >
>> >> >> > system info:
>> >> >> >
>> >> >> > # uname -a
>> >> >> > Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14
>> >> >> > 21:42:59
>> >> >> > UTC
>> >> >> > 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> > root@lts-osd1:~# lsb_release -a
>> >> >> > No LSB modules are available.
>> >> >> > Distributor ID: Ubuntu
>> >> >> > Description: Ubuntu 14.04.3 LTS
>> >> >> > Release: 14.04
>> >> >> > Codename: trusty
>> >> >> >
>> >> >> >
>> >> >> > package info:
>> >> >> >
>> >> >> >  # dpkg -l|grep ceph
>> >> >> > ii  ceph                                 9.2.0-1trusty
>> >> >> > amd64        distributed storage and file system
>> >> >> > ii  ceph-common                          9.2.0-1trusty
>> >> >> > amd64        common utilities to mount and interact with a ceph
>> >> >> > storage
>> >> >> > cluster
>> >> >> > ii  ceph-fs-common                       9.2.0-1trusty
>> >> >> > amd64        common utilities to mount and interact with a ceph
>> >> >> > file
>> >> >> > system
>> >> >> > ii  ceph-mds                             9.2.0-1trusty
>> >> >> > amd64        metadata server for the ceph distributed file system
>> >> >> > ii  libcephfs1                           9.2.0-1trusty
>> >> >> > amd64        Ceph distributed file system client library
>> >> >> > ii  python-ceph                          9.2.0-1trusty
>> >> >> > amd64        Meta-package for python libraries for the Ceph
>> >> >> > libraries
>> >> >> > ii  python-cephfs                        9.2.0-1trusty
>> >> >> > amd64        Python libraries for the Ceph libcephfs library
>> >> >> >
>> >> >> >
>> >> >> > What is interesting is that a directory or file will not show up in a
>> >> >> > listing; however, if we access the file directly, it does show up:
>> >> >> >
>> >> >> >
>> >> >> > # ls -al |grep SCHOOL
>> >> >> > # ls -alnd SCHOOL667055
>> >> >> > drwxrwsr-x  1 21695  21183  2962751438 Jan 13 09:33 SCHOOL667055
>> >> >> >
>> >> >> >
>> >> >> > Any tips are appreciated!
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Mike C
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


