Re: cephfs - inconsistent nfs and samba directory listings


 



Thanks for this thread. We just made the same mistake (rmfailed) on our
hammer cluster, which broke it similarly. The addfailed patch worked
for us too.

-- Dan

On Fri, Jan 15, 2016 at 6:30 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
> Hey ceph-users,
>
> I wanted to follow up, Zheng's patch did the trick. We re-added the removed
> mds, and it all came back. We're sync-ing our data off to a backup server.
>
> Thanks for all of the help, Ceph has a great community to work with!
>
> Mike C
>
> On Thu, Jan 14, 2016 at 4:46 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>
>> Here is a patch for v9.2.0. After installing the modified version of
>> ceph-mon, run “ceph mds addfailed 1”
>>
>>
>>
>>
>>
>> > On Jan 15, 2016, at 08:20, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>> >
>> > okay, that sounds really good.
>> >
>> > Would it help if you had access to our cluster?
>> >
>> > On Thu, Jan 14, 2016 at 4:19 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> >
>> > > On Jan 15, 2016, at 08:16, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>> > >
>> > > Did I just lose all of my data?
>> > >
>> > > If we were able to export the journal, could we create a brand new mds
>> > > out of that and retrieve our data?
>> >
>> > No. It’s easy to fix, but you need to re-compile ceph-mon from
>> > source code. I’m writing the patch.
>> >
>> >
>> >
>> >
>> > >
>> > > On Thu, Jan 14, 2016 at 4:15 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > >
>> > > > On Jan 15, 2016, at 08:01, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> > > > wrote:
>> > > >
>> > > > On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson <mike@xxxxxxxxxxxx>
>> > > > wrote:
>> > > >> Hey Zheng,
>> > > >>
>> > > >> I've been in the #ceph irc channel all day about this.
>> > > >>
>> > > >> We did that: we set max_mds back to 1, but instead of stopping
>> > > >> mds 1, we ran "ceph mds rmfailed 1". Running ceph mds stop 1
>> > > >> produces:
>> > > >>
>> > > >> # ceph mds stop 1
>> > > >> Error EEXIST: mds.1 not active (???)
>> > > >>
>> > > >>
>> > > >> Our mds is stuck in the resolve state and will not come back.
>> > > >>
>> > > >> We then tried to roll back the mds map to the epoch just before we
>> > > >> set
>> > > >> max_mds to 2, but that command crashes all but one of our monitors
>> > > >> and never
>> > > >> completes
>> > > >>
>> > > >> We do not know what to do at this point. If there were a way to
>> > > >> get the mds back up just so we could back it up, we'd be okay
>> > > >> with rebuilding. We just need the data back.
>> > > >
>> > > > It's not clear to me how much you've screwed up your monitor
>> > > > cluster.
>> > > > If that's still alive, you should just need to set max mds to 2,
>> > > > turn
>> > > > on an mds daemon, and let it resolve. Then you can follow the steps
>> > > > Zheng outlined for reducing the number of nodes cleanly.
>> > > > (That assumes that your MDS state is healthy and that the reason for
>> > > > your mounts hanging was a problem elsewhere, like with directory
>> > > > fragmentation confusing NFS.)
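Greg's suggested recovery can be sketched as a small shell function (a hedged sketch only: the function name, the placeholder daemon id "lts-mon2", and the upstart-style start command for Ubuntu 14.04 are my assumptions, not from the thread; the "ceph mds" subcommand syntax is the hammer/infernalis-era form):

```shell
#!/usr/bin/env bash
# Sketch of the recovery path suggested above, wrapped in a function so
# it can be reviewed before running against a live cluster.

recover_mds() {
    # Re-grow the filesystem so the surviving rank can resolve.
    ceph mds set max_mds 2

    # Bring a second MDS daemon back up. "lts-mon2" is a placeholder id;
    # the start command assumes upstart (Ubuntu 14.04) and may differ
    # under other init systems.
    start ceph-mds id=lts-mon2 || true

    # Show the mdsmap; watch until both ranks report up:active.
    ceph mds stat
}
```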
>> > > >
>> > > > If your monitor cluster is actually in trouble (ie, the crashing
>> > > > problem made it to disk), that's a whole other thing now. But I
>> > > > suspect/hope it didn't and you just need to shut down the client
>> > > > trying to do the setmap and then turn the monitors all back on.
>> > > > Meanwhile, please post a bug at tracker.ceph.com with the actual
>> > > > monitor commands you ran and as much of the backtrace/log as you
>> > > > can;
>> > > > we don't want to have commands which break the system! ;)
>> > > > -Greg
>> > >
>> > > The problem is that he ran ‘ceph mds rmfailed 1’ and there is no
>> > > command to undo it. I think we need a ‘ceph mds addfailed <rank>’
>> > > command.
>> > >
>> > > Regards
>> > > Yan, Zheng
>> > >
>> > >
>> > > >
>> > > >>
>> > > >> Mike C
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <ukernel@xxxxxxxxx>
>> > > >> wrote:
>> > > >>>
>> > > >>> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <mike@xxxxxxxxxxxx>
>> > > >>> wrote:
>> > > >>>> Thank you for the reply Zheng
>> > > >>>>
>> > > >>>> We tried setting mds bal frag to true, but the end result was
>> > > >>>> less than desirable. All nfs and smb clients could no longer
>> > > >>>> browse the share; they would hang on any directory with more
>> > > >>>> than a few hundred files.
>> > > >>>>
>> > > >>>> We then tried to back out the active/active mds change, with
>> > > >>>> no luck; stopping one of the mds's (mds 1) prevented us from
>> > > >>>> mounting the cephfs filesystem
>> > > >>>>
>> > > >>>> So we failed and removed the secondary MDS, and now our
>> > > >>>> primary mds is stuck in the "resolve" state:
>> > > >>>>
>> > > >>>> # ceph -s
>> > > >>>>    cluster cabd1728-2eca-4e18-a581-b4885364e5a4
>> > > >>>>     health HEALTH_WARN
>> > > >>>>            clock skew detected on mon.lts-mon
>> > > >>>>            mds cluster is degraded
>> > > >>>>            Monitor clock skew detected
>> > > >>>>     monmap e1: 4 mons at
>> > > >>>>
>> > > >>>>
>> > > >>>> {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
>> > > >>>>            election epoch 1282, quorum 0,1,2,3
>> > > >>>> lts-osd3,lts-osd1,lts-osd2,lts-mon
>> > > >>>>     mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
>> > > >>>>     osdmap e10183: 102 osds: 101 up, 101 in
>> > > >>>>      pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494
>> > > >>>> kobjects
>> > > >>>>            96188 GB used, 273 TB / 367 TB avail
>> > > >>>>                4188 active+clean
>> > > >>>>                   4 active+clean+scrubbing+deep
>> > > >>>>
>> > > >>>> Now we are really down for the count. We cannot get our MDS back
>> > > >>>> up in
>> > > >>>> an
>> > > >>>> active state and none of our data is accessible.
>> > > >>>
>> > > >>> You can't remove an active mds this way. You need to:
>> > > >>>
>> > > >>> 1. make sure all active mds are running
>> > > >>> 2. run 'ceph mds set max_mds 1'
>> > > >>> 3. run 'ceph mds stop 1'
>> > > >>>
>> > > >>> Step 3 changes the second mds's state to stopping. Wait a
>> > > >>> while, and the second mds will go to the standby state.
>> > > >>> Occasionally the second MDS can get stuck in the stopping
>> > > >>> state. If that happens, restart all MDS daemons, then repeat
>> > > >>> step 3.
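Zheng's three steps can be sketched as a shell function (a hedged sketch: the function name, the poll count, and the sleep interval are my own assumptions, not from the thread; the "ceph mds" subcommand syntax is the hammer/infernalis-era form):

```shell
#!/usr/bin/env bash
# Sketch of the clean scale-down steps above, wrapped in a function so it
# can be reviewed before touching a live cluster.

scale_down_mds() {
    # Step 2: cap the filesystem at one active MDS again.
    ceph mds set max_mds 1

    # Step 3: stop rank 1 so it drains and falls back to standby.
    ceph mds stop 1

    # Wait for rank 1 to leave the "stopping" state. If it never does,
    # restart all MDS daemons and repeat "ceph mds stop 1".
    local i
    for i in $(seq 1 60); do
        ceph mds stat | grep -q 'up:stopping' || break
        sleep 5
    done
    echo "scale-down complete"
}
```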
>> > > >>>
>> > > >>> Regards
>> > > >>> Yan, Zheng
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <ukernel@xxxxxxxxx>
>> > > >>>> wrote:
>> > > >>>>>
>> > > >>>>> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson
>> > > >>>>> <mike@xxxxxxxxxxxx>
>> > > >>>>> wrote:
>> > > >>>>>> Hey Greg,
>> > > >>>>>>
>> > > >>>>>> The inconsistent view is only over nfs/smb on top of our /ceph
>> > > >>>>>> mount.
>> > > >>>>>>
>> > > >>>>>> When I look directly on the /ceph mount (which is using the
>> > > >>>>>> cephfs
>> > > >>>>>> kernel
>> > > >>>>>> module), everything looks fine
>> > > >>>>>>
>> > > >>>>> It is possible that this issue just went unnoticed before,
>> > > >>>>> and its looking like an infernalis problem is a red herring.
>> > > >>>>> Still, it is oddly coincidental that we only just started
>> > > >>>>> seeing issues.
>> > > >>>>>
>> > > >>>>> This looks like a seekdir bug in the kernel client; could
>> > > >>>>> you try a 4.0+ kernel?
>> > > >>>>>
>> > > >>>>> Also, did you enable "mds bal frag" for ceph-mds?
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> Regards
>> > > >>>>> Yan, Zheng
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>>
>> > > >>>>>> On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum
>> > > >>>>>> <gfarnum@xxxxxxxxxx>
>> > > >>>>>> wrote:
>> > > >>>>>>>
>> > > >>>>>>> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson
>> > > >>>>>>> <mike@xxxxxxxxxxxx>
>> > > >>>>>>> wrote:
>> > > >>>>>>>> Hello.
>> > > >>>>>>>>
>> > > >>>>>>>> Since we upgraded to Infernalis, we have noticed a severe
>> > > >>>>>>>> problem with cephfs when we share it over Samba and NFS.
>> > > >>>>>>>>
>> > > >>>>>>>> Directory listings are showing an inconsistent view of the
>> > > >>>>>>>> files:
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
>> > > >>>>>>>>     100
>> > > >>>>>>>> $ sudo umount /lts-mon
>> > > >>>>>>>> $ sudo mount /lts-mon
>> > > >>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
>> > > >>>>>>>>    3507
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> The only workaround I have found is un-mounting and
>> > > >>>>>>>> re-mounting the nfs share; that seems to clear it up.
>> > > >>>>>>>> Same with samba. I'd post the output here, but it's
>> > > >>>>>>>> thousands of lines; I can add additional details on
>> > > >>>>>>>> request.
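A hypothetical helper illustrating the remount workaround described above: count the directory entries, remount, count again, and flag a mismatch. The function name and its arguments are placeholders of mine, not from the original post.

```shell
#!/usr/bin/env bash
# Compare a directory listing before and after remounting the share to
# detect the inconsistent-listing symptom.

check_listing() {
    local mnt="$1" dir="$2"
    local before after
    before=$(ls "$mnt/$dir" | wc -l | tr -d ' ')

    # The workaround from the post: unmount and remount the share.
    umount "$mnt" && mount "$mnt"

    after=$(ls "$mnt/$dir" | wc -l | tr -d ' ')
    if [ "$before" -ne "$after" ]; then
        echo "inconsistent: $before entries before remount, $after after"
    else
        echo "consistent: $before"
    fi
}
```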
>> > > >>>>>>>>
>> > > >>>>>>>> This happened after our upgrade to infernalis. Is it
>> > > >>>>>>>> possible the MDS is in an inconsistent state?
>> > > >>>>>>>
>> > > >>>>>>> So this didn't happen to you until after you upgraded? Are you
>> > > >>>>>>> seeing
>> > > >>>>>>> missing files when looking at cephfs directly, or only over
>> > > >>>>>>> the
>> > > >>>>>>> NFS/Samba re-exports? Are you also sharing Samba by
>> > > >>>>>>> re-exporting the
>> > > >>>>>>> kernel cephfs mount?
>> > > >>>>>>>
>> > > >>>>>>> Zheng, any ideas about kernel issues which might cause
>> > > >>>>>>> this or be more visible under infernalis?
>> > > >>>>>>> -Greg
>> > > >>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> We have cephfs mounted on a server using the built in cephfs
>> > > >>>>>>>> kernel
>> > > >>>>>>>> module:
>> > > >>>>>>>>
>> > > >>>>>>>> lts-mon:6789:/ /ceph ceph
>> > > >>>>>>>> name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> We are running all of our ceph nodes on Ubuntu 14.04 LTS.
>> > > >>>>>>>> Samba is up to date (4.1.6), and we export NFSv3 to Linux
>> > > >>>>>>>> and FreeBSD systems. All seem to exhibit the same
>> > > >>>>>>>> behavior.
>> > > >>>>>>>>
>> > > >>>>>>>> system info:
>> > > >>>>>>>>
>> > > >>>>>>>> # uname -a
>> > > >>>>>>>> Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14
>> > > >>>>>>>> 21:42:59
>> > > >>>>>>>> UTC
>> > > >>>>>>>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> > > >>>>>>>> root@lts-osd1:~# lsb_release -a
>> > > >>>>>>>> No LSB modules are available.
>> > > >>>>>>>> Distributor ID: Ubuntu
>> > > >>>>>>>> Description: Ubuntu 14.04.3 LTS
>> > > >>>>>>>> Release: 14.04
>> > > >>>>>>>> Codename: trusty
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> package info:
>> > > >>>>>>>>
>> > > >>>>>>>> # dpkg -l|grep ceph
>> > > >>>>>>>> ii  ceph                                 9.2.0-1trusty
>> > > >>>>>>>> amd64        distributed storage and file system
>> > > >>>>>>>> ii  ceph-common                          9.2.0-1trusty
>> > > >>>>>>>> amd64        common utilities to mount and interact with a
>> > > >>>>>>>> ceph
>> > > >>>>>>>> storage
>> > > >>>>>>>> cluster
>> > > >>>>>>>> ii  ceph-fs-common                       9.2.0-1trusty
>> > > >>>>>>>> amd64        common utilities to mount and interact with a
>> > > >>>>>>>> ceph
>> > > >>>>>>>> file
>> > > >>>>>>>> system
>> > > >>>>>>>> ii  ceph-mds                             9.2.0-1trusty
>> > > >>>>>>>> amd64        metadata server for the ceph distributed file
>> > > >>>>>>>> system
>> > > >>>>>>>> ii  libcephfs1                           9.2.0-1trusty
>> > > >>>>>>>> amd64        Ceph distributed file system client library
>> > > >>>>>>>> ii  python-ceph                          9.2.0-1trusty
>> > > >>>>>>>> amd64        Meta-package for python libraries for the Ceph
>> > > >>>>>>>> libraries
>> > > >>>>>>>> ii  python-cephfs                        9.2.0-1trusty
>> > > >>>>>>>> amd64        Python libraries for the Ceph libcephfs library
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> What is interesting is that a directory or file will not
>> > > >>>>>>>> show up in a listing; however, if we access the file
>> > > >>>>>>>> directly, it is there:
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> # ls -al |grep SCHOOL
>> > > >>>>>>>> # ls -alnd SCHOOL667055
>> > > >>>>>>>> drwxrwsr-x  1 21695  21183  2962751438 Jan 13 09:33
>> > > >>>>>>>> SCHOOL667055
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> Any tips are appreciated!
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks,
>> > > >>>>>>>> Mike C
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> _______________________________________________
>> > > >>>>>>>> ceph-users mailing list
>> > > >>>>>>>> ceph-users@xxxxxxxxxxxxxx
>> > > >>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > >>>>>>>>
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>>
>> > >
>> > >
>> >
>> >
>>
>>
>
>



