Re: cephfs - inconsistent nfs and samba directory listings

> On Jan 15, 2016, at 08:01, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> 
> On Thu, Jan 14, 2016 at 3:46 PM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>> Hey Zheng,
>> 
>> I've been in the #ceph irc channel all day about this.
>> 
>> We did that; we set max_mds back to 1, but instead of stopping mds 1, we
>> did a "ceph mds rmfailed 1". Running "ceph mds stop 1" produces:
>> 
>> # ceph mds stop 1
>> Error EEXIST: mds.1 not active (???)
>> 
>> 
>> Our mds is in a state of resolve, and will not come back.
>> 
>> We then tried to roll back the mds map to the epoch just before we set
>> max_mds to 2, but that command crashes all but one of our monitors and
>> never completes.
>> 
>> We do not know what to do at this point. If there were a way to get the mds
>> back up just so we could back it up, we would be okay with rebuilding. We
>> just need the data back.
> 
> It's not clear to me how much you've screwed up your monitor cluster.
> If that's still alive, you should just need to set max mds to 2, turn
> on an mds daemon, and let it resolve. Then you can follow the steps
> Zheng outlined for reducing the number of nodes cleanly.
> (That assumes that your MDS state is healthy and that the reason for
> your mounts hanging was a problem elsewhere, like with directory
> fragmentation confusing NFS.)
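> 
> A rough sketch of that recovery sequence (the daemon name and init command
> are examples only; adjust for your own hosts):
> 
>   ceph mds set max_mds 2           # allow two active ranks again
>   sudo start ceph-mds id=lts-osd1  # start the stopped mds daemon (upstart on Ubuntu 14.04)
>   ceph -s                          # watch the mdsmap until both ranks reach up:active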
> 
> If your monitor cluster is actually in trouble (i.e., the crashing
> problem made it to disk), that's a whole other thing now. But I
> suspect/hope it didn't and you just need to shut down the client
> trying to do the setmap and then turn the monitors all back on.
> Meanwhile, please post a bug at tracker.ceph.com with the actual
> monitor commands you ran and as much of the backtrace/log as you can;
> we don't want to have commands which break the system! ;)
> -Greg

The problem is that he ran 'ceph mds rmfailed 1' and there is no command to undo this. I think we need a 'ceph mds addfailed <rank>' command.

Regards
Yan, Zheng


> 
>> 
>> Mike C
>> 
>> 
>> 
>> On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> 
>>> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <mike@xxxxxxxxxxxx> wrote:
>>>> Thank you for the reply Zheng
>>>> 
>>>> We tried setting mds bal frag to true, but the end result was less than
>>>> desirable. All nfs and smb clients could no longer browse the share; they
>>>> would hang on any directory with more than a few hundred files.
>>>> 
>>>> We then tried to back out the active/active mds change, with no luck;
>>>> stopping one of the mds's (mds 1) prevented us from mounting the cephfs
>>>> filesystem.
>>>> 
>>>> So we failed and removed the secondary MDS, and now our primary mds is
>>>> stuck in a "resolve" state:
>>>> 
>>>> # ceph -s
>>>>    cluster cabd1728-2eca-4e18-a581-b4885364e5a4
>>>>     health HEALTH_WARN
>>>>            clock skew detected on mon.lts-mon
>>>>            mds cluster is degraded
>>>>            Monitor clock skew detected
>>>>     monmap e1: 4 mons at
>>>> 
>>>> {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
>>>>            election epoch 1282, quorum 0,1,2,3
>>>> lts-osd3,lts-osd1,lts-osd2,lts-mon
>>>>     mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
>>>>     osdmap e10183: 102 osds: 101 up, 101 in
>>>>      pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494 kobjects
>>>>            96188 GB used, 273 TB / 367 TB avail
>>>>                4188 active+clean
>>>>                   4 active+clean+scrubbing+deep
>>>> 
>>>> Now we are really down for the count. We cannot get our MDS back up in an
>>>> active state and none of our data is accessible.
>>> 
>>> You can't remove an active mds this way; you need to:
>>> 
>>> 1. make sure all active mds are running
>>> 2. run 'ceph mds set max_mds 1'
>>> 3. run 'ceph mds stop 1'
>>> 
>>> Step 3 changes the second mds's state to stopping. Wait a while and the
>>> second mds will go to the standby state. Occasionally, the second MDS can
>>> get stuck in the stopping state. If that happens, restart all MDS daemons,
>>> then repeat step 3.
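>>> 
>>> A minimal sketch of the whole sequence (assuming rank 1 is the mds being
>>> removed; the dump check is just one way to watch progress):
>>> 
>>>   ceph -s                    # step 1: confirm all active mds are up
>>>   ceph mds set max_mds 1     # step 2: cap the filesystem at one active rank
>>>   ceph mds stop 1            # step 3: ask rank 1 to stop
>>>   ceph mds dump              # watch rank 1 go through up:stopping to standby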
>>> 
>>> Regards
>>> Yan, Zheng
>>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>> 
>>>>> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson <mike@xxxxxxxxxxxx>
>>>>> wrote:
>>>>>> Hey Greg,
>>>>>> 
>>>>>> The inconsistent view is only over nfs/smb on top of our /ceph mount.
>>>>>> 
>>>>>> When I look directly at the /ceph mount (which is using the cephfs kernel
>>>>>> module), everything looks fine.
>>>>>> 
>>>>>> It is possible that this issue just went unnoticed before, and its showing
>>>>>> up only as an Infernalis problem is a red herring. That said, it is oddly
>>>>>> coincidental that we just started seeing issues.
>>>>> 
>>>>> This seems like a seekdir bug in the kernel client; could you try a 4.0+
>>>>> kernel?
>>>>> 
>>>>> Also, do you have "mds bal frag" enabled for ceph-mds?
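>>>>> 
>>>>> For reference, it is typically enabled in ceph.conf on the mds hosts,
>>>>> followed by an mds restart (a sketch; check your own config layout):
>>>>> 
>>>>>   [mds]
>>>>>       mds bal frag = true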
>>>>> 
>>>>> 
>>>>> Regards
>>>>> Yan, Zheng
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>> 
>>>>>>> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson <mike@xxxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>> Hello.
>>>>>>>> 
>>>>>>>> Since we upgraded to Infernalis last, we have noticed a severe problem
>>>>>>>> with cephfs when we have it shared over Samba and NFS.
>>>>>>>> 
>>>>>>>> Directory listings are showing an inconsistent view of the files:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
>>>>>>>>     100
>>>>>>>> $ sudo umount /lts-mon
>>>>>>>> $ sudo mount /lts-mon
>>>>>>>> $ ls /lts-mon/BD/xmlExport/ | wc -l
>>>>>>>>    3507
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The only workaround I have found is un-mounting and re-mounting the nfs
>>>>>>>> share; that seems to clear it up.
>>>>>>>> Same with samba. I'd post it here, but it's thousands of lines. I can
>>>>>>>> add additional details on request.
>>>>>>>> 
>>>>>>>> This happened after our upgrade to Infernalis. Is it possible the MDS is
>>>>>>>> in an inconsistent state?
>>>>>>> 
>>>>>>> So this didn't happen to you until after you upgraded? Are you
>>>>>>> seeing
>>>>>>> missing files when looking at cephfs directly, or only over the
>>>>>>> NFS/Samba re-exports? Are you also sharing Samba by re-exporting the
>>>>>>> kernel cephfs mount?
>>>>>>> 
>>>>>>> Zheng, any ideas about kernel issues which might cause this or be
>>>>>>> more
>>>>>>> visible under infernalis?
>>>>>>> -Greg
>>>>>>> 
>>>>>>>> 
>>>>>>>> We have cephfs mounted on a server using the built-in cephfs kernel
>>>>>>>> module:
>>>>>>>> 
>>>>>>>> lts-mon:6789:/ /ceph ceph
>>>>>>>> name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We are running all of our ceph nodes on Ubuntu 14.04 LTS. Samba is up to
>>>>>>>> date (4.1.6), and we export NFSv3 to Linux and FreeBSD systems. All seem
>>>>>>>> to exhibit the same behavior.
>>>>>>>> 
>>>>>>>> system info:
>>>>>>>> 
>>>>>>>> # uname -a
>>>>>>>> Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14
>>>>>>>> 21:42:59
>>>>>>>> UTC
>>>>>>>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>> root@lts-osd1:~# lsb
>>>>>>>> lsblk        lsb_release
>>>>>>>> root@lts-osd1:~# lsb_release -a
>>>>>>>> No LSB modules are available.
>>>>>>>> Distributor ID: Ubuntu
>>>>>>>> Description: Ubuntu 14.04.3 LTS
>>>>>>>> Release: 14.04
>>>>>>>> Codename: trusty
>>>>>>>> 
>>>>>>>> 
>>>>>>>> package info:
>>>>>>>> 
>>>>>>>> # dpkg -l|grep ceph
>>>>>>>> ii  ceph                                 9.2.0-1trusty
>>>>>>>> amd64        distributed storage and file system
>>>>>>>> ii  ceph-common                          9.2.0-1trusty
>>>>>>>> amd64        common utilities to mount and interact with a ceph
>>>>>>>> storage
>>>>>>>> cluster
>>>>>>>> ii  ceph-fs-common                       9.2.0-1trusty
>>>>>>>> amd64        common utilities to mount and interact with a ceph
>>>>>>>> file
>>>>>>>> system
>>>>>>>> ii  ceph-mds                             9.2.0-1trusty
>>>>>>>> amd64        metadata server for the ceph distributed file system
>>>>>>>> ii  libcephfs1                           9.2.0-1trusty
>>>>>>>> amd64        Ceph distributed file system client library
>>>>>>>> ii  python-ceph                          9.2.0-1trusty
>>>>>>>> amd64        Meta-package for python libraries for the Ceph
>>>>>>>> libraries
>>>>>>>> ii  python-cephfs                        9.2.0-1trusty
>>>>>>>> amd64        Python libraries for the Ceph libcephfs library
>>>>>>>> 
>>>>>>>> 
>>>>>>>> What is interesting is that a directory or file will not show up in a
>>>>>>>> listing; however, if we access the file directly, it shows up:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> # ls -al |grep SCHOOL
>>>>>>>> # ls -alnd SCHOOL667055
>>>>>>>> drwxrwsr-x  1 21695  21183  2962751438 Jan 13 09:33 SCHOOL667055
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Any tips are appreciated!
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Mike C
>>>>>>>> 
>>>>>>>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



