Re: ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2

Hi all,

I just want to confirm that the patch works in our environment.
Thanks!

On 08/30/2016 02:04 PM, Dennis Kramer (DBS) wrote:
> Awesome Goncalo, that is very helpful.
> 
> Cheers.
> 
> On 08/30/2016 01:21 PM, Goncalo Borges wrote:
>> Hi Dennis.
>>
>> That is the first issue we saw, and it has nothing to do with the AMD processors (those relate only to the second issue we saw). So the fix in the patch
>>
>> https://github.com/ceph/ceph/pull/10027
>>
>> should work for you.
>>
>> In our case we went for a full compilation for our own specific reasons, but you should only need to recompile the ceph-fuse client. If you want a temporary solution while this is not fixed in Jewel, just deploy ceph-fuse using an Infernalis client. That is how we did it during the 3 weeks we were debugging our issues.
>>
>> Cheers
>> Goncalo
>>
>> ________________________________________
>> From: Dennis Kramer (DBS) [dennis@xxxxxxxxx]
>> Sent: 30 August 2016 20:59
>> To: Goncalo Borges; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2
>>
>> Hi Goncalo,
>>
>> Thank you for providing the info below. I'm getting the exact same errors:
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x2ae88e) [0x5647a76f488e]
>>  2: (()+0x113d0) [0x7f7d14c393d0]
>>  3: (Client::get_root_ino()+0x10) [0x5647a75eb730]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) [0x5647a75e9595]
>>  5: (()+0x1a3eb1) [0x5647a75e9eb1]
>>  6: (()+0x14ef5) [0x7f7d15283ef5]
>>  7: (()+0x15679) [0x7f7d15284679]
>>  8: (()+0x11e38) [0x7f7d15280e38]
>>  9: (()+0x76fa) [0x7f7d14c2f6fa]
>>  10: (clone()+0x6d) [0x7f7d1351ab5d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> After reading your thread I wasn't sure if your solution would work in
>> our environment, since we don't use the AMD procs you mentioned, though
>> the segfaults look identical in the debug output.
>>
>> Have you recompiled ceph completely for your cluster or just the MDS server?
>>
>>
>> On 08/25/2016 02:45 AM, Goncalo Borges wrote:
>>> Hi Dennis...
>>>
>>> We use ceph-fuse in 10.2.2 and we saw two main issues with it immediately after
>>> upgrading from Infernalis to Jewel.
>>>
>>> In our case, we are enabling ceph-fuse in a heavily used Linux cluster, and our
>>> users complained about the mount points becoming unavailable some time after
>>> their applications start up.
>>>
>>> First we saw
>>>
>>> https://github.com/ceph/ceph/pull/10027
>>>
>>> and once that was fixed, we saw
>>>
>>> http://tracker.ceph.com/issues/16610
>>>
>>>
>>> There is a long ML thread with the subject 'ceph-fuse segfaults ( jewel 10.2.2)'
>>> on the topic. At the end, RH staff proposed some patches which we applied (we
>>> recompile ceph ourselves) and which resolved the issues we saw.
>>>
>>> You should run ceph-fuse in debug mode to check which segfaults you actually
>>> have, and whether it is a similar problem. You can do that by mounting ceph-fuse
>>> with nohup and the '-d' option. Something like:
>>>
>>> nohup ceph-fuse --id mount_user -k <path to your key> -m <mon ip>:6789 -d -r /cephfs /coepp/cephfs > /path/to/some/log 2>&1 &
>>>
>>> If you want an even more verbose log, you should set 'debug client = 20' in your
>>> /etc/ceph/ceph.conf before mounting.
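>>>
>>> As a rough sketch, the ceph.conf change could look like the fragment below. I am
>>> assuming the option goes under a [client] section; that placement is my assumption
>>> and is not stated above, so adjust it to match how your ceph.conf is organised.
>>>
>>> ```ini
>>> # /etc/ceph/ceph.conf on the machine running ceph-fuse
>>> # (the [client] section is an assumption; only the option itself is from the text above)
>>> [client]
>>>     debug client = 20
>>> ```
>>>
>>> Remount afterwards so the client picks the setting up, and remove it once you
>>> have the log, since this level is very verbose.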
>>>
>>>
>>> Cheers
>>> Goncalo
>>>
>>> On 08/24/2016 10:28 PM, Dennis Kramer (DT) wrote:
>>>> Hi all,
>>>>
>>>> Running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) on
>>>> Ubuntu 16.04LTS.
>>>>
>>>> I'm currently seeing the weirdest thing. I have a bunch of Linux clients, mostly
>>>> Debian-based (Ubuntu/Mint). They all use version 10.2.2 of ceph-fuse. I've been
>>>> running CephFS since Hammer without any issues, but I upgraded to Jewel last
>>>> week and now my clients get:
>>>> "Transport endpoint is not connected".
>>>>
>>>> It seems the error only arises when clients browse the ceph-fuse mount through
>>>> a GUI file manager; some use Nemo, some Nautilus. The error doesn't show up
>>>> immediately; sometimes a client can browse the share for some time before
>>>> being kicked out with the error.
>>>>
>>>> But when I use only the shell to browse the ceph-fuse mount, it works without
>>>> any issues. When I try to use the GUI browser on the same client, the error
>>>> shows up and I get kicked out of the ceph-fuse mount until I remount.
>>>>
>>>> Any suggestions?
>>>>
>>>> With regards,
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> --
>>> Goncalo Borges
>>> Research Computing
>>> ARC Centre of Excellence for Particle Physics at the Terascale
>>> School of Physics A28 | University of Sydney, NSW  2006
>>> T: +61 2 93511937
>>>
>>
>> --