Re: Testing CephFS

> On Sep 1, 2015, at 16:13, Simon Hallam <sha@xxxxxxxxx> wrote:
> 
> Hi Greg, Zheng,
> 
> Is this fixed in a later version of the kernel client? Or would it be wise for us to start using the fuse client?
> 
> Cheers,

I just wrote a fix: https://github.com/ceph/ceph-client/commit/33b68dde7f27927a7cb1a7691e3c5b6f847ffd14. Yes, you should try ceph-fuse if this bug causes problems for you.

Regards
Yan, Zheng

> 
> Simon
> 
>> -----Original Message-----
>> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
>> Sent: 31 August 2015 13:02
>> To: Yan, Zheng
>> Cc: Simon Hallam; Zheng Yan; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  Testing CephFS
>> 
>> On Mon, Aug 31, 2015 at 12:16 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> On Mon, Aug 24, 2015 at 6:38 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>> On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam <sha@xxxxxxxxx> wrote:
>>>>> Hi Greg,
>>>>> 
>>>>> The MDSs detected that the other one went down and started the replay.
>>>>> 
>>>>> I did some further testing with 20 client machines. Of the 20 client
>>>>> machines, 5 hung with the following error:
>>>>> 
>>>>> [Aug24 10:53] ceph: mds0 caps stale
>>>>> [Aug24 10:54] ceph: mds0 caps stale
>>>>> [Aug24 10:58] ceph: mds0 hung
>>>>> [Aug24 11:03] ceph: mds0 came back
>>>>> [  +8.803334] libceph: mon2 10.15.0.3:6789 socket closed (con state OPEN)
>>>>> [  +0.000018] libceph: mon2 10.15.0.3:6789 session lost, hunting for new mon
>>>>> [Aug24 11:04] ceph: mds0 reconnect start
>>>>> [  +0.084938] libceph: mon2 10.15.0.3:6789 session established
>>>>> [  +0.008475] ceph: mds0 reconnect denied
>>>> 
>>>> Oh, this might be a kernel bug, failing to ask for mdsmap updates when
>>>> the connection goes away. Zheng, does that sound familiar?
>>>> -Greg
>>>> 
>>> 
>>> I reproduced this locally (using SIGSTOP to stop the monitor). I think
>>> the root cause is that the kernel client does not implement
>>> CEPH_FEATURE_MSGR_KEEPALIVE2, so it couldn't reliably detect that the
>>> network cable had been unplugged. It kept waiting for new events from
>>> the disconnected connection.
>> 
>> Yeah, the userspace client maintains an ongoing MDSMap subscription
>> from the monitors in order to hear about this. It puts more load on
>> the monitors but right now that's the solution we're going with: the
>> monitor times out the MDS, publishes a series of new maps (pushed to
>> the clients) in order to activate a standby, and the clients see that
>> they need to connect to the new MDS instance.
>> -Greg
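
The failure mode Zheng describes (a client that keeps waiting on a connection that will never deliver anything) is what an acknowledged keepalive is meant to catch. The following is only a minimal sketch of that general idea in Python, not the kernel client's code and not the Ceph messenger API: the class name KeepaliveTracker, its methods, and the 30-second timeout are all hypothetical and chosen for illustration.

```python
import time
from typing import Optional


class KeepaliveTracker:
    """Hypothetical helper: remembers when the peer last acknowledged a keepalive."""

    def __init__(self, timeout: float, now: Optional[float] = None) -> None:
        # timeout: seconds without an ack before the connection is treated as dead.
        self.timeout = timeout
        self.last_ack = time.monotonic() if now is None else now

    def on_ack(self, now: Optional[float] = None) -> None:
        # Call whenever the peer acknowledges a keepalive.
        self.last_ack = time.monotonic() if now is None else now

    def is_stale(self, now: Optional[float] = None) -> bool:
        # True once more than `timeout` seconds have passed since the last ack.
        current = time.monotonic() if now is None else now
        return current - self.last_ack > self.timeout


# Explicit timestamps make the behaviour easy to follow.
tracker = KeepaliveTracker(timeout=30.0, now=0.0)
fresh = tracker.is_stale(now=10.0)      # ack was 10 s ago: still fine
stale = tracker.is_stale(now=31.0)      # no ack for 31 s: treat as dead
tracker.on_ack(now=31.0)                # peer answered again
recovered = tracker.is_stale(now=40.0)  # ack was 9 s ago: fine again
```

A client without such a check has no signal that the peer is gone if the socket is never closed, which matches the hang in the log above. Once the tracker reports a stale connection, the client can drop it and look for another monitor instead of waiting.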

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


