Re: CephFS hangs when writing 10GB files in loop

On Dec 18, 2014, at 10:54 AM, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 12/18/2014 11:13 AM, Wido den Hollander wrote:
>> On 12/17/2014 07:42 PM, Gregory Farnum wrote:
>>> On Wed, Dec 17, 2014 at 8:35 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>> Hi,
>>>> 
>>>> Today I've been playing with CephFS and the morning started great with
>>>> CephFS playing along just fine.
>>>> 
>>>> Some information first:
>>>> - Ceph 0.89
>>>> - Linux kernel 3.18
>>>> - Ceph fuse 0.89
>>>> - One Active MDS, one Standby
>>>> 
>>>> This morning I could write a 10GB file like this using the kclient:
>>>> $ dd if=/dev/zero of=10GB.bin bs=1M count=10240 conv=fsync
>>>> 
>>>> That gave me 850MB/sec (all 10G network) and I could read the same file
>>>> again with 610MB/sec.
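
(Aside: the read-back command isn't in the original report; something along
these lines would measure it, dropping the page cache first so the data comes
from the OSDs rather than local RAM:)

# echo 3 > /proc/sys/vm/drop_caches
$ dd if=10GB.bin of=/dev/null bs=1M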
>>>> 
>>>> After writing to it multiple times it suddenly started to hang.
>>>> 
>>>> No real evidence in the MDS log (debug mds set to 20) or anything on the
>>>> client. That specific operation just blocked, but I could still 'ls' the
>>>> filesystem in a second terminal.
>>>> 
>>>> The MDS was showing in its log that it was checking the active sessions
>>>> of clients. It showed the active session of my single client.
>>>> 
>>>> The client renewed its caps and proceeded.
>>> 
>>> Can you clarify this? I'm not quite sure what you mean.
>>> 
>> 
>> I currently don't have the logs available. That was my problem when
>> typing the original e-mail.
>> 
>>>> I currently don't have any logs, but I'm just looking to be pointed in
>>>> the right direction.
>>>> 
>>>> Any ideas?
>>> 
>>> Well, now that you're on v0.89 you should explore the admin
>>> socket...there are commands on the MDS to dump ops in flight (and
>>> maybe to look at session states? I don't remember when that merged).
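
(For reference, those admin socket queries look roughly like the following on
v0.89; "mds.a" is just a placeholder daemon name and "session ls" may not be
in every build yet:)

$ ceph daemon mds.a dump_ops_in_flight
$ ceph daemon mds.a objecter_requests
$ ceph daemon mds.a session ls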
>> 
>> Sage's pointer towards the kernel debugging and the new admin socket
>> showed me that it was RADOS calls that were hanging.
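
(On the kernel client side, the in-flight requests can be inspected through
debugfs, assuming debugfs is mounted and you have root:)

$ cat /sys/kernel/debug/ceph/*/osdc   # pending OSD (RADOS) requests
$ cat /sys/kernel/debug/ceph/*/mdsc   # pending MDS requests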
>> 
>> I investigated even further and it seems that this is not a CephFS
>> problem, but a local TCP issue which is only triggered when using CephFS.
>> 
>> At some point, which is still unclear to me, data transfer becomes very
>> slow. The MDS doesn't seem to be able to update the journal and the
>> client can't write to the OSDs anymore.
>> 
>> It happened after I did some very basic TCP tuning (timestamp, rmem,
>> wmem, sack, fastopen).
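
(Presumably sysctls along these lines; the values below are purely
illustrative and not taken from this cluster:)

# sysctl -w net.ipv4.tcp_timestamps=1
# sysctl -w net.ipv4.tcp_sack=0
# sysctl -w net.ipv4.tcp_fastopen=3
# sysctl -w net.core.rmem_max=16777216
# sysctl -w net.core.wmem_max=16777216
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
# sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"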
>> 
> 
> So it was tcp_sack. With tcp_sack=0 the MDS had problems talking to the
> OSDs. Other clients still worked fine, but the MDS couldn't replay its
> journal and such.
> 
> Enabling tcp_sack again resolved the problem. The new admin socket
> really helped there!
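
(For the record, checking the current setting and turning SACK back on is
just:)

$ sysctl net.ipv4.tcp_sack
# sysctl -w net.ipv4.tcp_sack=1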

What was the reasoning behind disabling SACK to begin with? Without it, any packet drops or reordering can force the sender to retransmit a potentially large amount of data.

> 
>> Reverting back to the Ubuntu 14.04 defaults resolved it all and CephFS
>> is running happily now.
>> 
>> I'll dig a bit deeper to see why this system was affected by those
>> changes. I applied these settings earlier on a RBD-only cluster without
>> any problems.
>> 
>>> -Greg
>>> 
>> 
>> 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on




