On 12/18/2014 11:13 AM, Wido den Hollander wrote:
> On 12/17/2014 07:42 PM, Gregory Farnum wrote:
>> On Wed, Dec 17, 2014 at 8:35 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>> Hi,
>>>
>>> Today I've been playing with CephFS and the morning started great with
>>> CephFS playing along just fine.
>>>
>>> Some information first:
>>> - Ceph 0.89
>>> - Linux kernel 3.18
>>> - Ceph fuse 0.89
>>> - One Active MDS, one Standby
>>>
>>> This morning I could write a 10GB file like this using the kclient:
>>> $ dd if=/dev/zero of=10GB.bin bs=1M count=10240 conv=fsync
>>>
>>> That gave me 850MB/sec (all 10G network) and I could read the same file
>>> again with 610MB/sec.
>>>
>>> After writing to it multiple times it suddenly started to hang.
>>>
>>> No real evidence on the MDS (debug mds set to 20) or anything on the
>>> client. That specific operation just blocked, but I could still 'ls' the
>>> filesystem in a second terminal.
>>>
>>> The MDS was showing in its log that it was checking active sessions of
>>> clients. It showed the active session of my single client.
>>>
>>> The client renewed its caps and proceeded.
>>
>> Can you clarify this? I'm not quite sure what you mean.
>>
>
> I currently don't have the logs available. That was my problem when
> typing the original e-mail.
>
>>> I currently don't have any logs, but I'm just looking for a direction to
>>> be pointed towards.
>>>
>>> Any ideas?
>>
>> Well, now that you're on v0.89 you should explore the admin
>> socket...there are commands on the MDS to dump ops in flight (and
>> maybe to look at session states? I don't remember when that merged).
>
> Sage's pointer towards kernel debugging and the new admin socket
> showed me that it was RADOS calls that were hanging.
>
> I investigated further and it seems that this is not a CephFS
> problem, but a local TCP issue which is only triggered when using CephFS.
>
> At some point, which is still unclear to me, data transfer becomes very
> slow. The MDS doesn't seem to be able to update the journal and the
> client can't write to the OSDs anymore.
>
> It happened after I did some very basic TCP tuning (timestamp, rmem,
> wmem, sack, fastopen).
>

So it was tcp_sack. With tcp_sack=0 the MDS has problems talking to the
OSDs. Other clients still work fine, but the MDS couldn't replay its
journal and such. Enabling tcp_sack again resolved the problem.

The new admin socket really helped there!

> Reverting to the Ubuntu 14.04 defaults resolved it all and CephFS
> is running happily now.
>
> I'll dig a bit deeper to see why this system was affected by those
> changes. I applied these settings earlier on an RBD-only cluster without
> any problems.
>
>> -Greg
>>
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
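
As a quick illustration of the admin socket usage Greg suggests above, a
sketch only: the socket path and the MDS name "a" are assumptions for a
default deployment, and the exact command set varies by release, so check
"help" first on 0.89:

$ # list the commands this daemon's admin socket supports
$ ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok help

$ # dump the operations currently in flight on the MDS
$ ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok dump_ops_in_flight

$ # show outstanding RADOS requests from the MDS's Objecter; plausibly the
$ # view that exposed the hanging OSD traffic described in this thread
$ ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok objecter_requests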
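
And for the TCP tuning that turned out to be the culprit, the SACK toggle
can be inspected and restored like this (standard Linux sysctls; 1 is the
Ubuntu 14.04 default):

$ # check whether TCP selective acknowledgements are enabled
$ sysctl net.ipv4.tcp_sack
net.ipv4.tcp_sack = 1

$ # re-enable SACK if it was switched off during tuning
$ sysctl -w net.ipv4.tcp_sack=1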