Re: kernel cephfs - slow requests

Dzianis Kahanovich writes:
> Christian Balzer writes:
> 
>>> A new problem (I am not sure, but it was probably not seen on Hammer,
>>> while it definitely occurs on Infernalis): copying large files (tens of
>>> GB) into a kernel-mounted cephfs (from outside the cluster, on bare
>>> metal - not a VM - with a preempt kernel) causes slow requests on some
>>> OSDs (a repeated range) - mostly on the slower 3 Gbps links.
>>>
>>> All OSDs use the default thread counts. Scheduler=noop, size=3, min_size=2.
>>>
>>> The same problem does not happen with fuse.
>>>
>>> It looks like a broken or unbalanced congestion mechanism, or I do not
>>> know how to moderate it. I tried setting write_congestion_kb low (=1) -
>>> nothing interesting.
>>>
>> I think cause and effect are not quite what you think they are.
>>
>> Firstly let me state that I have no experience with CephFS at all, but
>> what you're seeing likely isn't related to it at all.
>>
>> Next, let's establish some parameters.
>> You're testing kernel and fuse from the same machine, right?
>> What is the write speed (throughput) when doing this with fuse compared to
>> the speed when doing this via the kernel module?
> 
> Right now I am adding 2 OSDs and removing 2 OSDs on 1 of the 3 nodes
> (2T->4T), so the cluster is under heavy load and I cannot run benchmarks.
> But I understand this point well. And since sending that message I have
> seen slow requests with fuse too.
> 
>> What is the top speed of your cluster when doing a 
>> "rados -p <yourpoolname> bench 60 write -t 32" from your test machine?
>> Does this result in slow requests as well?
> 
> Hmm... maybe later. At the moment I have no rados pools to test with, only RBD, DATA & METADATA.
> 
>> What I think is happening is that you're simply at the limits of your
>> current cluster and that fuse is slower, thus not exposing this.
>> The kernel module is likely faster AND will also use the pagecache, thus
>> creating very large writes (how much memory does your test machine have?)
>> when it gets flushed.
> 
> I have bounded all the read/write settings in the kernel client more tightly than in fuse.
> 
> Mostly I understand it - the problem is fast writes & slow HDDs. But IMHO
> some mechanism (congestion-like) should prevent this. And earlier I did not
> observe this problem on similar configs.
> 
> Later, if I have more info, I will say more. Maybe the PREEMPT kernel is
> "wrong" here...
> 

After a series of experiments (and multiple "slow requests" during OSD
add/remove and backfills) I found a solution (along the way I also created
unrecoverable "inconsistent" PGs in the data pool, outside of any real files -
the data pool has since been re-created, all OK). The solution: the
caps_wanted_delay_max=5 option on the kernel mount.
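
For reference, a minimal sketch of how that option is passed on the kernel
mount; the monitor address, mount point and secret file below are placeholders
for illustration, not my real values:

  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret,caps_wanted_delay_max=5

The same option string should also work in the options field of an fstab entry
for the mount.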

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



