> On 4 Feb 2020, at 16:14, Samy Ascha <samy@xxxxxx> wrote:
>
>> On 2 Feb 2020, at 12:45, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>
>> On Wed, Jan 29, 2020 at 1:25 AM Samy Ascha <samy@xxxxxx> wrote:
>>>
>>> Hi!
>>>
>>> I've been running CephFS for a while now, and ever since setting it up I've seen unexpectedly large write I/O on the CephFS metadata pool.
>>>
>>> The filesystem is otherwise stable and I'm seeing no usage issues.
>>>
>>> I'm in a read-intensive environment, from the clients' perspective, and throughput on the metadata pool is consistently higher than on the data pool.
>>>
>>> For example:
>>>
>>> # ceph osd pool stats
>>> pool cephfs_data id 1
>>>   client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr
>>>
>>> pool cephfs_metadata id 2
>>>   client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr
>>>
>>> I realise, of course, that this is only a momentary snapshot of the statistics, but I see this unbalanced read/write activity consistently when monitoring it live.
>>>
>>> I would like some insight into what may be causing this large imbalance, especially since I'm in a read-intensive (web hosting) environment.
>>
>> The MDS is still writing its journal and updating the "open file
>> table". The MDS needs to record certain information about the state of
>> its cache and the state issued to clients, even if the clients aren't
>> changing anything. (This is workload dependent but will be most
>> obvious when clients are opening files _not_ already in cache.)
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Senior Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> Hi Patrick,
>
> Thanks for this extra information.
>
> I should be able to confirm this by checking the network traffic flowing from the MDSes to the OSDs and comparing it to what's coming in from the CephFS clients.
>
> I'll report back when I have more information on that. I'm a little caught up in other things right now, but I wanted to acknowledge your message.
>
> Samy

Hi!

I've confirmed that the write I/O to the metadata pool is coming from the active MDSes.

I'm experiencing very poor write performance on clients, and I would like to see if there's anything I can do to optimise it. Right now, I'm specifically focussing on speeding up this use case:

In a CephFS-mounted dir:

$ time unzip -q wordpress-seo.12.9.1.zip

real    0m47.596s
user    0m0.218s
sys     0m0.157s

On an RBD mount:

$ time unzip -q wordpress-seo.12.9.1.zip

real    0m0.176s
user    0m0.131s
sys     0m0.045s

The difference is just too big. I'm having real trouble finding a good reference against which to check my setup for misconfiguration.

I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put them to work to help my case.

Thanks a lot,

Samy
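
P.S. For anyone who wants to watch this from the MDS side rather than from network traffic, here's a minimal sketch of the kind of thing I mean (I'm assuming the active daemon is named mds.0 and that you run this on its host with access to the admin socket; adjust the name for your cluster):

# Journal write counters on the active MDS. If wrpos keeps advancing
# while clients are mostly reading, the MDS journal is what's writing
# to the metadata pool, as Patrick described.
ceph daemon mds.0 perf dump mds_log

# For comparison, per-pool I/O as the cluster sees it:
ceph osd pool stats cephfs_metadata

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx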