Re: Cephfs slow, not busy, but doing high traffic in the metadata pool

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Thu, 8 Jul 2021 17:35:26 +0200

Indeed, that looks pretty idle to me.
But if you have two active MDSs, then the load is probably caused by
the MDS balancer continuously migrating subtrees back and forth between
the two ranks in an effort to even out the load. We've seen this
several times in the past, which is why we use pinning.
Each time a subtree is migrated from one MDS to another, it needs to
re-read all the inodes from the metadata pool.

You can try increasing `mds_bal_interval` from its default of 10s up
to ~300s or so and see if the metadata IO drops significantly.
This should be changeable at runtime using
`ceph config set mds mds_bal_interval 300`.
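
Something along these lines should do it; this is only a sketch, and
`set_bal_interval` is a helper name I made up. It prints the current
value first so you know what to go back to:

```shell
# Show the current balancer interval, then change it for all MDS daemons.
# Usage: set_bal_interval <seconds>
set_bal_interval() {
    ceph config get mds mds_bal_interval
    ceph config set mds mds_bal_interval "$1"
}

# set_bal_interval 300
# To drop the override and return to the default afterwards:
# ceph config rm mds mds_bal_interval
```

Give it a few minutes and then compare the write rate on the metadata
pool with what you see now.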

Cheers, dan

On Thu, Jul 8, 2021 at 12:09 PM Flemming Frandsen <dren.dk@xxxxxxxxx> wrote:
>
> This output seems typical for both active MDS servers:
>
> ---------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
> req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre  cat |sess|actv rd   wr   rdwr|purg|
>  0    0    0  6.0M 887k 1.0k   0 | 56    0    0 |  7  3.0k 139    0 |6.0M 6.0M|  0  154    0    0    0 | 48 |  0    0   14    0 |  0
>  0   11k   0  6.0M 887k 236    0 | 56    0    0 |  7  3.0k 142    0 |6.0M 6.0M|  0   99    0    0    0 | 48 |  1    0   31    0 |  0
>  0    0    0  6.0M 887k 718    0 | 56    0    0 |  5  3.0k 143    0 |6.0M 6.0M|  0  318    0    0    0 | 48 |  1    0   12    0 |  0
>  0   13k   0  6.0M 887k 3.4k   0 | 56    0    0 |197  3.2k 145    0 |6.0M 6.0M|  0   43    1    0    0 | 48 |  8    0  207    0 |  0
>  0    0    0  6.0M 884k 4.9k   0 | 56    0    0 |  0  3.2k 145    0 |6.0M 6.0M|  0    2    0    0    0 | 48 |  0    0   10    0 |  0
>  0    0    0  6.0M 884k 2.1k   0 | 56    0    0 |  6  3.2k 147    0 |6.0M 6.0M|  0    0    1    0    0 | 48 |  0    0   12    0 |  0
>  2    0    0  6.0M 882k 1.1k   0 | 56    0    0 | 75  3.3k 150    0 |6.0M 6.0M|  2   23    0    0    0 | 48 |  0    0   42    0 |  0
>  0    0    0  6.0M 880k  16    0 | 56    0    0 | 88  3.4k 152    0 |6.0M 6.0M|  0   48    0    0    0 | 48 |  3    0  115    0 |  0
>  1  2.4k   0  6.0M 878k 126    0 | 56    0    0 |551  2.8k 130    0 |6.0M 6.0M|  1   26    2    0    0 | 48 |  0    0  209    0 |  0
>  4  210    0  6.0M 874k   0    0 | 56    0    0 |  5  2.8k 131    0 |6.0M 6.0M|  4   14    0    0    0 | 48 |  0    0  488    0 |  0
>  1  891    0  6.0M 870k  12k   0 | 56    0    0 |  0  2.8k 131    0 |6.0M 6.0M|  1   33    0    0    0 | 48 |  0    0    0    0 |  0
>  5   15    2  6.0M 870k 8.2k   0 | 56    0    0 | 79  2.9k 134    0 |6.0M 6.0M|  5   27    1    0    0 | 48 |  0    0   22    0 |  0
>  1   68    0  6.0M 858k   0    0 | 56    0    0 | 49  2.9k 136    0 |6.0M 6.0M|  1    0    1    0    0 | 48 |  0    0   91    0 |  0
>
> The metadata pool is still taking 64 MB/s of writes.
>
> We have two active MDS servers, without pinning.
>
> mds_cache_memory_limit is set to 20 GB, which ought to be enough for anyone(tm) as only 24 GB of data is used in the metadata pool.
>
> Does that offer any kind of clue?
>
> On Thu, 8 Jul 2021 at 10:16, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> That's interesting -- yes on a lightly loaded cluster the metadata IO
>> should be almost nil.
>> You can debug what is happening using `ceph daemonperf` on the active
>> MDS, e.g. https://pastebin.com/raw/n0iD8zXY
>>
>> (Use a wide terminal to show all the columns).
>>
>> Normally, lots of metadata IO would indicate that the cache size is too
>> small for the workload; but since you said the clients are pretty
>> idle, this might not be the case for you.
>>
>> Cheers, Dan
>>
>> On Thu, Jul 8, 2021 at 9:36 AM Flemming Frandsen <dren.dk@xxxxxxxxx> wrote:
>> >
>> > We have a nautilus cluster where any metadata write operation is very slow.
>> >
>> > We're seeing very light load from clients, as reported by dumping ops in
>> > flight; often it's zero.
>> >
>> > We're also seeing about 100 MB/s writes to the metadata pool, constantly,
>> > for weeks on end, which seems excessive, as only 22GB is utilized.
>> >
>> > Should the writes to the metadata pool not quiet down when there's nothing
>> > going on?
>> >
>> > Is there any way I can get information about why the MDSes are thrashing so
>> > badly?
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> --
> Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/