This output seems typical for both active MDS servers:

---------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre  cat |sess|actv rd   wr   rdwr|purg|
   0    0    0 6.0M 887k 1.0k    0|  56    0    0|   7 3.0k  139    0|6.0M 6.0M|   0  154    0    0    0|  48|   0    0   14    0|   0
   0  11k    0 6.0M 887k  236    0|  56    0    0|   7 3.0k  142    0|6.0M 6.0M|   0   99    0    0    0|  48|   1    0   31    0|   0
   0    0    0 6.0M 887k  718    0|  56    0    0|   5 3.0k  143    0|6.0M 6.0M|   0  318    0    0    0|  48|   1    0   12    0|   0
   0  13k    0 6.0M 887k 3.4k    0|  56    0    0| 197 3.2k  145    0|6.0M 6.0M|   0   43    1    0    0|  48|   8    0  207    0|   0
   0    0    0 6.0M 884k 4.9k    0|  56    0    0|   0 3.2k  145    0|6.0M 6.0M|   0    2    0    0    0|  48|   0    0   10    0|   0
   0    0    0 6.0M 884k 2.1k    0|  56    0    0|   6 3.2k  147    0|6.0M 6.0M|   0    0    1    0    0|  48|   0    0   12    0|   0
   2    0    0 6.0M 882k 1.1k    0|  56    0    0|  75 3.3k  150    0|6.0M 6.0M|   2   23    0    0    0|  48|   0    0   42    0|   0
   0    0    0 6.0M 880k   16    0|  56    0    0|  88 3.4k  152    0|6.0M 6.0M|   0   48    0    0    0|  48|   3    0  115    0|   0
   1 2.4k    0 6.0M 878k  126    0|  56    0    0| 551 2.8k  130    0|6.0M 6.0M|   1   26    2    0    0|  48|   0    0  209    0|   0
   4  210    0 6.0M 874k    0    0|  56    0    0|   5 2.8k  131    0|6.0M 6.0M|   4   14    0    0    0|  48|   0    0  488    0|   0
   1  891    0 6.0M 870k  12k    0|  56    0    0|   0 2.8k  131    0|6.0M 6.0M|   1   33    0    0    0|  48|   0    0    0    0|   0
   5   15    2 6.0M 870k 8.2k    0|  56    0    0|  79 2.9k  134    0|6.0M 6.0M|   5   27    1    0    0|  48|   0    0   22    0|   0
   1   68    0 6.0M 858k    0    0|  56    0    0|  49 2.9k  136    0|6.0M 6.0M|   1    0    1    0    0|  48|   0    0   91    0|   0

The metadata pool is still taking 64 MB/s of writes. We have two active MDS
servers, without pinning. mds_cache_memory_limit is set to 20 GB, which ought
to be enough for anyone(tm), since only 24 GB of data is used in the metadata
pool. Does that offer any kind of clue?

On Thu, 8 Jul 2021 at 10:16, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> That's interesting -- yes, on a lightly loaded cluster the metadata IO
> should be almost nil.
> You can debug what is happening using ceph daemonperf on the active
> MDS, e.g. https://pastebin.com/raw/n0iD8zXY
>
> (Use a wide terminal to show all the columns.)
>
> Normally, lots of md io would indicate that the cache size is too
> small for the workload; but since you said the clients are pretty
> idle, this might not be the case for you.
>
> Cheers, Dan
>
> On Thu, Jul 8, 2021 at 9:36 AM Flemming Frandsen <dren.dk@xxxxxxxxx> wrote:
> >
> > We have a Nautilus cluster where any metadata write operation is very slow.
> >
> > We're seeing very light load from clients, as reported by dumping ops in
> > flight; often it's zero.
> >
> > We're also seeing about 100 MB/s of writes to the metadata pool, constantly,
> > for weeks on end, which seems excessive, as only 22 GB is utilized.
> >
> > Should the writes to the metadata pool not quiet down when there's nothing
> > going on?
> >
> > Is there any way I can get information about why the MDSes are thrashing so
> > badly?
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
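
For anyone hitting the same symptom, here is a minimal sketch of the inspection
commands discussed in this thread. The daemon name mds.mds1 and the pool name
cephfs_metadata are placeholders; substitute the names from your own cluster,
and run the ceph daemon / ceph daemonperf commands on the host where that MDS
is running.

    # Watch per-second MDS perf counters (wide terminal recommended)
    ceph daemonperf mds.mds1

    # Dump the raw perf counters, including mds_log and objecter statistics
    ceph daemon mds.mds1 perf dump

    # Confirm how many client requests are actually in flight
    ceph daemon mds.mds1 dump_ops_in_flight

    # Check the cache limit the running daemon is actually using
    ceph daemon mds.mds1 config get mds_cache_memory_limit

    # Watch client I/O rates per pool, including the metadata pool
    ceph osd pool stats cephfs_metadata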