It's very easy to reproduce this issue in my cluster. With a single MDS and a single client, I prepared 1 million files in a folder and then read the files one by one. Here is the log with log level 20:

2018-11-26 17:46:02.215734 7f40535fe700 7 src/mds/MDCache.cc:6456 mds.0.cache trim_lru trimming 0 items from LRU size=510845 mid=322756 pintail=4892 pinned=49765
......
2018-11-26 17:46:02.215745 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600]
2018-11-26 17:46:02.215758 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
2018-11-26 17:46:02.215775 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
2018-11-26 17:46:02.215784 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110011*) unlink_inode [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600] [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
2018-11-26 17:46:02.215792 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x100000d6aa8 [2,head] #100000d6aa8 auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
2018-11-26 17:46:02.215807 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110011*) remove_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth NULL (dversion lock) pv=0 v=116355 inode=0 state=1073741824 0x7f4012c3e600]
......
[marvin] trim_dentry runs about 91555 times between 17:46:02.2 and 17:46:05.6, i.e. a few tens of microseconds per call on average; together they consume about 3s.
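[marvin] To make that cost concrete, here is a minimal, simplified sketch of the pattern the log suggests (this is NOT the real MDCache code; Dentry, lru and the mutex are just stand-ins): trim_lru() pops entries off the LRU and calls trim_dentry() for each one, all while the single big lock is held, so everything else waits for the whole batch.

#include <list>
#include <mutex>
#include <string>

struct Dentry { std::string name; };          // stand-in for CDentry

std::mutex mds_lock;                          // stand-in for the MDS big lock
std::list<Dentry*> lru;                       // stand-in for the dentry LRU

// stand-in for MDCache::trim_dentry(): unlink_inode(), remove_inode(),
// remove_dentry() ... each call has a small but real cost.
static void trim_dentry(Dentry* dn) { delete dn; }

void trim_lru(size_t count)
{
    std::lock_guard<std::mutex> guard(mds_lock);  // held for the whole batch
    while (count-- > 0 && !lru.empty()) {
        Dentry* dn = lru.back();
        lru.pop_back();
        trim_dentry(dn);                          // ~91555 iterations ~= 3s in the log above
    }
}

int main()
{
    for (int i = 0; i < 91555; ++i)
        lru.push_back(new Dentry{std::to_string(i) + ".txt"});
    trim_lru(lru.size());                         // client requests queue up behind this
}

During that ~3s window, no client request that needs the lock can make progress, which matches what I see.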
......
[marvin] The durations of the two trim_dentry calls below are the largest, about 30ms:

2018-11-26 17:46:02.369953 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200]
2018-11-26 17:46:02.369956 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
2018-11-26 17:46:02.369961 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
2018-11-26 17:46:02.369964 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110101*) unlink_inode [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200] [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
2018-11-26 17:46:02.369971 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x1000006c397 [2,head] #1000006c397 auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
2018-11-26 17:46:02.369974 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110101*) remove_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth NULL (dversion lock) pv=0 v=116541 inode=0 state=1073741824 0x7f40198a7200]
2018-11-26 17:46:02.393052 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000351925.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401dba3f00 state=1073741824 0x7f401991d200]
......
2018-11-26 17:46:05.612477 7f40535fe700 7 src/mds/MDCache.cc:6502 mds.0.cache trim_lru trimmed 91555 items
......
2018-11-26 17:46:05.612722 7f40535fe700 2 src/mds/MDCache.cc:7460 mds.0.cache check_memory_usage total 10990068, rss 1629564, heap 10930576, baseline 559504, buffers 0, 20067 / 419292 inodes have caps, 20117 caps, 0.0479785 caps per inode
2018-11-26 17:46:05.612744 7f40535fe700 10 src/mds/MDLog.cc:589 mds.0.log trim 132 / 128 segments, 96963 / -1 events, 0 (0) expiring, 3 (2200) expired
2018-11-26 17:46:05.612749 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4864202/9082625714, 1024 events
2018-11-26 17:46:05.612752 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865226/9084582156, 152 events
2018-11-26 17:46:05.612757 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865378/9084873509, 1024 events

On Tue, Nov 27, 2018 at 12:28 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> Okay, that sounds like something else is happening then. You might try
> turning up the debug to 20 and seeing what is happening during those 4
> seconds, as that's definitely more time than it should take.
> -Greg
>
>
> On Mon, Nov 26, 2018 at 10:37 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> >
> > I tested it on 12.2.8, and the debug_mds log level is 10/10.
> > I didn't configure mds_cache_memory_limit, so I think it will be 1GB.
> > Here is the log (I set the MDS debug level to 10):
> > ....
> > 2018-11-23 02:01:02.995067 7f40535fe700 7 src/mds/MDCache.cc:6517
> > mds.0.cache trim bytes_used=1GB limit=1GB reservation=0.05% count=0
> > 2018-11-23 02:01:02.995084 7f40535fe700 7 src/mds/MDCache.cc:6456
> > mds.0.cache trim_lru trimming 0 items from LRU size=694884 mid=472796
> > pintail=4942 pinned=19460
> > 2018-11-23 02:01:02.995409 7f404e7fb700 10 src/mds/MDSContext.cc:51
> > MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
> > 2018-11-23 02:01:04.207192 7f40535fe700 7 src/mds/MDCache.cc:6502
> > mds.0.cache trim_lru trimmed 262781 items
> > 2018-11-23 02:01:04.207211 7f40535fe700 10 src/mds/MDCache.cc:7426
> > mds.0.cache trim_client_leases
> > ....
> >
> > On Mon, Nov 26, 2018 at 10:43 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 26, 2018 at 7:00 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> > > >
> > > > Hi CephFS experts,
> > > > I found that MDCache::trim_lru() sometimes takes more than 2s; is
> > > > that expected? It also seems to hold mds_lock during that time,
> > > > which means other client requests can't be processed.
> > > > Thanks,
> > > > Marvin
> > >
> > > It's definitely not supposed to, but we've started noticing some
> > > performance issues with extremely large MDS cache sizes. How much
> > > memory did you give the cache, and what is your system configuration?
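PS: for reference, the cache limit Greg asks about and the debug level can be checked and raised roughly like this (the daemon name "mds.a" is just a placeholder for whatever the MDS is actually called):

# at runtime, via the admin socket on the MDS host
ceph daemon mds.a config get mds_cache_memory_limit   # defaults to 1073741824 (1GB) in luminous
ceph daemon mds.a config set debug_mds 20/20

# or persistently in ceph.conf on the MDS host
[mds]
debug_mds = 20/20
mds_cache_memory_limit = 1073741824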