For those 91555 dentries, the state is all 1073741824 (STATE_AUTH), with no STATE_BOTTOMLRU flag set. So I think they are all coming from the lru when cache_toofull() is true, rather than from the bottom_lru. When I set mds_cache_memory_limit to 30GB, the issue is gone.

On Tue, Nov 27, 2018 at 7:07 PM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
>
> It's very easy to reproduce this issue in my cluster. With a single MDS
> and a single client, I prepared 1 million files in a folder. Then I read
> the files one by one.
> Here is the log with log level 20:
> 2018-11-26 17:46:02.215734 7f40535fe700 7 src/mds/MDCache.cc:6456 mds.0.cache trim_lru trimming 0 items from LRU size=510845 mid=322756 pintail=4892 pinned=49765
> ......
> 2018-11-26 17:46:02.215745 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600]
> 2018-11-26 17:46:02.215758 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
> 2018-11-26 17:46:02.215775 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215784 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110011*) unlink_inode [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600] [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215792 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x100000d6aa8 [2,head] #100000d6aa8 auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215807 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110011*) remove_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth NULL (dversion lock) pv=0 v=116355 inode=0 state=1073741824 0x7f4012c3e600]
> ......
> [marvin] trim_dentry runs about 91555 times; it takes about 2ms per call
> on average, and the calls consume about 3s in total.
> ......
> [marvin] The two trim_dentry calls below have the longest duration, about 30ms.
> 2018-11-26 17:46:02.369953 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200]
> 2018-11-26 17:46:02.369956 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
> 2018-11-26 17:46:02.369961 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369964 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110101*) unlink_inode [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200] [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369971 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x1000006c397 [2,head] #1000006c397 auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369974 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110101*) remove_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth NULL (dversion lock) pv=0 v=116541 inode=0 state=1073741824 0x7f40198a7200]
> 2018-11-26 17:46:02.393052 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000351925.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401dba3f00 state=1073741824 0x7f401991d200]
> ......
> 2018-11-26 17:46:05.612477 7f40535fe700 7 src/mds/MDCache.cc:6502 mds.0.cache trim_lru trimmed 91555 items
> ......
> 2018-11-26 17:46:05.612722 7f40535fe700 2 src/mds/MDCache.cc:7460 mds.0.cache check_memory_usage total 10990068, rss 1629564, heap 10930576, baseline 559504, buffers 0, 20067 / 419292 inodes have caps, 20117 caps, 0.0479785 caps per inode
> 2018-11-26 17:46:05.612744 7f40535fe700 10 src/mds/MDLog.cc:589 mds.0.log trim 132 / 128 segments, 96963 / -1 events, 0 (0) expiring, 3 (2200) expired
> 2018-11-26 17:46:05.612749 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4864202/9082625714, 1024 events
> 2018-11-26 17:46:05.612752 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865226/9084582156, 152 events
> 2018-11-26 17:46:05.612757 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865378/9084873509, 1024 events
> On Tue, Nov 27, 2018 at 12:28 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > Okay, that sounds like something else is happening then. You might try
> > turning up the debug to 20 and seeing what is happening during those 4
> > seconds, as that's definitely more time than it should take.
> > -Greg
> >
> >
> > On Mon, Nov 26, 2018 at 10:37 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> > >
> > > I tested it in 12.2.8, and the debug_mds log level is 10/10.
> > > I didn't configure mds_cache_memory_limit, so I think it is the default 1GB.
> > > Here is the log (I set the MDS log level to 10):
> > > ....
> > > 2018-11-23 02:01:02.995067 7f40535fe700 7 src/mds/MDCache.cc:6517 mds.0.cache trim bytes_used=1GB limit=1GB reservation=0.05% count=0
> > > 2018-11-23 02:01:02.995084 7f40535fe700 7 src/mds/MDCache.cc:6456 mds.0.cache trim_lru trimming 0 items from LRU size=694884 mid=472796 pintail=4942 pinned=19460
> > > 2018-11-23 02:01:02.995409 7f404e7fb700 10 src/mds/MDSContext.cc:51 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
> > > 2018-11-23 02:01:04.207192 7f40535fe700 7 src/mds/MDCache.cc:6502 mds.0.cache trim_lru trimmed 262781 items
> > > 2018-11-23 02:01:04.207211 7f40535fe700 10 src/mds/MDCache.cc:7426 mds.0.cache trim_client_leases
> > > ....
> > >
> > > On Mon, Nov 26, 2018 at 10:43 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Nov 26, 2018 at 7:00 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi CephFS experts,
> > > > > I found that MDCache::trim_lru() sometimes takes more than 2s. Is
> > > > > that expected? It also seems to hold mds_lock during that time, so
> > > > > other client requests can't be processed.
> > > > > Thanks,
> > > > > Marvin
> > > >
> > > > It's definitely not supposed to, but we've started noticing some
> > > > performance issues with extremely large MDS cache sizes. How much
> > > > memory did you give the cache, and what is your system configuration?
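
A note on the state value quoted at the top of the thread: 1073741824 is exactly the (1<<30) bit, so the only flag set on those trimmed dentries is STATE_AUTH. The standalone sketch below just decodes that bitmask; it assumes only that STATE_AUTH is the (1u<<30) bit as in MDSCacheObject and is not the actual MDS code. Any other flag (STATE_BOTTOMLRU included) would show up as a leftover bit.

  // decode_state.cc -- minimal sketch for decoding the dentry state from the log
  #include <bitset>
  #include <cstdint>
  #include <iostream>

  int main() {
    const uint32_t STATE_AUTH = 1u << 30;  // assumption: matches MDSCacheObject
    const uint32_t state = 1073741824;     // value printed for every trimmed dentry

    std::cout << "bits set  : " << std::bitset<32>(state).count() << "\n";  // 1
    std::cout << "is auth   : " << bool(state & STATE_AUTH) << "\n";        // 1
    // 0 here means no other flag (such as STATE_BOTTOMLRU) is set on these dentries
    std::cout << "other bits: " << (state & ~STATE_AUTH) << "\n";
    return 0;
  }

Since no bit other than STATE_AUTH is set, the 91555 trimmed dentries are plain auth entries coming off the main LRU via the cache_toofull() path rather than the bottom_lru, which matches the conclusion at the top of the thread. On Luminous (12.2.x) the limit being raised there is mds_cache_memory_limit, given in bytes (default 1GB), so the "30GB" workaround corresponds to setting it to 32212254720 in the [mds] section of ceph.conf.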