For those 91555 dentries, the state is all 1073741824 (STATE_AUTH), with no STATE_BOTTOMLRU flag set. So I think they are all coming from the lru when cache_toofull() is true, rather than from the bottom_lru. When I set mds_cache_memory_limit to 30GB, the issue is gone.

On Tue, Nov 27, 2018 at 7:07 PM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
>
> It's very easy to reproduce this issue in my cluster. With a single MDS
> and a single client, I prepared 1 million files in a folder. Then I read
> the files one by one.
> Here is the log with log level 20:
> 2018-11-26 17:46:02.215734 7f40535fe700 7 src/mds/MDCache.cc:6456 mds.0.cache trim_lru trimming 0 items from LRU size=510845 mid=322756 pintail=4892 pinned=49765
> ......
> 2018-11-26 17:46:02.215745 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600]
> 2018-11-26 17:46:02.215758 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
> 2018-11-26 17:46:02.215775 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215784 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110011*) unlink_inode [dentry #0x1/tmp/000879266.txt [2,head] auth (dversion lock) pv=0 v=116355 inode=0x7f401b2fc400 state=1073741824 0x7f4012c3e600] [inode 0x100000d6aa8 [2,head] /tmp/000879266.txt auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215792 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x100000d6aa8 [2,head] #100000d6aa8 auth v115474 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401b2fc400]
> 2018-11-26 17:46:02.215807 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110011*) remove_dentry [dentry #0x1/tmp/000879266.txt [2,head] auth NULL (dversion lock) pv=0 v=116355 inode=0 state=1073741824 0x7f4012c3e600]
> ......
> [marvin] trim_dentry runs about 91555 times; it takes about 2ms per call
> on average, and the calls consume about 3s in total.
> ......
> [marvin] The two trim_dentry calls below have the longest duration, about 30ms.
> 2018-11-26 17:46:02.369953 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200]
> 2018-11-26 17:46:02.369956 7f40535fe700 12 src/mds/MDCache.cc:6653 mds.0.cache in container [dir 0x1 / [2,head] auth v=4401399 cv=4173802/4173802 REP dir_auth=0 state=1610612738|complete f(v0 m2018-11-26 11:19:33.739629 6=1+5) n(v437 rc2018-11-26 11:25:19.121540 b392072138355 1833246=1833191+55) hs=6+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f404bc4d400]
> 2018-11-26 17:46:02.369961 7f40535fe700 15 src/mds/MDCache.cc:6776 mds.0.cache trim_inode [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369964 7f40535fe700 12 src/mds/CDir.cc:583 mds.0.cache.dir(0x10000000000.101110101*) unlink_inode [dentry #0x1/tmp/000443283.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401db14f00 state=1073741824 0x7f40198a7200] [inode 0x1000006c397 [2,head] /tmp/000443283.txt auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369971 7f40535fe700 14 src/mds/MDCache.cc:291 mds.0.cache remove_inode [inode 0x1000006c397 [2,head] #1000006c397 auth v101840 s=65536 n(v0 b65536 1=1+0) (iversion lock) 0x7f401db14f00]
> 2018-11-26 17:46:02.369974 7f40535fe700 12 src/mds/CDir.cc:456 mds.0.cache.dir(0x10000000000.101110101*) remove_dentry [dentry #0x1/tmp/000443283.txt [2,head] auth NULL (dversion lock) pv=0 v=116541 inode=0 state=1073741824 0x7f40198a7200]
> 2018-11-26 17:46:02.393052 7f40535fe700 12 src/mds/MDCache.cc:6644 mds.0.cache trim_dentry [dentry #0x1/tmp/000351925.txt [2,head] auth (dversion lock) pv=0 v=116541 inode=0x7f401dba3f00 state=1073741824 0x7f401991d200]
> ......
> 2018-11-26 17:46:05.612477 7f40535fe700 7 src/mds/MDCache.cc:6502 mds.0.cache trim_lru trimmed 91555 items
> ......
> 2018-11-26 17:46:05.612722 7f40535fe700 2 src/mds/MDCache.cc:7460 mds.0.cache check_memory_usage total 10990068, rss 1629564, heap 10930576, baseline 559504, buffers 0, 20067 / 419292 inodes have caps, 20117 caps, 0.0479785 caps per inode
> 2018-11-26 17:46:05.612744 7f40535fe700 10 src/mds/MDLog.cc:589 mds.0.log trim 132 / 128 segments, 96963 / -1 events, 0 (0) expiring, 3 (2200) expired
> 2018-11-26 17:46:05.612749 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4864202/9082625714, 1024 events
> 2018-11-26 17:46:05.612752 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865226/9084582156, 152 events
> 2018-11-26 17:46:05.612757 7f40535fe700 5 src/mds/MDLog.cc:643 mds.0.log trim already expired segment 4865378/9084873509, 1024 events
> On Tue, Nov 27, 2018 at 12:28 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > Okay, that sounds like something else is happening then. You might try
> > turning up the debug to 20 and seeing what is happening during those 4
> > seconds, as that's definitely more time than it should take.
> > -Greg
> >
> >
> > On Mon, Nov 26, 2018 at 10:37 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> > >
> > > I tested it in 12.2.8, and the debug_mds log level is 10/10.
> > > I didn't configure mds_cache_memory_limit, so I think it is the default 1GB.
> > > Here is the log (I set the MDS log level to 10):
> > > ....
> > > 2018-11-23 02:01:02.995067 7f40535fe700 7 src/mds/MDCache.cc:6517 mds.0.cache trim bytes_used=1GB limit=1GB reservation=0.05% count=0
> > > 2018-11-23 02:01:02.995084 7f40535fe700 7 src/mds/MDCache.cc:6456 mds.0.cache trim_lru trimming 0 items from LRU size=694884 mid=472796 pintail=4942 pinned=19460
> > > 2018-11-23 02:01:02.995409 7f404e7fb700 10 src/mds/MDSContext.cc:51 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
> > > 2018-11-23 02:01:04.207192 7f40535fe700 7 src/mds/MDCache.cc:6502 mds.0.cache trim_lru trimmed 262781 items
> > > 2018-11-23 02:01:04.207211 7f40535fe700 10 src/mds/MDCache.cc:7426 mds.0.cache trim_client_leases
> > > ....
> > >
> > > On Mon, Nov 26, 2018 at 10:43 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Nov 26, 2018 at 7:00 AM Marvin Zhang <fanzier@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi CephFS experts,
> > > > > I found that MDCache::trim_lru() sometimes takes more than 2s. Is
> > > > > that expected? It also seems to hold mds_lock during that time, so
> > > > > other client requests can't be processed.
> > > > > Thanks,
> > > > > Marvin
> > > >
> > > > It's definitely not supposed to, but we've started noticing some
> > > > performance issues with extremely large MDS cache sizes. How much
> > > > memory did you give the cache, and what is your system configuration?
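
A note on the state value quoted at the top of the thread: 1073741824 is exactly the (1<<30) bit, so the only flag set on those trimmed dentries is STATE_AUTH. The standalone sketch below just decodes that bitmask; it assumes only that STATE_AUTH is the (1u<<30) bit as in MDSCacheObject and is not the actual MDS code. Any other flag (STATE_BOTTOMLRU included) would show up as a leftover bit.

  // decode_state.cc -- minimal sketch for decoding the dentry state from the log
  #include <bitset>
  #include <cstdint>
  #include <iostream>

  int main() {
    const uint32_t STATE_AUTH = 1u << 30;  // assumption: matches MDSCacheObject
    const uint32_t state = 1073741824;     // value printed for every trimmed dentry

    std::cout << "bits set  : " << std::bitset<32>(state).count() << "\n";  // 1
    std::cout << "is auth   : " << bool(state & STATE_AUTH) << "\n";        // 1
    // 0 here means no other flag (such as STATE_BOTTOMLRU) is set on these dentries
    std::cout << "other bits: " << (state & ~STATE_AUTH) << "\n";
    return 0;
  }

Since no bit other than STATE_AUTH is set, the 91555 trimmed dentries are plain auth entries coming off the main LRU via the cache_toofull() path rather than the bottom_lru, which matches the conclusion at the top of the thread. On Luminous (12.2.x) the limit being raised there is mds_cache_memory_limit, given in bytes (default 1GB), so the "30GB" workaround corresponds to setting it to 32212254720 in the [mds] section of ceph.conf.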