We had a report that after "some runtime" (days) the time to umount some file system became "huge" (minutes), even though the file system was nearly "empty" (a few thousand files), while umount after only some "short" run time was sub-second. Analysis and a workaround (for this use case) follow.

The file system in question is mostly used as a "spool" area, so basically lots of

    echo $stuff > $tempfile
    process $tempfile
    rm $tempfile

Investigation shows that this creates huge amounts of negative entries in the dentry cache. There is no memory pressure, and the directory is not removed either, so they stay around.

Reproducer in shell:

    while true; do
        F=$RANDOM
        touch $F
        rm $F
    done

and then

    watch 'cat /proc/sys/fs/dentry-state ; slabtop -o | grep dentry ; grep ^SReclaimable /proc/meminfo'

(Obviously in C, perl or python you can get orders of magnitude more iterations per second; a minimal Python sketch is appended below.)

So this accumulates unused negative dentries quickly, and after some time, given enough RAM, we have gigabytes worth of dentry cache, but no inodes used. Umount of that empty spool file system takes 30 seconds; it will take minutes if you let it run even longer.

In real life, after days of real load, umounting the spool file system (with ~30 GB of accumulated dentry cache, but only a few thousand remaining inodes) took minutes, and produced soft lockups: "BUG: soft lockup - CPU... stuck for XYZ seconds".

The Workaround:
---------------

The interesting part is that this (almost) identical reproducer behaves completely differently:

    while true; do
        F=$RANDOM
        touch $F
        rm $F <$F     #### mind the redirection ####
    done

(unlink before last close)

This does *not* accumulate negative dentries at all, which is how I'd expected the other case to behave as well. If we look at vfs_unlink(), there is a d_delete(dentry) in there.

Total dentries vs. seconds of runtime (with a python reproducer): the upper, linearly increasing dots are "open; close; unlink"; the flat dots are "open; unlink; close". Mind the log scale on the "total dentries" y-axis.

[ASCII plot: "total dentries" (log scale, 1e+04 to 1e+07) over 0 to 900 seconds of runtime; the "open; close; unlink" curve climbs steadily past 1e+07, while the "open; unlink; close" curve stays flat at about 1e+05.]

time umount after 15 minutes and 50 million iterations, so about 50 million dentries: 30 seconds.
time umount after 15 minutes and the same number of iterations, but no "stale negative dentries": 60 *milli*seconds.

So what I suggest is to fix the "touch $F; rm $F" case to have the same effect as the "touch $F; rm $F <$F" case: drop the corresponding dentry.

Would some VFS people kindly look into this, or, in case it "works as designed", explain why and point out what I am missing? Pretty please? :-)

Also, obviously anyone can produce negative dentries by just stat()ing non-existing file names. This comes up repeatedly (I found posts going back ~15 years and more) in various contexts. Once the dentry cache grows beyond what is a reasonable working set size for the cache levels, inserting new (negative) dentries costs about as much as the caching was supposed to save...
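For reference, here is a minimal Python sketch of the two patterns compared above (close-then-unlink vs. unlink-then-close). It is not the exact reproducer used for the plot; the spool path, file names and iteration count are illustrative placeholders:

    #!/usr/bin/env python3
    # Sketch of the two access patterns; adjust SPOOL and ITERATIONS as needed.
    import os

    SPOOL = "/mnt/spool"      # assumed mount point of the spool file system
    ITERATIONS = 1000000

    def close_then_unlink(path):
        # "touch $F; rm $F": the dentry turns negative and stays cached
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
        os.unlink(path)

    def unlink_then_close(path):
        # "touch $F; rm $F <$F": unlink before the last close; per the
        # observation above, this does not leave a negative dentry behind
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.unlink(path)
        os.close(fd)

    def dentry_state():
        # /proc/sys/fs/dentry-state: the first two fields are nr_dentry
        # and nr_unused
        with open("/proc/sys/fs/dentry-state") as f:
            return f.read().split()

    if __name__ == "__main__":
        print("before:", dentry_state())
        for i in range(ITERATIONS):
            close_then_unlink(os.path.join(SPOOL, "tmp-%d" % i))
            # swap in unlink_then_close() to get the flat curve instead
        print("after: ", dentry_state())

Running this under the watch command above should show the same difference as the two shell loops.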
I think it would be useful to introduce different sets of tunables for positive and negative dentry "expiry", or a limit on the number of negative dentries (maybe relative to the total rather than absolute), to avoid such a massive (several gigabytes of slab) buildup of dentry cache on a mostly empty file system, without the indirection via global (or cgroup) memory pressure. Maybe, instead of always allocating and inserting a new dentry, prefer to recycle some existing negative dentry that has not been looked at for a long time.

Thanks,

    Lars