On Mon, May 22, 2017 at 11:18 AM, Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
> On Wed, May 17, 2017 at 2:58 PM Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
>>
>> On Mon, May 15, 2017 at 5:05 PM, Khazhismel Kumykov <khazhy@xxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I'm seeing behavior in d_invalidate where, if multiple threads call
>> > d_invalidate on the same tree at the same time, run time blows up and all
>> > the calls hang, given a large enough tree/enough simultaneous callers.
>> > (e.g. a directory with 100k entries on d_subdirs, and 5 or so threads
>> > calling d_invalidate, was enough to hang my test VMs)
>> >
>> > This seems to be due to thrashing on the dentry locks: multiple threads
>> > tightly loop calling d_walk while waiting for a shrink_dentry_list call
>> > (also thrashing the locks) to complete. (d_invalidate loops calling
>> > d_walk for as long as any dentry in the tree is on a dcache shrink list.)
>> >
>> > There was a similar report recently, "soft lockup in d_invalidate()",
>> > which proposed having the d_walk in d_invalidate ignore dentries already
>> > marked as being on a shrink list. That does prevent this hang/lockup in
>> > this case as well, although I'm not sure it doesn't violate the contract
>> > for d_invalidate. (The entries on a shrink list should be dropped
>> > eventually, just not necessarily by the time d_invalidate returns.)
>> >
>> > If someone more familiar with the dcache could provide input, I would
>> > appreciate it.
>> >
>> > A reliable repro on mainline is:
>> > - create a mountpoint with DCACHE_OP_REVALIDATE, e.g. fuse passthrough
>> > - create a directory and populate it with ~100k files with content, to
>> >   populate the dcache
>> > - create some background processes opening/reading files in this folder
>> >   (5 "while true; cat $file" loops were enough to get an indefinite
>> >   hang for me)
>> > - cause the directory to need to be invalidated (e.g. make_bad_inode on
>> >   the directory)
>> >
>> > This results in the background processes all entering d_invalidate and
>> > hanging, while with just one process in d_invalidate (e.g. stat'ing a
>> > file in the dir) things go pretty quickly, as expected.
>> >
>> > (The proposed patch from the other thread)
>> >
>> > diff --git a/fs/dcache.c b/fs/dcache.c
>> > index 7b8feb6..3a3b0f37 100644
>> > --- a/fs/dcache.c
>> > +++ b/fs/dcache.c
>> > @@ -1364,7 +1364,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
>> >  		goto out;
>> >
>> >  	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
>> > -		data->found++;
>> > +		goto out;
>> >  	} else {
>> >  		if (dentry->d_flags & DCACHE_LRU_LIST)
>> >  			d_lru_del(dentry);
>> >
>> > khazhy
>>
>> Would this change actually violate any guarantees? select_collect looks
>> like it used to ignore unused dentries that were part of a shrink list
>> before fe91522a7ba82ca1a51b07e19954b3825e4aaa22 (found would not be
>> incremented). Once a dentry is on a shrink list it is unreachable anyway,
>> so would returning from d_invalidate before all the shrink lists finish
>> processing be ok?
>>
>> khazhy
>
> Pinging in case this got missed; I would appreciate thoughts from someone
> more familiar with the dcache.
>
> My previous email wasn't completely correct: while before fe91522a7ba8
> ("don't remove from shrink list in select_collect()") d_invalidate would
> not busy-wait for other workers calling shrink_list to complete, it would
> return -EBUSY rather than success, so a change to return success without
> waiting would not be equivalent to the old behavior. Currently, we loop
> calling d_walk repeatedly, causing the extreme slowdown observed.
>
> I still want to understand better: in d_invalidate, given that we can
> safely return without pruning in-use dentries, would returning before
> unused dentries are pruned, so long as we know they will be pruned by
> another task (they are on a shrink list), be safe as well? If not, would
> it make more sense to have mutual exclusion on calling d_invalidate on
> the same dentries simultaneously?
>
> khazhy

ping

Again, to summarize: a directory with a large number of children, i.e. a
dentry with a large number of dentries on d_subdirs, makes simultaneous
calls to d_invalidate on that dentry take a very long time. As an example,
for a directory with 160k files, a single d_invalidate call took ~1.4s,
while having 6 tasks call d_invalidate simultaneously took ~6.5 hours to
resolve, about 10000x longer.

dir/
dir/file-0
...
dir/file-160000

This seems to be due to contention between d_walk and shrink_dentry_list,
and in particular to d_invalidate tightly looping d_walk for as long as any
dentry it finds is on a shrink list. With this directory structure, d_walk
holds "dir"'s d_lock while walking over d_subdirs, while shrink_dentry_list
releases and regrabs "dir"'s d_lock on every iteration, also sending it to
the back of the queue. The retry loop in question is sketched below.
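For reference, here is roughly that loop, trimmed from v4.11-era fs/dcache.c
(quoted from memory, so check a real tree for the exact code; the early
exits for unhashed/negative dentries are elided):

void d_invalidate(struct dentry *dentry)
{
	/* ... early exits for unhashed/negative dentries elided ... */

	for (;;) {
		struct detach_data data;

		data.mountpoint = NULL;
		INIT_LIST_HEAD(&data.select.dispose);
		data.select.start = dentry;
		data.select.found = 0;

		/*
		 * detach_and_collect() -> select_collect() bumps
		 * data.select.found for every unused descendant, including
		 * ones already on another task's shrink list.
		 */
		d_walk(dentry, &data, detach_and_collect, check_and_drop);

		if (data.select.found)
			shrink_dentry_list(&data.select.dispose);

		if (data.mountpoint) {
			detach_mounts(data.mountpoint);
			dput(data.mountpoint);
		}

		/*
		 * As long as *anything* was found, walk the whole tree
		 * again, even if another task is doing all the actual
		 * pruning; this is the tight loop that thrashes d_lock.
		 */
		if (!data.mountpoint && !data.select.found)
			break;
	}
}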
One proposed solution: in select_collect, do not increment the count of
found unused descendants when we find a dentry that is already on a shrink
list (see the sketch at the end of this mail). This means d_invalidate and
shrink_dcache_parent may return before all unused children are actually
pruned; they will be pruned soon by the other task currently shrinking
them, but I am not sure whether that is OK.

A quick and dirty fix would be adding a mutex around the shrink loops, so
simultaneous tasks don't thrash each other.

There are probably other solutions here as well; I would appreciate a look
from someone more familiar with this area.

khazhy
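(The sketch referenced above: select_collect as of v4.11 with the proposed
change applied. Reconstructed from fs/dcache.c from memory, so treat it as
a sketch rather than a tested patch.)

static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
{
	struct select_data *data = _data;
	enum d_walk_ret ret = D_WALK_CONTINUE;

	if (data->start == dentry)
		goto out;

	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
		/*
		 * Proposed change: skip dentries another task is already
		 * shrinking rather than counting them in data->found, so
		 * our caller stops re-walking the tree on their account.
		 */
		goto out;
	} else {
		if (dentry->d_flags & DCACHE_LRU_LIST)
			d_lru_del(dentry);
		if (!dentry->d_lockref.count) {
			d_shrink_add(dentry, &data->dispose);
			data->found++;
		}
	}
	/*
	 * We can return to the caller if we have found some (this ensures
	 * forward progress). We'll be coming back to find the rest.
	 */
	if (!list_empty(&data->dispose))
		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
out:
	return ret;
}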