dcache endless loop in d_invalidate

Martin Schwidefsky <schwidefsky@xxxxxxxxxx> · Tue, 16 Oct 2018 13:15:28 +0200

Hi Al,

I am currently looking into a customer dump and found what looks like
an issue in the dcache code. And I think the following commit of yours
has something to do with it:

commit fe91522a7ba82ca1a51b07e19954b3825e4aaa22
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date:   Sat May 3 00:02:25 2014 -0400

    don't remove from shrink list in select_collect()

    If we find something already on a shrink list, just increment
    data->found and do nothing else.  Loops in shrink_dcache_parent() and
    check_submounts_and_drop() will do the right thing - everything we
    did put into our list will be evicted and if there had been nothing,
    but data->found got non-zero, well, we have somebody else shrinking
    those guys; just try again.

    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

The dump I got is based on kernel v4.4 but the affected dcache functions
look identical to the upstream version. Here is what I found in the dump:

A lot of "rcu_sched kthread starved for <xxx> jiffies!" messages
Only one CPU, currently running process "run-crons" task 0x65a8008
It just called check_and_drop from d_walk, full backchain:

    PSW.addr   check_and_drop at 30a0e8
    %r14       d_walk at 308202
 #0 [35b87b88] d_invalidate at 3096e8
 #1 [35b87bd8] proc_flush_task at 37190c
 #2 [35b87c58] release_task at 13f202
 #3 [35b87cc8] wait_task_zombie at 13fc36
 #4 [35b87d50] wait_consider_task at 140150
 #5 [35b87dc0] do_wait at 1403de
 #6 [35b87e18] sys_wait4 at 14181e
 #7 [35b87ea8] system_call at 659ec4

The task's runtime is:
  sum_exec_runtime 26813717162347 # nsec = 26813 seconds,
  utime = 3991252 # cputime = 974 seconds,
  stime = 99132516783832 # cputime = 24202 seconds,
Task 0x65a8008 has TIF_NEED_RESCHED set

d_walk() just called check_and_drop() via the finish() function pointer;
check_and_drop() will return and d_walk() will return as well.
This looks like an endless loop in d_invalidate().

The (struct dentry *) dentry in d_invalidate() is at 0x3cb15858
The struct detach_data data in d_invalidate() is at 0x35b87c28

dentry tree starting @ 0x3cb15858 has two entries in d_subdirs:
0x3cb15858  d_name.name: "11898"
        0xb940d3d8 d_name.name: "cmdline"
        0xb940dd98 d_name.name: "status"

crash> px *(struct dentry *) 0x3cb15858 | grep d_flags
  d_flags = 0x2000cc,

crash> px *(struct dentry *) 0xb940d3d8 | grep d_flags
  d_flags = 0x48048c,  # DCACHE_SHRINK_LIST is set

crash> px *(struct dentry *) 0xb940dd98 | grep d_flags
  d_flags = 0x48048c,  # DCACHE_SHRINK_LIST is set
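
The flag annotations above can be checked with a few lines of C. This is
only a sketch: the value 0x400 for DCACHE_SHRINK_LIST is an assumption
taken from include/linux/dcache.h of that kernel generation and should be
verified against the exact source tree of the dump. on_shrink_list() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit value as found in include/linux/dcache.h around v4.4. */
#define DCACHE_SHRINK_LIST 0x00000400u

/* Hypothetical helper: true if the d_flags word has the shrink-list bit set. */
static inline bool on_shrink_list(uint32_t d_flags)
{
	return (d_flags & DCACHE_SHRINK_LIST) != 0;
}
```

With the values from the dump, on_shrink_list(0x2000cc) is false for the
"11898" directory and on_shrink_list(0x48048c) is true for both children.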

crash> px *(struct detach_data *) 0x35b87c28
$29 = {
  select = {
    start = 0x3cb15858,
    dispose = {
      next = 0x35b87c30,
      prev = 0x35b87c30
    },
    found = 0x2
  },
  mountpoint = 0x0
}
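
That the dispose list is empty can be read off the addresses: a list head
whose next and prev both point back at the head itself holds no entries.
The sketch below redoes that arithmetic in userspace. The struct names
ending in "64" are hypothetical stand-ins that model the 64-bit layout
seen in the dump (start at offset 0, dispose right behind it); they are
not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical images of the dumped structures, with pointers stored as
 * plain 64-bit values so the layout does not depend on the host. */
struct list_head64 {
	uint64_t next;
	uint64_t prev;
};

struct select_data64 {
	uint64_t start;
	struct list_head64 dispose;
	int32_t found;
};

/* A list is empty when both links point at the list head itself.
 * select_addr is the address the structure had in the dumped system. */
static inline bool dispose_is_empty(uint64_t select_addr,
				    const struct select_data64 *img)
{
	uint64_t head = select_addr + offsetof(struct select_data64, dispose);

	return img->dispose.next == head && img->dispose.prev == head;
}
```

For the dump above, the head address works out to 0x35b87c28 + 8 =
0x35b87c30, which matches both dispose.next and dispose.prev.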

select_collect(), called from detach_and_collect(), will increment
data.select.found in the struct detach_data @ 0x35b87c28 but will not
add any dentries to the dispose list. The shrink_dentry_list() call in
d_invalidate() will do nothing as the dispose list is empty. The two
dentries 0xb940d3d8 and 0xb940dd98 are still there. After d_walk() returns,
d_invalidate() finds data.mountpoint == NULL and data.select.found == 2,
so it will start the loop again without having made any progress.
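
The lack of progress can be reproduced with a small userspace model. This
is a deliberately simplified sketch, not the kernel code: the toy_* names,
the SHRINK_LIST stand-in flag and the max_passes cut-off are all invented
for illustration, and mountpoint handling, locking and reference counts
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHRINK_LIST 0x1u	/* stand-in for DCACHE_SHRINK_LIST */

struct toy_dentry {
	unsigned int flags;
	bool alive;		/* still linked into the tree */
	bool on_dispose;	/* collected onto our own dispose list */
};

/* Models the decision in select_collect(): a dentry that is already on
 * some shrink list is counted in *found but not collected. */
static void toy_select_collect(int *found, struct toy_dentry *d)
{
	if (!(d->flags & SHRINK_LIST))
		d->on_dispose = true;
	(*found)++;
}

/* Models the retry loop in d_invalidate(). Returns the number of passes
 * until nothing was found, or -1 if max_passes went by without that. */
static inline int toy_invalidate(struct toy_dentry *kids, size_t n,
				 int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		int found = 0;

		for (size_t i = 0; i < n; i++)
			if (kids[i].alive)
				toy_select_collect(&found, &kids[i]);

		/* stand-in for shrink_dentry_list() on our dispose list */
		for (size_t i = 0; i < n; i++)
			if (kids[i].on_dispose) {
				kids[i].alive = false;
				kids[i].on_dispose = false;
			}

		if (!found)
			return pass;
	}
	return -1;
}
```

With two children that both carry the flag, as in the dump, and nobody
else to free them, toy_invalidate() never terminates on its own and only
stops at max_passes. With the flag clear on both, it finishes on the
second pass.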

As this is a single CPU system without kernel preemption there is nobody
else that will do the shrinking of those dcache entries.

In short, this if-statement in select_collect:

        if (dentry->d_flags & DCACHE_SHRINK_LIST) {
                data->found++;
        }

with the assumption that "somebody else" will do the shrinking seems broken.

Do you agree?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.