[Bug 201331] deadlock (XFS?)

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Fri, 05 Oct 2018 17:09:39 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=201331

Eric Sandeen (sandeen@xxxxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandeen@xxxxxxxxxxx
           Assignee|filesystem_xfs@kernel-bugs. |io_md@xxxxxxxxxxxxxxxxxxxx
                   |kernel.org                  |

--- Comment #11 from Eric Sandeen (sandeen@xxxxxxxxxxx) ---
One thing that's kind of weird is this:

[ 1679.494859] md: md2: resync done.
[ 5679.900329] INFO: task tar:18235 blocked for more than 120 seconds.

Those two are almost exactly 4000 seconds apart.  Maybe a coincidence.
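
For anyone who wants to check the arithmetic on their own logs, here is a minimal sketch that subtracts the bracketed timestamps of two dmesg lines. The helper names (dmesg_timestamp, dmesg_gap) are made up for illustration, and it assumes the default "[ seconds.microseconds]" prefix is present on both lines.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read the leading "[ seconds.micros]" prefix of a
 * dmesg line.  Returns 0 on success, -1 if the line has no such prefix.
 */
static int dmesg_timestamp(const char *line, double *out)
{
	/* %lf skips the padding blanks that follow the '[' */
	return sscanf(line, "[%lf]", out) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: store the seconds elapsed between two dmesg
 * lines in *gap.  Returns 0 on success, -1 if either line fails to parse.
 */
static int dmesg_gap(const char *earlier, const char *later, double *gap)
{
	double t0, t1;

	if (dmesg_timestamp(earlier, &t0) || dmesg_timestamp(later, &t1))
		return -1;
	*gap = t1 - t0;
	return 0;
}
```

Fed the two lines quoted above, dmesg_gap() gives 5679.900329 - 1679.494859 = 4000.405470 seconds.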

The message from md's bitmap_startwrite has almost the same timestamp, too:

[ 5679.904044] INFO: task kworker/u24:3:18307 blocked for more than 120 seconds.

md is scheduled out here:

                if (unlikely(COUNTER(*bmc) == COUNTER_MAX)) {
                        DEFINE_WAIT(__wait);
                        /* note that it is safe to do the prepare_to_wait
                         * after the test as long as we do it before dropping
                         * the spinlock.
                         */
                        prepare_to_wait(&bitmap->overflow_wait, &__wait,
                                        TASK_UNINTERRUPTIBLE);
                        spin_unlock_irq(&bitmap->counts.lock);
                        schedule();
                        finish_wait(&bitmap->overflow_wait, &__wait);
                        continue;
                }

So md is waiting to be woken up when the bitmap writer finishes.  Details
aside, I really do think that xfs is the victim/messenger here; we should at
least try to get some md eyes on this one as well.
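
To make the waiting pattern concrete, here is a rough userspace analogy of that counter-overflow wait, using a pthread mutex and condition variable in place of the kernel spinlock and wait queue. This is not the md code: the struct, the function names, and the tiny COUNTER_MAX are all invented for illustration, and it assumes the wakeup is issued from the write-completion side.

```c
#include <assert.h>
#include <pthread.h>

#define COUNTER_MAX 3	/* illustrative only; not the md value */

/* Hypothetical stand-in for one bitmap counter plus its wait queue. */
struct overflow_counter {
	pthread_mutex_t lock;
	pthread_cond_t overflow_wait;
	unsigned int pending;	/* writes started but not yet finished */
};

static void counter_init(struct overflow_counter *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->overflow_wait, NULL);
	c->pending = 0;
}

/* Start a write: sleep while the counter is saturated. */
static void start_write(struct overflow_counter *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->pending == COUNTER_MAX)
		pthread_cond_wait(&c->overflow_wait, &c->lock);
	c->pending++;
	pthread_mutex_unlock(&c->lock);
}

/* Finish a write: if the counter was saturated, wake any sleepers. */
static void end_write(struct overflow_counter *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->pending > 0);
	if (c->pending == COUNTER_MAX)
		pthread_cond_broadcast(&c->overflow_wait);
	c->pending--;
	pthread_mutex_unlock(&c->lock);
}

/* Non-blocking probe: would start_write() have to sleep right now? */
static int would_block(struct overflow_counter *c)
{
	int full;

	pthread_mutex_lock(&c->lock);
	full = (c->pending == COUNTER_MAX);
	pthread_mutex_unlock(&c->lock);
	return full;
}
```

In this model, if the matching end_write() never arrives, every later start_write() on the saturated counter sleeps forever. That is the kind of situation a hung-task warning would report, whatever filesystem happened to issue the write.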
