Re: [BUG] MD/RAID1 hung forever on freeze_array

NeilBrown <neilb@xxxxxxxx> · Fri, 09 Dec 2016 17:01:51 +1100

On Thu, Dec 08 2016, Jinpu Wang wrote:

This number:

>   nr_pending = {
>     counter = 1
>   },

and this number:

>   nr_pending = {
>     counter = 856
>   },

might be interesting.

There are 855 requests on the list.  Add the one that is currently
being retried and that gives 856, which is nr_pending for the device that failed.
But nr_pending on the device that didn't fail is 1.  I would expect
zero.
When a read or write request succeeds, rdev_dec_pending() is called
immediately, so this should quickly go to zero.

It seems as though there must be a request to the loop device that is
stuck somewhere between the atomic_inc(&rdev->nr_pending) (possibly
inside read_balance) and the call to generic_make_request().
I cannot yet see how that would happen.

Can you check whether this is a repeatable observation?  Is nr_pending.counter
always '1' on the loop device?

Thanks,
NeilBrown