On Thu, Feb 25, 2016 at 11:31:04AM +1100, Neil Brown wrote:
> On Thu, Feb 25 2016, Shaohua Li wrote:
>
> >
> > As for the bug, write requests run in raid5d, mddev_suspend() waits for all IO,
> > which waits for the write requests. So this is a clear deadlock. I think we
> > should delete the check_reshape() in md_check_recovery(). If we change
> > layout/disks/chunk_size, check_reshape() is already called. If we start an
> > array, the .run() already handles new layout. There is no point
> > md_check_recovery() check_reshape() again.
>
> Are you sure?
> Did you look at the commit which added that code?
> commit b4c4c7b8095298ff4ce20b40bf180ada070812d0
>
> When there is an IO error, reshape (or resync or recovery) will abort
> and then possibly be automatically restarted.

Thanks for pointing this out.

> Without the check here a reshape might be attempted on an array which
> has failed. Not sure if that would be harmful, but it would certainly
> be pointless.
>
> But you are right that this is causing the problem.
> Maybe we should keep track of the size of the 'scribble' arrays and only
> call resize_chunks if the size needs to change? Similar to what
> resize_stripes does.

Yep, that was my first solution, but I later decided check_reshape() was
useless here; apparently I missed the restart case. I'll go this way.

> It might also be good to put something like
>     WARN_ON(current == mddev->thread->task);
> in mddev_suspend() ... or whatever code would cause this sort of error
> to trigger a warning early.

Sounds good.

Thanks,
Shaohua
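
For illustration only, here is a rough userspace model of the two ideas discussed above: remember the size the scribble buffer was allocated for and skip the resize (and the suspend it needs) when nothing changes, and flag a suspend attempted from the worker thread itself. This is a sketch, not kernel code. struct scribble_model, model_suspend() and model_resize_chunks() are made-up names, and the caller_is_worker flag is an assumed stand-in for the current == mddev->thread->task test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct scribble_model {
    void  *buf;       /* scratch buffer standing in for the scribble arrays */
    size_t size;      /* size the buffer is currently allocated for */
    int    suspends;  /* number of times I/O had to be quiesced */
    int    warnings;  /* suspends attempted from the worker thread itself */
};

/*
 * Stand-in for mddev_suspend(). In the kernel this waits for all I/O,
 * so calling it from the thread that processes the I/O cannot complete.
 * Here we only count the call and record the misuse.
 */
static void model_suspend(struct scribble_model *m, bool caller_is_worker)
{
    if (caller_is_worker)
        m->warnings++;  /* models WARN_ON(current == mddev->thread->task) */
    m->suspends++;
}

/*
 * Resize the scratch buffer only when the requested size differs from
 * the recorded one. Returns 0 on success, -1 if allocation fails.
 */
static int model_resize_chunks(struct scribble_model *m, size_t new_size,
                               bool caller_is_worker)
{
    void *new_buf;

    assert(m != NULL);
    if (new_size == m->size)
        return 0;       /* nothing to change: no suspend, no allocation */

    new_buf = malloc(new_size ? new_size : 1);
    if (new_buf == NULL)
        return -1;

    model_suspend(m, caller_is_worker);
    free(m->buf);
    m->buf = new_buf;
    m->size = new_size;
    return 0;
}
```

In this model a repeated call with an unchanged size returns before model_suspend() is reached, which is the case that matters when the check is re-run from the worker thread after a restart. A call that really changes the size from the worker thread still goes through, but bumps the warning counter so the misuse is visible early.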