Re: [syzbot] INFO: task can't die in blkdev_common_ioctl

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Tue, 5 Apr 2022 22:15:47 +0900

On 2022/04/05 15:44, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 02:12:14PM +0900, Tetsuo Handa wrote:
>> On 2022/04/04 13:58, Christoph Hellwig wrote:
>> My patch proposes filemap_invalidate_lock_killable() and converts only
>> blkdev_fallocate() case as a starting point. Nothing prevents us from
>> converting e.g. blk_ioctl_zeroout() case as well. The "not come through
>> blkdev_fallocate" is bogus.
> 
> Sure, we could try to convert most of the > 50 instances of
> filemap_invalidate_lock to be killable.  But that:
> 
>   a) isn't what your patch actuall did

We can step by step convert all possible locations.

>   b) doesn't solve the underlying issue that is wasn't designed to to be
>      held over very extremely long running operations

Do you want to introduce state variables like Lo_rundown in order to
avoid holding this invalidate lock? That will be a lot of complication.

> 
> Or to get back to what I said before - I don't think we can just hold
> the lock over manually zeroing potentially gigabytes of blocks.
> 

Periodically releasing this invalidate lock may result in inconsistent
output. One can issue conflicting request (e.g. enlarging a file and
truncating that file). If truncate happens while partially enlarged (or
writing non-zero values while partially zeroed), resuming will need to
undo what was done during this invalidate lock was released. (Or do you
want to make conflicting requests fail with e.g. -EBUSY, possibly breaking
userspace programs?)

Serialization via holding throughout this zeroing/fallocate operation
is needed for returning consistent output.

> In other words:  we'll need to chunk the zeroing up if we want
> to hold the invalidate lock, I see no ther way to properly fix this.

I don't think that we can chunk the zeroing/fallocate up, for
I don't think that we can determine appropriate chunk size.
It might be 4KB, it might be 1MB, it might be 1GB, but who knows which
size is safe for avoiding hung task warning on this invalidate lock.

The block device might be super slow (e.g. as slow as floppy disk, or
networking block device over very slow network). Also, hung task watchdog
timeout and concurrent requests colliding on this invalidate lock are not
visible to current thread doing actual zeroing/fallocate.

All we could afford will be to make this invalidate lock killable and
sometimes check fatal_signal_pending() inside zeroing/fallocate loop.