On 03/16/2017 04:31 PM, Dave Chinner wrote: > On Wed, Mar 15, 2017 at 04:51:04PM -0500, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> >> >> A new flag BIO_NOWAIT is introduced to identify bio's >> orignating from iocb with IOCB_NOWAIT. This flag indicates >> to return immediately if a request cannot be made instead >> of retrying. > > So this makes a congested block device run the bio IO completion > callback with an -EAGAIN error present? Are all the filesystem > direct IO submission and completion routines OK with that? i.e. does > such a congestion case cause filesystems to temporarily expose stale > data to unprivileged users when the IO is requeued in this way? > > e.g. filesystem does allocation without blocking, submits bio, > device is congested, runs IO completion with error, so nothing > written to allocated blocks, write gets queued, so other read > comes in while the write is queued, reads data from uninitialised > blocks that were allocated during the write.... > > Seems kinda problematic to me to have a undocumented design > constraint (i.e a landmine) where we submit the AIO only to have it > error out and then expect the filesystem to do something special and > different /without blocking/ on EAGAIN. If the filesystems has to perform block allocation, we would return -EAGAIN early enough. However, I agree there is a problem, since not all filesystems know this. I worked on only three of them. > > Why isn't the congestion check at a higher layer like we do for page > cache readahead? i.e. using the bdi*congested() API at the time we > are doing all the other filesystem blocking checks. > Yes, that may work better. We will have to call bdi_read_congested() on a write path. (will have to comment that part of the code). Would it encompass all possible waits in the block layer? -- Goldwyn