On Wed, Mar 06, 2019 at 12:06:42PM +0100, Lukas Czerner wrote:
> Ext4 needs to serialize unaligned direct AIO because the zeroing of
> partial blocks of two competing unaligned AIOs can result in data
> corruption.
>
> However, it decides not to serialize if the potentially unaligned AIO is
> past i_size, with the rationale that no pending writes are possible past
> i_size. Unfortunately, if i_size is not block aligned and the second
> unaligned write lands past i_size but still in the same block, it has
> the potential of corrupting the previous unaligned write to the same
> block.
>
> This is a (very simplified) reproducer from Frank:
>
>     // 41472 = (10 * 4096) + 512
>     // 37376 = 41472 - 4096
>
>     ftruncate(fd, 41472);
>     io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
>     io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
>
>     io_submit(io_ctx, 1, &iocbs[0]);
>     io_submit(io_ctx, 1, &iocbs[1]);
>
>     io_getevents(io_ctx, 2, 2, events, NULL);
>
> Without this patch, the 512B range from 40960 up to the start of the
> second unaligned write (41472) is going to be zeroed, overwriting the data
> written by the first write. This is a data corruption.
>
> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> *
> 00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
> *
> 0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> *
> 0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
>
> With this patch, the data corruption is avoided because we will recognize
> the unaligned AIO and wait for the unwritten extent conversion.
>
> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> *
> 00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
> *
> 0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
> *
> 0000b200
>
> Reported-by: Frank Sorenson <fsorenso@xxxxxxxxxx>
> Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
> Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
> Cc: <stable@xxxxxxxxxxxxxxx>

Thanks, applied.

- Ted