ext4 data corruption in 6.1 stable tree (was Re: [PATCH 5.15 000/297] 5.15.140-rc1 review)

Hello!

On Mon 27-11-23 11:32:12, Daniel Díaz wrote:
> On Mon, 27 Nov 2023 at 09:56, Jan Kara <jack@xxxxxxx> wrote:
> > On Fri 24-11-23 23:45:09, Daniel Díaz wrote:
> > > On 24/11/23 11:50 a. m., Greg Kroah-Hartman wrote:
> > > > This is the start of the stable review cycle for the 5.15.140 release.
> > > > There are 297 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Sun, 26 Nov 2023 17:19:17 +0000.
> > > > Anything received after that time might be too late.
> > > >
> > > > The whole patch series can be found in one patch at:
> > > >     https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.140-rc1.gz
> > > > or in the git tree and branch at:
> > > >     git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > and the diffstat can be found below.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > We are noticing a regression with ltp-syscalls' preadv03:
> >
> > Thanks for report!
> >
> > > -----8<-----
> > >   preadv03 preadv03
> > >   preadv03_64 preadv03_64
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'b' expectedly
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:66: TFAIL: preadv(O_DIRECT) read 0 bytes, expected 512
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:66: TFAIL: preadv(O_DIRECT) read 0 bytes, expected 512
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
> > >   preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'b' expectedly
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:66: TFAIL: preadv(O_DIRECT) read 0 bytes, expected 512
> > >   preadv03.c:102: TINFO: Using block size 512
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
> > >   preadv03.c:66: TFAIL: preadv(O_DIRECT) read 0 bytes, expected 512
> > > ----->8-----
> > >
> > > This is seen in the following environments:
> > > * dragonboard-845c
> > > * juno-64k_page_size
> > > * qemu-arm64
> > > * qemu-armv7
> > > * qemu-i386
> > > * qemu-x86_64
> > > * x86_64-clang
> > >
> > > and on the following RC's:
> > > * v5.10.202-rc1
> > > * v5.15.140-rc1
> > > * v6.1.64-rc1
> >
> > Hum, even in 6.1? That's odd. Can you please test whether current upstream
> > vanilla kernel works for you with this test? Thanks!
> 
> Yes, this is working for us on mainline and next:
>   https://qa-reports.linaro.org/lkft/linux-mainline-master/tests/ltp-syscalls/preadv03
>   https://qa-reports.linaro.org/lkft/linux-next-master/tests/ltp-syscalls/preadv03
> cf. 6.1:
>   https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/tests/ltp-syscalls/preadv03
> 
> Greetings!

So I've got back to this, and the failure comes from a subtle interaction
between iomap code and ext4 code. In particular, the fact that commit
936e114a245b6 ("iomap: update ki_pos a little later in iomap_dio_complete")
is not in stable means that the file position is not updated after a
direct IO write, and thus direct IO writes end up in the wrong locations,
effectively corrupting data. The subtle detail is that before this commit,
if the ->end_io handler returns a non-zero value (which the new ext4
->end_io handler does), the file position doesn't get updated; after this
commit, it doesn't get updated only if the return value is < 0.

Commit 936e114a245b6 got merged in 6.5-rc1, so all stable kernels before
6.5 that carry 91562895f803 ("ext4: properly sync file size update after
O_SYNC direct IO") are corrupting data. I've noticed that at least 6.1 is
still carrying the problematic commit. Greg, please drop 91562895f803 from
all stable kernels before 6.5 as soon as possible; we'll figure out a
proper backport once user data is no longer being corrupted. Thanks!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR