On Wed, May 06, 2020 at 04:08:03PM +0200, Stefan Metzmacher wrote: > Am 06.05.20 um 14:41 schrieb Anoop C S: > > On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote: > >> Hi Anoop, > >> > >>> I could reproduce the difference in SHA256 checksum after copying a > >>> directory with 100 copies of test file(provided by reporter) from > >>> io_uring VFS module enabled share using Windows explorer(right- > >>> click- > >>>> copy/paste). Only 5 out of 100 files had correct checksum after > >>>> copy > >>> operation :-/ > >> > >> Great! Can you please try to collect level 1 log files with > >> the patch https://bugzilla.samba.org/attachment.cgi?id=15955 > >> applied? > > > > I have attached three log files. > > log.io_uring.smbd -- Copy using Windows explorer > > log.io_uring-mget.smd -- Copy using smbclient > > log.io_uring-powershell.smd -- Copy using `Copy-Item` > > Thanks! All of them show short reads like: > > > [2020/05/06 17:27:28.130248, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405 > > [2020/05/06 17:27:28.131049, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405 > > [2020/05/06 17:27:28.133679, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405 > > [2020/05/06 17:27:28.140184, 0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405 > > It seems the first request is at ofs=0 len=32768 (0x8000) and we get > 32768 back. > A follow up request with ofs=0 len=1048576 (0x100000) only gets the > first 32768 bytes which are already in the page cache. > > I can easily reproduce this with the Ubuntu 5.4 kernel once I add > state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send() > and use this. > > echo 1 > /proc/sys/vm/drop_caches > head -c 1024 /root/samba-test/ff1.dat | hexdump -C > 00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > |................| > * > 00000400 > smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat" > > results in this log entries: > > [2020/05/06 06:51:57.069990, 0] ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608 (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065 > > [2020/05/06 06:51:57.076882, 1] ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req) > > vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000) nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065 > > smbclient is just smart enough to recover itself from the short read. > But the windows client isn't. Well we pay attention to the amount of data returned and only increment the next read request by the amount actually returned. I'm amazed that the Windows client doesn't seem to check this ! > The attached test against liburing (git://git.kernel.dk/liburing) should > be able to demonstrate the problem. It can also be found in > https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3 > > > I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it > against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug. > > Can someone run the unmodified test/implicit-rwf_nowait against > a newer kernel? Aha. I wondered about the short read issue when this was first reported but I could never catch it in the act. If the Windows client doesn't check and the kernel returns short reads I guess we'll have to add logic similar to tstream_readv_send()/tstream_writev_send() that ensure all bytes requested/send actually go through the interface and from/into the kernel unless a read returns 0 (EOF) or a write returns an error. What a pain though :-(. SMB2+ server implementors really need to take note that Windows clients will corrupt files if they get a short read/write return. The fact that early kernels don't return short reads on io_uring but later kernels do makes it even worse :-(. There's even an SMB2 protocol field in SMB2_READ: "MinimumCount (4 bytes): The minimum number of bytes to be read for this operation to be successful. If fewer than the minimum number of bytes are read by the server, the server MUST return an error rather than the bytes read." We correctly return EOF if the amount read from the kernel is less than SMB2_READ.MinimumCount so I'm guessing they're not using it or looking at it (or setting it to zero). MinimumCount is supposed to allow the client to cope with this. Anoop, do you have wireshark traces so we can see what the Windows clients are setting here ? Jeremy.