(reposting after having subscribed, and from a different address since the first mail did not show up in the archives and didn't bounce either) Hello, I'm using Linux 6.4.0 from Debian/testing (but tried this with 6.5.1 too). I've created an ext4 filesystem with journalling disabled on an NVMe drive: mkfs.ext4 -O ^has_journal -F /dev/nvme0n1p6 I have a program that creates and open a new file with O_DIRECT, and sets its size to 8M with posix_fallocate(), something like: fd = open("/dir/file", O_CREAT | O_WRONLY | O_DIRECT); posix_fallocate(fd, 0, 8 * 1024 * 1024); fsync(fd); dirfd = open("/dir", O_RDONLY | O_DIRECTORY); fsync(dirfd); and then it uses io_uring to perform a single write of 4096 bytes at the beginning of the file, passing RWF_DSYNC to the submitted IORING_OP_WRITE_FIXED entry, I would expect the kernel to tell the NVMe device to actually flush the write, not only buffer it. However I measured the end-to-end latency of the io_uring operation and it was very low, as if the write was only buffered by the NVMe device, but not flushed. This suspicion seems to be confirmed by tracing the write nvme command sent to the device: raft-benchmark-37801 [003] ..... 9904.831153: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=4, cmdid=25169, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=498455480, len=7, ctrl=0x0, dsmgmt=0, reftag=0) notice the "ctrl=0x0" in there. If I run the same program, but I write directly to the /dev/nvme0n1p6 device instead of using a file, I get: raft-benchmark-37936 [003] ..... 9942.784628: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=4, cmdid=29265, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=498192384, len=7, ctrl=0x4000, dsmgmt=0, reftag=0) and here notice the "ctrl=0x4000", which I would expect is some flag to tell to the NVMe device to also flush the write. If I enable "write through" on the NVMe device: echo "write through" | sudo tee /sys/block/nvme0n1/queue/write_cache then also the direct write to the /dev/nvme0n1p6 device avoids flushing the command, as expected: raft-benchmark-38106 [003] ..... 9971.741308: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=4, cmdid=33361, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=498192384, len=7, ctrl=0x0, dsmgmt=0, reftag=0) (notice the "ctrl=0x0" there). My question is: is my interpretation correct? Is it expected that ext4 does not tell the block layer/device to flush this write when journalling is disabled? Below you find the complete event trace of the io_uring call, both in the ext4 write case and in the raw /dev/nvme0n1p6 write case. I can provide a C reproducer as well if needed, but wanted to first check if this rings any bell. Thanks, Free == ext4 == raft-benchmark-37801 [003] ..... 9904.830974: io_uring_submit_req: ring 0000000011cab2e4, req 00000000c7a7d835, user_data 0x0, opcode WRITE_FIXED, flags 0x1, sq_thread 0 raft-benchmark-37801 [003] ..... 9904.830982: ext4_es_lookup_extent_enter: dev 259,5 ino 12 lblk 0 raft-benchmark-37801 [003] ..... 9904.830983: ext4_es_lookup_extent_exit: dev 259,5 ino 12 found 1 [0/1) 32887 U raft-benchmark-37801 [003] ..... 9904.830985: ext4_journal_start_inode: dev 259,5 blocks 2, rsv_blocks 0, revoke_creds 8, type 1, ino 12, caller ext4_dirty_inode+0x38/0x80 [ext4] raft-benchmark-37801 [003] ..... 9904.830987: ext4_mark_inode_dirty: dev 259,5 ino 12 caller ext4_dirty_inode+0x5b/0x80 [ext4] raft-benchmark-37801 [003] ..... 9904.830989: block_touch_buffer: 259,5 sector=135 size=4096 raft-benchmark-37801 [003] ..... 9904.830993: block_dirty_buffer: 259,5 sector=135 size=4096 raft-benchmark-37801 [003] ..... 9904.831121: ext4_es_lookup_extent_enter: dev 259,5 ino 12 lblk 0 raft-benchmark-37801 [003] ..... 9904.831122: ext4_es_lookup_extent_exit: dev 259,5 ino 12 found 1 [0/1) 32887 UR raft-benchmark-37801 [003] ..... 9904.831123: ext4_journal_start_inode: dev 259,5 blocks 8, rsv_blocks 0, revoke_creds 8, type 3, ino 12, caller ext4_iomap_begin+0x1c2/0x2f0 [ext4] raft-benchmark-37801 [003] ..... 9904.831124: ext4_es_lookup_extent_enter: dev 259,5 ino 12 lblk 0 raft-benchmark-37801 [003] ..... 9904.831124: ext4_es_lookup_extent_exit: dev 259,5 ino 12 found 1 [0/1) 32887 UR raft-benchmark-37801 [003] ..... 9904.831125: ext4_ext_map_blocks_enter: dev 259,5 ino 12 lblk 0 len 1 flags CREATE|UNWRIT|PRE_IO raft-benchmark-37801 [003] ..... 9904.831126: ext4_es_cache_extent: dev 259,5 ino 12 es [0/1) mapped 32887 status U raft-benchmark-37801 [003] ..... 9904.831127: ext4_ext_show_extent: dev 259,5 ino 12 lblk 0 pblk 32887 len 1 raft-benchmark-37801 [003] ..... 9904.831128: ext4_ext_handle_unwritten_extents: dev 259,5 ino 12 m_lblk 0 m_pblk 32887 m_len 1 flags CREATE|UNWRIT|PRE_IO|METADATA_NOFAIL allocated 1 newblock 32887 raft-benchmark-37801 [003] ..... 9904.831129: ext4_es_cache_extent: dev 259,5 ino 12 es [0/1) mapped 32887 status U raft-benchmark-37801 [003] ..... 9904.831130: ext4_mark_inode_dirty: dev 259,5 ino 12 caller ext4_split_extent+0xcd/0x190 [ext4] raft-benchmark-37801 [003] ..... 9904.831131: block_touch_buffer: 259,5 sector=135 size=4096 raft-benchmark-37801 [003] ..... 9904.831133: block_dirty_buffer: 259,5 sector=135 size=4096 raft-benchmark-37801 [003] ..... 9904.831134: ext4_ext_map_blocks_exit: dev 259,5 ino 12 flags CREATE|UNWRIT|PRE_IO lblk 0 pblk 32887 len 1 mflags NMU ret 1 raft-benchmark-37801 [003] ..... 9904.831135: ext4_es_lookup_extent_enter: dev 259,5 ino 12 lblk 0 raft-benchmark-37801 [003] ..... 9904.831135: ext4_es_lookup_extent_exit: dev 259,5 ino 12 found 1 [0/1) 32887 UR raft-benchmark-37801 [003] ..... 9904.831136: ext4_es_insert_extent: dev 259,5 ino 12 es [0/1) mapped 32887 status U raft-benchmark-37801 [003] ..... 9904.831143: block_bio_remap: 259,0 WS 498455480 + 8 <- (259,5) 263096 raft-benchmark-37801 [003] ..... 9904.831144: block_bio_queue: 259,0 WS 498455480 + 8 [raft-benchmark] raft-benchmark-37801 [003] ..... 9904.831149: block_getrq: 259,0 WS 498455480 + 8 [raft-benchmark] raft-benchmark-37801 [003] ..... 9904.831149: block_plug: [raft-benchmark] raft-benchmark-37801 [003] ..... 9904.831153: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=4, cmdid=25169, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=498455480, len=7, ctrl=0x0, dsmgmt=0, reftag=0) raft-benchmark-37801 [003] ..... 9904.831159: block_rq_issue: 259,0 WS 4096 () 498455480 + 8 [raft-benchmark] raft-benchmark-37801 [003] d.h.. 9904.831173: nvme_sq: nvme0: disk=nvme0n1, qid=4, head=783, tail=783 raft-benchmark-37801 [003] d.h.. 9904.831177: nvme_complete_rq: nvme0: disk=nvme0n1, qid=4, cmdid=25169, res=0x0, retries=0, flags=0x0, status=0x0 raft-benchmark-37801 [003] d.h.. 9904.831178: block_rq_complete: 259,0 WS () 498455480 + 8 [0] kworker/3:1-30279 [003] ..... 9904.831193: ext4_journal_start_inode: dev 259,5 blocks 8, rsv_blocks 0, revoke_creds 8, type 3, ino 12, caller ext4_convert_unwritten_extents+0xb4/0x260 [ext4] kworker/3:1-30279 [003] ..... 9904.831193: ext4_es_lookup_extent_enter: dev 259,5 ino 12 lblk 0 kworker/3:1-30279 [003] ..... 9904.831194: ext4_es_lookup_extent_exit: dev 259,5 ino 12 found 1 [0/1) 32887 U kworker/3:1-30279 [003] ..... 9904.831194: ext4_ext_map_blocks_enter: dev 259,5 ino 12 lblk 0 len 1 flags CREATE|UNWRIT|CONVERT kworker/3:1-30279 [003] ..... 9904.831195: ext4_es_cache_extent: dev 259,5 ino 12 es [0/1) mapped 32887 status U kworker/3:1-30279 [003] ..... 9904.831195: ext4_ext_show_extent: dev 259,5 ino 12 lblk 0 pblk 32887 len 1 kworker/3:1-30279 [003] ..... 9904.831196: ext4_ext_handle_unwritten_extents: dev 259,5 ino 12 m_lblk 0 m_pblk 32887 m_len 1 flags CREATE|UNWRIT|CONVERT|METADATA_NOFAIL allocated 1 newblock 32887 kworker/3:1-30279 [003] ..... 9904.831196: ext4_mark_inode_dirty: dev 259,5 ino 12 caller ext4_ext_map_blocks+0xeee/0x1980 [ext4] kworker/3:1-30279 [003] ..... 9904.831197: block_touch_buffer: 259,5 sector=135 size=4096 kworker/3:1-30279 [003] ..... 9904.831198: block_dirty_buffer: 259,5 sector=135 size=4096 kworker/3:1-30279 [003] ..... 9904.831199: ext4_ext_map_blocks_exit: dev 259,5 ino 12 flags CREATE|UNWRIT|CONVERT lblk 0 pblk 32887 len 1 mflags M ret 1 kworker/3:1-30279 [003] ..... 9904.831199: ext4_es_insert_extent: dev 259,5 ino 12 es [0/1) mapped 32887 status W kworker/3:1-30279 [003] ..... 9904.831200: ext4_mark_inode_dirty: dev 259,5 ino 12 caller ext4_convert_unwritten_extents+0x1e2/0x260 [ext4] kworker/3:1-30279 [003] ..... 9904.831200: block_touch_buffer: 259,5 sector=135 size=4096 kworker/3:1-30279 [003] ..... 9904.831201: block_dirty_buffer: 259,5 sector=135 size=4096 kworker/3:1-30279 [003] ..... 9904.831202: ext4_sync_file_enter: dev 259,5 ino 12 parent 2 datasync 1 kworker/3:1-30279 [003] ..... 9904.831203: ext4_sync_file_exit: dev 259,5 ino 12 ret 0 raft-benchmark-37801 [003] ...1. 9904.831212: io_uring_complete: ring 0000000011cab2e4, req 00000000c7a7d835, user_data 0x0, result 4096, cflags 0x0 extra1 0 extra2 0 raft-benchmark-37801 [003] ..... 9904.831213: io_uring_task_work_run: tctx 000000005c8048e9, count 1, loops 1 == /dev/nvme0n1p6 == raft-benchmark-37936 [003] ..... 9942.784613: io_uring_submit_req: ring 00000000c4f84b11, req 00000000058196bf, user_data 0x0, opcode WRITE_FIXED, flags 0x1, sq_thread 0 raft-benchmark-37936 [003] ..... 9942.784622: block_bio_remap: 259,0 WFS 498192384 + 8 <- (259,5) 0 raft-benchmark-37936 [003] ..... 9942.784622: block_bio_queue: 259,0 WFS 498192384 + 8 [raft-benchmark] raft-benchmark-37936 [003] ..... 9942.784626: block_getrq: 259,0 WFS 498192384 + 8 [raft-benchmark] raft-benchmark-37936 [003] ..... 9942.784628: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=4, cmdid=29265, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=498192384, len=7, ctrl=0x4000, dsmgmt=0, reftag=0) raft-benchmark-37936 [003] ..... 9942.784634: block_rq_issue: 259,0 WFS 4096 () 498192384 + 8 [raft-benchmark] raft-benchmark-37936 [003] d.h.. 9942.794331: nvme_sq: nvme0: disk=nvme0n1, qid=4, head=784, tail=784 raft-benchmark-37936 [003] d.h.. 9942.794334: nvme_complete_rq: nvme0: disk=nvme0n1, qid=4, cmdid=29265, res=0x0, retries=0, flags=0x0, status=0x0 raft-benchmark-37936 [003] d.h.. 9942.794334: block_rq_complete: 259,0 WFS () 498192384 + 8 [0] raft-benchmark-37936 [003] ...1. 9942.794338: io_uring_complete: ring 00000000c4f84b11, req 00000000058196bf, user_data 0x0, result 4096, cflags 0x0 extra1 0 extra2 0 raft-benchmark-37936 [003] ..... 9942.794338: io_uring_task_work_run: tctx 000000005fccbc0a, count 1, loops 1 -------------------- End of forwarded message --------------------