On Mon, Mar 03, 2025 at 10:03:42PM +0100, Mikulas Patocka wrote:
> 
> 
> On Mon, 3 Mar 2025, Christoph Hellwig wrote:
> 
> > On Mon, Mar 03, 2025 at 05:16:48PM +0100, Mikulas Patocka wrote:
> > > What should I use instead of bmap? Is fiemap exported for use in the
> > > kernel?
> > 
> > You can't do an ahead of time mapping. It's a broken concept.
> 
> Swapfile does ahead of time mapping. And I just looked at what swapfile
> does and copied the logic into dm-loop. If swapfile is not broken, how
> could dm-loop be broken?

Swap files cannot be accessed/modified by user code once the
swapfile is activated. See all the IS_SWAPFILE() checks throughout
the VFS and filesystem code.

Swap files must be fully allocated (i.e. not sparse) and must not
contain shared extents. This is required so that writes to the
swapfile never need block allocation, which would change the
mapping. Hence we explicitly prevent modification of the underlying
file mapping once the file is owned and mapped by the kernel as a
swapfile.

That's not how loop devices/image files work - we actually rely on
them being:

a) sparse; and
b) mutable, i.e. the mapping can be changed via direct access to
   the loop file whilst there is an active mounted filesystem on
   that loop file.

and so every IO needs to be mapped through the filesystem at
submission time.

The reason for a) is obvious: we don't need to allocate space for
the filesystem up front, so it is effectively thin provisioned.
Also, fstrim on the mounted loop device can punch out unused space
in the mounted filesystem.

The reason for b) is less obvious: snapshots via file cloning and
deduplication via extent sharing. The clone operation is an atomic
modification of the underlying file mapping, which then triggers
COW on future writes to those mappings, which causes the mapping to
change at write IO time.

IOWs, the whole concept that there is a "static mapping" for a loop
device image file for the life of the image file is fundamentally
flawed.

> > > Dm-loop is significantly faster than the regular loop:
> > >
> > > # modprobe brd rd_size=1048576
> > > # dd if=/dev/zero of=/dev/ram0 bs=1048576
> > > # mkfs.ext4 /dev/ram0
> > > # mount -t ext4 /dev/ram0 /mnt/test
> > > # dd if=/dev/zero of=/mnt/test/test bs=1048576 count=512

Urk. Ram disks are terrible for IO benchmarking. The IO is
synchronous (i.e. always completes in the submitter context) and
performance is -always CPU bound- due to the requirement for all
data copying to be marshalled through the CPU.

Please benchmark performance on NVMe SSDs - it will give a much
more accurate demonstration of the performance differences we'll
see in real world usage of loop device functionality...
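As a rough sketch of that kind of comparison - assuming an
otherwise unused NVMe namespace at /dev/nvme0n1, a /mnt/test mount
point, an illustrative 16G backing file, and that losetup hands
back /dev/loop0; substitute whatever devices and paths you actually
have - something like:

# mkfs.ext4 /dev/nvme0n1
# mount -t ext4 /dev/nvme0n1 /mnt/test
# truncate -s 16G /mnt/test/backing.img
# losetup --direct-io=on -f --show /mnt/test/backing.img
/dev/loop0
# dd if=/dev/zero of=/dev/loop0 bs=1048576 count=512 oflag=direct

and then the same dd repeated with the dm-loop target set up over
the same backing file, so both paths are hitting the same flash
rather than a CPU-bound ram disk.

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx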