On Tue, Mar 04, 2025 at 12:18:04PM +0100, Mikulas Patocka wrote:
>
>
> On Tue, 4 Mar 2025, Dave Chinner wrote:
>
> > On Mon, Mar 03, 2025 at 10:03:42PM +0100, Mikulas Patocka wrote:
> > >
> > >
> > > On Mon, 3 Mar 2025, Christoph Hellwig wrote:
> > >
> > > > On Mon, Mar 03, 2025 at 05:16:48PM +0100, Mikulas Patocka wrote:
> > > > > What should I use instead of bmap? Is fiemap exported for use in the
> > > > > kernel?
> > > >
> > > > You can't do an ahead of time mapping. It's a broken concept.
> > >
> > > Swapfile does ahead of time mapping. And I just looked at what swapfile
> > > does and copied the logic into dm-loop. If swapfile is not broken, how
> > > could dm-loop be broken?
> >
> > Swap files cannot be accessed/modified by user code once the
> > swapfile is activated. See all the IS_SWAPFILE() checks throughout
> > the VFS and filesystem code.
> >
> > Swap files must be fully allocated (i.e. not sparse) and must not
> > contain shared extents. This is required so that writes to the swapfile
> > do not require block allocation, which would change the mapping...
> >
> > Hence we explicitly prevent modification of the underlying file
> > mapping once a swapfile is owned and mapped by the kernel as a
> > swapfile.
> >
> > That's not how loop devices/image files work - we actually rely on
> > them being:
> >
> > a) sparse; and
> > b) the mapping being mutable via direct access to the loop file
> > whilst there is an active mounted filesystem on that loop file.
> >
> > and so every IO needs to be mapped through the filesystem at
> > submission time.
> >
> > The reason for a) is obvious: we don't need to allocate space for
> > the filesystem so it's effectively thin provisioned. Also, fstrim on
> > the mounted loop device can punch out unused space in the mounted
> > filesystem.
> >
> > The reason for b) is less obvious: snapshots via file cloning,
> > deduplication via extent sharing.
> >
> > The clone operation is an atomic modification of the underlying file
> > mapping, which then triggers COW on future writes to those mappings,
> > which causes the mapping to change at write IO time.
> >
> > IOWs, the whole concept that there is a "static mapping" for a loop
> > device image file for the life of the image file is fundamentally
> > flawed.
>
> I'm not trying to break existing loop.

I didn't say you were. I said the concept that dm-loop is based on is
fundamentally flawed and that your benchmark setup does not reflect
real world usage of loop devices.

> But some users don't use COW filesystems, some users use fully provisioned
> files, some users don't need to write to a file when it is being mapped -
> and for them dm-loop would be viable alternative because of better
> performance.

Nothing has changed since 2008, when this "fast file mapping" thing was
first proposed and dm-loop made its first appearance in this thread:

https://lore.kernel.org/linux-fsdevel/20080109085231.GE6650@xxxxxxxxx/

Let me quote Christoph's response to Jens' proposed static-mapping patch
for the loop device back in 2008:

| And the way this is done is simply broken. It means you have to get
| rid of things like delayed or unwritten hands beforehand, it'll be
| a complete pain for COW or non-block backed filesystems.
|
| The right way to do this is to allow direct I/O from kernel sources
| where the filesystem is in-charge of submitting the actual I/O after
| the pages are handed to it. I think Peter Zijlstra has been looking
| into something like that for swap over nfs.

Jens also said this about dm-loop in that thread:

} Why oh why does dm always insist to reinvent everything? That's bad
} enough in itself, but on top of that most of the extra stuff ends up
} being essentially unmaintained.
}
} If we instead improve loop, everyone wins.
}
} Sorry to sound a bit harsh, but sometimes it doesn't hurt to think a bit
} outside your own sandbox.
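To make the clone/COW point concrete, here is a deliberately tiny, hypothetical model of a filesystem block map. `ToyFS`, `create`, `clone` and `write` are invented for illustration only; they model no real filesystem's allocator or reflink code. It captures a "static mapping" of an image file up front (roughly what an ahead-of-time mapping does), then takes a snapshot via a clone and writes one block through the filesystem. The write is COWed to a new physical block, so the cached mapping now points at a block that belongs only to the snapshot:

```python
# Hypothetical toy model of a COW filesystem's block map (not kernel code).

class ToyFS:
    def __init__(self):
        self.refcount = {}   # physical block -> number of file maps referencing it
        self.next_free = 0   # trivial bump allocator

    def alloc(self):
        pblk = self.next_free
        self.next_free += 1
        self.refcount[pblk] = 1
        return pblk

    def create(self, nblocks):
        """Fully allocated (non-sparse) file: index = logical block, value = physical block."""
        return [self.alloc() for _ in range(nblocks)]

    def clone(self, fmap):
        """Reflink-style clone: share every extent and bump its refcount."""
        for pblk in fmap:
            self.refcount[pblk] += 1
        return list(fmap)

    def write(self, fmap, lblk):
        """Write one logical block; a shared physical block is copied-on-write."""
        old = fmap[lblk]
        if self.refcount[old] > 1:
            self.refcount[old] -= 1
            fmap[lblk] = self.alloc()   # the mapping changes at write IO time
        return fmap[lblk]


fs = ToyFS()
image = fs.create(4)            # loop image file
static_map = list(image)        # ahead-of-time mapping captured at setup
snapshot = fs.clone(image)      # snapshot via file cloning
new_pblk = fs.write(image, 2)   # write through the filesystem triggers COW

print("filesystem maps block 2 to", image[2])
print("static map still says", static_map[2])
```

In this model, IO issued through `static_map` would land on physical block 2, which now belongs only to the snapshot, silently corrupting it. That is why the mapping has to be looked up through the filesystem at submission time, and why swapfiles must refuse shared extents.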
You - personally - were also told directly by Jens back then that
dm-loop's approach simply does not work for filesystems that move
blocks around, i.e. it isn't a viable approach. Nothing has changed -
it still isn't a viable approach for loopback devices, for the same
reasons it wasn't viable in 2008.

> The Android people concluded that loop is too slow and rather than using
> loop they want to map a file using a table with dm-linear targets over the
> image of the host filesystem. So, they are already doing what dm-loop is
> doing.

I don't care if downstream vendors are doing something stupid with
their kernels.

Where are the bug reports about the loop device being slow, and the
analysis that indicates that it is unfixable?

The fact is that AIO+DIO through filesystems like XFS generally performs
within 1-2% of the underlying block device's capabilities. Hence if
there's a problem with loop device performance, it isn't in the
backing file IO submission path.

Find out why loop device AIO+DIO is slow for the workload you are
testing and fix that. This way everyone who already uses loop devices
benefits (as Jens said in 2008), and the Android folk can get rid of
their hacky mapping setup....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx