On Wed, Jun 14, 2023 at 07:07:16AM -0700, Christoph Hellwig wrote: > On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote: > > This code worked quite successfully for the veeamsnap module, on the > > basis of which blksnap was created. Indeed, such an allocation of an > > area on a block device using a file does not look safe. > > > > We've already discussed this with Donald Buczek <buczek@xxxxxxxxxxxxx>. > > Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075 > > And I have planned work on moving to a more secure ioctl in the future. > > Link: https://github.com/veeam/blksnap/issues/61 > > > > Now, thanks to Dave, it becomes clear to me how to solve this problem best. > > swapfile is a good example of how to do it right. > > I don't actually think swapfile is a very good idea, in fact the Linux > swap code in general is not a very good place to look for inspirations > :) Yeah, the swapfile implementation isn't very nice, I was really just using it as an example of how we can implement the requirements of block mapping delegation in a safe manner to a kernel subsystem. I think the important part is the swapfile inode flag, because that is what keeps userspace from being able to screw with the file while the kernel is using it and allows us to do read/write IO to unwritten extents without converting them to written... > IFF the usage is always to have a whole file for the diff storage the > over all API is very simple - just pass a fd to the kernel for the area, > and then use in-kernel direct I/O on it. Yeah, I was thinking a fd is a better choice for the UAPI as it frees up the kernel implementation, and it doesn't need us to pass a separate bdev identifier in the ioctl. It also means we can pass a regular file or a block device and the kernel code doesn't need to care that they are different. If you think direct IO is a better idea, then I have no objection to that - I haven't looked into the implementation that deeply at this point. I wanted to get an understanding of how all the pieces went together first, so all I've read is the documentation and looked at the UAPI. I made a leap from that: the documentation keeps talking about using files a the filesystem for the difference storage, but the only UAPI for telling the kernel about storage regions it can use is this physical bdev LBA mapping ioctl. Hence if file storage is being used.... > Now if that file should also > be able to reside on the same file system that the snapshot is taken > of things get a little more complicated, because writes to it also need > to automatically set the BIO_REFFED flag. I have some ideas for that > and will share some draft code with you. Cool, I look forward to the updates; I know of a couple of applications that could make use of this functionality right away.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx