Re: [PATCH v5 04/11] blksnap: header file of the module interface

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 15 Jun 2023 10:08:20 +1000

On Wed, Jun 14, 2023 at 07:07:16AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote:
> > This code worked quite successfully for the veeamsnap module, on the
> > basis of which blksnap was created. Indeed, such an allocation of an
> > area on a block device using a file does not look safe.
> > 
> > We've already discussed this with Donald Buczek <buczek@xxxxxxxxxxxxx>.
> > Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
> > And I have planned work on moving to a more secure ioctl in the future.
> > Link: https://github.com/veeam/blksnap/issues/61
> > 
> > Now, thanks to Dave, it becomes clear to me how to solve this problem best.
> > swapfile is a good example of how to do it right.
> 
> I don't actually think swapfile is a very good idea, in fact the Linux
> swap code in general is not a very good place to look for inspirations
> :)

Yeah, the swapfile implementation isn't very nice, I was really just
using it as an example of how we can implement the requirements of
block mapping delegation in a safe manner to a kernel subsystem.

I think the important part is the swapfile inode flag, because that
is what keeps userspace from being able to screw with the file while
the kernel is using it and allows us to do read/write IO to
unwritten extents without converting them to written...

> IFF the usage is always to have a whole file for the diff storage the
> over all API is very simple - just pass a fd to the kernel for the area,
> and then use in-kernel direct I/O on it.

Yeah, I was thinking a fd is a better choice for the UAPI as it
frees up the kernel implementation, and it doesn't need us to pass a
separate bdev identifier in the ioctl. It also means we can pass a
regular file or a block device and the kernel code doesn't need to
care that they are different.

If you think direct IO is a better idea, then I have no objection to
that - I haven't looked into the implementation that deeply at this
point. I wanted to get an understanding of how all the pieces went
together first, so all I've read is the documentation and looked at
the UAPI.

I made a leap from that: the documentation keeps talking about using
files a the filesystem for the difference storage, but the only UAPI
for telling the kernel about storage regions it can use is this
physical bdev LBA mapping ioctl. Hence if file storage is being
used....

> Now if that file should also
> be able to reside on the same file system that the snapshot is taken
> of things get a little more complicated, because writes to it also need
> to automatically set the BIO_REFFED flag.  I have some ideas for that
> and will share some draft code with you.

Cool, I look forward to the updates; I know of a couple of
applications that could make use of this functionality right
away....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx