Re: [PATCH v12 resend] fs: Add VirtualBox guest shared folder (vboxsf) support

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Mon, 12 Aug 2019 07:17:01 -0700

On Mon, Aug 12, 2019 at 03:35:39PM +0200, Hans de Goede wrote:
> > Also these casts to uintptr_t for a call that reads data look very
> > odd.
> 
> Yes, as I already discussed with Al, that is because vboxsf_read
> can be (and is) used with both kernel and userspace buffer pointers.
> 
> In case of a userspace pointer the underlying IPC code to the host takes
> care of copy to / from user for us. That IPC code can also be used from
> userspace through ioctls on /dev/vboxguest, so the handling of both
> in kernel and userspace addresses is something which it must be able
> to handle anyways, at which point we might as well use it in vboxsf too.
> 
> But since both Al and you pointed this out as being ugly, I will add
> 2 separate vboxsf_read_user and vboxsf_read_kernel functions for the
> next version, then the cast (and the true flag) can both go away.

What might be even better is to pass a struct iov_iter to the low-level
function.  That gets you 90% of implementing the read_iter and
write_iter methods, as well as a versatile low-level primitive that
can deal with kernel and user addresses, as well as pages.
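
To illustrate the shape of that idea, here is a rough userspace model,
not kernel code.  struct seg_iter, seg_iter_init(), seg_iter_fill() and
lowlevel_read() are hypothetical names standing in for struct iov_iter
and its copy helpers; the real kernel interfaces differ in detail.  The
point is only that a single low-level routine can fill whatever
destination the iterator describes, so no separate _user and _kernel
variants and no pointer casts are needed:

```c
/*
 * Userspace model only: struct seg_iter and its helpers are hypothetical
 * stand-ins for the kernel's struct iov_iter; this is not kernel code.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

struct seg_iter {
	const struct iovec *iov;	/* current destination segment */
	size_t nr_segs;			/* segments left, including current */
	size_t iov_offset;		/* bytes already filled in current segment */
	size_t count;			/* total bytes of room left */
};

static void seg_iter_init(struct seg_iter *i, const struct iovec *iov,
			  size_t nr_segs)
{
	size_t n;

	i->iov = iov;
	i->nr_segs = nr_segs;
	i->iov_offset = 0;
	i->count = 0;
	for (n = 0; n < nr_segs; n++)
		i->count += iov[n].iov_len;
}

/*
 * Copy up to len bytes from src into the iterator and advance it.
 * Returns the number of bytes actually copied.
 */
static size_t seg_iter_fill(struct seg_iter *i, const void *src, size_t len)
{
	const char *from = src;
	size_t done = 0;

	if (len > i->count)
		len = i->count;

	while (done < len) {
		size_t room = i->iov->iov_len - i->iov_offset;
		size_t chunk = len - done < room ? len - done : room;

		memcpy((char *)i->iov->iov_base + i->iov_offset,
		       from + done, chunk);
		done += chunk;
		i->iov_offset += chunk;
		if (i->iov_offset == i->iov->iov_len) {
			/* segment is full, move on to the next one */
			i->iov++;
			i->nr_segs--;
			i->iov_offset = 0;
		}
	}
	i->count -= done;
	return done;
}

/*
 * The one low-level read routine: it only sees the iterator, never the
 * kind of buffer behind it.  "backing" models the host-side file data.
 */
static size_t lowlevel_read(const char *backing, size_t size, size_t pos,
			    struct seg_iter *to)
{
	if (pos >= size)
		return 0;
	return seg_iter_fill(to, backing + pos, size - pos);
}
```

A caller sets up the iterator once over however many segments it has
and hands it down; the low-level code never needs to know where the
buffers came from.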

> > > +	/* Make sure any pending writes done through mmap are flushed */
> > 
> > Why?
> 
> I believe that if we were doing everything through the page-cache then a regular
> write to the same range as a write done through mmap, with the regular write
> happening after (in time) the mmap write, will overwrite the mmap
> written data, we want the same behavior here.

But what happens if you mmap and write at the same, or at least
nearly the same, time?

> 
> > > +	err = filemap_fdatawait_range(inode->i_mapping, pos, pos + nwritten);
> > > +	if (err)
> > > +		return err;
> > 
> > Also this whole write function seems to miss i_rwsem.
> 
> Hmm, I do not see e.g. v9fs_file_write_iter take that either, nor a couple
> of other similar not block-backed filesystems. Will this still
> be necessary after converting to the iter interfaces?

Yes.

> The problem is that the IPC to the host which we build upon only offers
> regular read / write calls. So the most consistent (also cache coherent)
> mapping which we can offer is to directly map read -> read and
> write -> write without the pagecache. Ideally we would be able to just
> say sorry cannot do mmap, but too many apps rely on mmap and the
> out of tree driver has this mmap "emulation" which means not offering
> it in the mainline version would be a serious regression.
> 
> In essence this is the same situation as a bunch of network filesystems
> are in and I've looked at several for inspiration. Looking again at
> e.g. v9fs_file_write_iter it does similar regular read -> read mapping
> with invalidation of the page-cache for mmap users.

v9fs is probably not a good idea to copy in general.  While not the best
idea to copy directly either, I'd rather look at nfs - that is another
protocol without a real distributed lock manager, but at least the
NFS close-to-open semantics are reasonably well defined and allow using
the pagecache.

> I must admit that I've mostly cargo-culted this from other fs code
> such as the 9p code, or the cifs code which has:
> 
> /*
>  * If the page is mmap'ed into a process' page tables, then we need to make
>  * sure that it doesn't change while being written back.
>  */
> static vm_fault_t
> cifs_page_mkwrite(struct vm_fault *vmf)
> {
>         struct page *page = vmf->page;
> 
>         lock_page(page);
>         return VM_FAULT_LOCKED;
> }
> 
> The if (page->mapping != inode->i_mapping) is used in several places
> including the 9p code, but as you can see not in the cifs code. I couldn't
> really find a rationale for that check, so I'm fine with dropping that check.

If you don't implement ->page_mkwrite the caller will just lock the page
for you.