On Mon, Aug 12, 2019 at 03:35:39PM +0200, Hans de Goede wrote:
> > Also these casts to uintptr_t for a call that reads data look very
> > odd.
>
> Yes, as I already discussed with Al, that is because vboxsf_read
> can be (and is) used with both kernel and userspace buffer pointers.
>
> In case of a userspace pointer the underlying IPC code to the host takes
> care of copy to / from user for us. That IPC code can also be used from
> userspace through ioctls on /dev/vboxguest, so it must be able to handle
> both kernel and userspace addresses anyway, at which point we might as
> well use it in vboxsf too.
>
> But since both Al and you pointed this out as being ugly, I will add
> 2 separate vboxsf_read_user and vboxsf_read_kernel functions for the
> next version, then the cast (and the true flag) can both go away.

What might be even better is to pass a struct iov_iter to the low-level
function.  That gets you 90% of implementing the read_iter and write_iter
methods, as well as a versatile low-level primitive that can deal with
kernel and user addresses, as well as pages.

> > > +	/* Make sure any pending writes done through mmap are flushed */
> >
> > Why?
>
> I believe that if we were doing everything through the page-cache, then a
> regular write to the same range as a write done through mmap, with the
> regular write happening after (in time) the mmap write, would overwrite
> the mmap-written data; we want the same behavior here.

But what happens if you mmap and write at the same time, or at least at
nearly the same time?

> > > +	err = filemap_fdatawait_range(inode->i_mapping, pos, pos + nwritten);
> > > +	if (err)
> > > +		return err;
> >
> > Also this whole write function seems to miss i_rwsem.
>
> Hmm, I do not see e.g. v9fs_file_write_iter take that either, nor a couple
> of other similar non-block-backed filesystems. Will this still
> be necessary after converting to the iter interfaces?

Yes.
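To illustrate why a single iterator-based primitive can replace separate _user and _kernel variants, here is a toy userspace model. It is only a sketch of the idea: `struct toy_iter`, `toy_copy_to_iter()` and `toy_host_read()` are hypothetical names, the "host file" is just a byte array, and the real `struct iov_iter` in include/linux/uio.h has a different layout. The low-level read routine only ever calls the copy helper and never looks at where the destination segments live. In the kernel, the iterator's type (ITER_IOVEC for user memory, ITER_KVEC for kernel memory, ITER_BVEC for pages) decides how `copy_to_iter()` performs the copy.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical stand-in for the kernel's struct iov_iter (userspace toy only). */
struct toy_iter {
	const struct iovec *iov;  /* destination segments */
	size_t nr_segs;           /* number of segments */
	size_t seg;               /* index of the current segment */
	size_t off;               /* offset inside the current segment */
};

/*
 * Copy up to len bytes from src into the iterator and advance it.
 * Loosely modeled on the kernel's copy_to_iter(); returns bytes copied,
 * which is less than len if the iterator runs out of space.
 */
static size_t toy_copy_to_iter(const void *src, size_t len, struct toy_iter *it)
{
	const char *p = src;
	size_t done = 0;

	while (done < len && it->seg < it->nr_segs) {
		const struct iovec *v = &it->iov[it->seg];
		size_t room = v->iov_len - it->off;
		size_t n = (len - done < room) ? len - done : room;

		memcpy((char *)v->iov_base + it->off, p + done, n);
		done += n;
		it->off += n;
		if (it->off == v->iov_len) {  /* segment full: move to the next */
			it->seg++;
			it->off = 0;
		}
	}
	return done;
}

/*
 * Hypothetical low-level read: reads the "host file" from pos onward into
 * whatever buffers the iterator describes. One routine, no user/kernel flag.
 */
static size_t toy_host_read(const char *file, size_t file_len, size_t pos,
			    struct toy_iter *to)
{
	if (pos >= file_len)
		return 0;
	return toy_copy_to_iter(file + pos, file_len - pos, to);
}
```

With this shape, read_iter would pass its iterator straight down and write_iter would use the mirror-image copy-from helper, which is roughly where the "90% of read_iter/write_iter" in the reply comes from.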
> The problem is that the IPC to the host which we build upon only offers
> regular read / write calls. So the most consistent (also cache coherent)
> mapping which we can offer is to directly map read -> read and
> write -> write without the pagecache. Ideally we would be able to just
> say "sorry, cannot do mmap", but too many apps rely on mmap and the
> out-of-tree driver has this mmap "emulation", which means not offering
> it in the mainline version would be a serious regression.
>
> In essence this is the same situation as a bunch of network filesystems
> are in and I've looked at several for inspiration. Looking again at
> e.g. v9fs_file_write_iter, it does a similar regular read -> read mapping
> with invalidation of the page-cache for mmap users.

v9fs is probably not a good idea to copy in general.  While it is not the
best idea to copy directly either, I'd rather look at NFS; that is another
protocol without a real distributed lock manager, but at least the NFS
close-to-open semantics are reasonably well defined and allow using the
pagecache.

> I must admit that I've mostly cargo-culted this from other fs code
> such as the 9p code, or the cifs code which has:
>
> /*
>  * If the page is mmap'ed into a process' page tables, then we need to make
>  * sure that it doesn't change while being written back.
>  */
> static vm_fault_t
> cifs_page_mkwrite(struct vm_fault *vmf)
> {
> 	struct page *page = vmf->page;
>
> 	lock_page(page);
> 	return VM_FAULT_LOCKED;
> }
>
> The if (page->mapping != inode->i_mapping) check is used in several places
> including the 9p code, but as you can see not in the cifs code. I couldn't
> really find a rationale for that check, so I'm fine with dropping it.

If you don't implement ->page_mkwrite, the caller will just lock the page
for you.