On Fri, Mar 08, 2019 at 04:58:44PM +0800, Jason Wang wrote:
> 
> On 2019/3/8 3:17 AM, Jerome Glisse wrote:
> > On Thu, Mar 07, 2019 at 12:56:45PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Mar 07, 2019 at 10:47:22AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 06, 2019 at 02:18:12AM -0500, Jason Wang wrote:
> > > > > +static const struct mmu_notifier_ops vhost_mmu_notifier_ops = {
> > > > > +	.invalidate_range = vhost_invalidate_range,
> > > > > +};
> > > > > +
> > > > >  void vhost_dev_init(struct vhost_dev *dev,
> > > > >  		    struct vhost_virtqueue **vqs, int nvqs, int iov_limit)
> > > > >  {
> > > > I also wonder here: when a page is write-protected, it does not
> > > > look like .invalidate_range is invoked.
> > > > 
> > > > E.g. mm/ksm.c calls
> > > > 
> > > > mmu_notifier_invalidate_range_start and
> > > > mmu_notifier_invalidate_range_end but not mmu_notifier_invalidate_range.
> > > > 
> > > > Similarly, rmap in page_mkclean_one will not call
> > > > mmu_notifier_invalidate_range.
> > > > 
> > > > If I'm right, vhost won't get notified when a page is write-protected,
> > > > since you didn't install start/end notifiers. Note that the end notifier
> > > > can be called with the page locked, so it's not as straightforward as
> > > > just adding a call. Writing into a write-protected page isn't a good idea.
> > > > 
> > > > Note that the documentation says:
> > > > 	it is fine to delay the mmu_notifier_invalidate_range
> > > > 	call to mmu_notifier_invalidate_range_end() outside the page table lock.
> > > > implying it's called just later.
> > > OK, I missed the fact that _end actually calls
> > > mmu_notifier_invalidate_range internally. So that part is fine, but the
> > > fact that you are trying to take the page lock under the VQ mutex and
> > > take the same mutex within the notifier probably means it's broken for
> > > ksm and rmap at least, since these call invalidate with the lock taken.
> > > 
> > > And generally, Andrea told me offline that one can not take a mutex
> > > under the notifier callback. I CC'd Andrea for why.
> > Correct, you _can not_ take a mutex or any sleeping lock from within the
> > invalidate_range callback, as those callbacks happen under the page table
> > spinlock. You can however do so under the invalidate_range_start callback,
> > but only if blocking is allowed for that callback (a flag is passed down
> > with the invalidate_range_start callback; if you are not allowed to block,
> > return -EBUSY and the invalidation will be aborted).
> > 
> > That's a separate issue from set_page_dirty when memory is file-backed.
> > If you can access file-backed pages, then I suggest calling set_page_dirty
> > from within a special version of vunmap(), so that when you vunmap you
> > set the page dirty without taking the page lock. It is always safe to do
> > so from within an mmu notifier callback if you had the page mapped with
> > write permission, which means that the page had write permission in the
> > userspace pte too, and thus its having a dirty pte is expected and
> > calling set_page_dirty on the page is allowed without any lock. Locking
> > will happen once the userspace ptes are torn down through the page
> > table lock.
> 
> Can I simply call set_page_dirty() before vunmap() in the mmu notifier
> callback, or is there any reason that it must be called within vunmap()?
> 
> Thanks

I think this is what Jerome is saying, yes. Maybe add a patch to the mmu
notifier doc file, documenting this?
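Concretely, I read the safe pattern as the following minimal sketch (the
vhost_map structure here is made up for illustration; it is not the code
from the patch): dirty every page that was mapped writable from inside
the notifier, before the mapping is torn down, using no sleeping locks.

#include <linux/mm.h>

/* Hypothetical bookkeeping for a vmap'ed guest region (illustration only). */
struct vhost_map {
	struct page **pages;	/* pages backing the kernel mapping */
	int npages;
	bool writable;		/* was the region mapped with write permission? */
};

/*
 * Per Jerome: if the pages were mapped writable, the userspace pte was
 * writable too, so a dirty pte is expected and set_page_dirty() needs
 * no page lock. That makes this safe even from the atomic
 * invalidate_range callback, which runs under the page table spinlock.
 */
static void vhost_map_mark_dirty(struct vhost_map *map)
{
	int i;

	if (!map->writable)
		return;

	for (i = 0; i < map->npages; i++)
		set_page_dirty(map->pages[i]);
}

On that reading, the dirtying can indeed precede the vunmap() as Jason
asks, rather than having to live inside a special vunmap() variant.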
> > > It's because of all these issues that I preferred just accessing
> > > userspace memory and handling faults. Unfortunately there does not
> > > appear to exist an API that whitelists a specific driver along the lines
> > > of "I checked this code for speculative info leaks, don't add barriers
> > > on data path please".
> > 
> > Maybe it would be better to explore adding such a helper than remapping
> > the page into the kernel address space?
> > 
> > Cheers,
> > Jérôme
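For completeness, the blocking rule Jerome spells out above would look
roughly like this if vhost were to install a start/end pair. This is a
sketch only: it assumes the 5.0-era struct mmu_notifier_range with its
blockable field, plus a hypothetical mmu_notifier member embedded in
struct vhost_dev, since the posted patch does not include this callback.

#include <linux/mmu_notifier.h>
#include <linux/mutex.h>
#include "vhost.h"	/* for struct vhost_dev (assumed layout) */

static int vhost_invalidate_range_start(struct mmu_notifier *mn,
					const struct mmu_notifier_range *range)
{
	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
					     mmu_notifier);

	/* Callers that may not block (e.g. the OOM reaper) pass
	 * blockable == false; refuse so the invalidation is aborted
	 * or retried on a blocking path.
	 */
	if (!range->blockable)
		return -EBUSY;

	/* Blocking is allowed on this path, so taking a mutex is fine. */
	mutex_lock(&dev->mutex);
	/* ... invalidate any kernel mapping covering
	 * [range->start, range->end) ...
	 */
	mutex_unlock(&dev->mutex);

	return 0;
}

The invalidate_range callback, by contrast, runs under the page table
spinlock, so nothing in it may sleep.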