Re: [PATCH v3 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests

Alexander Gordeev <agordeev@xxxxxxxxxxxxx> · Tue, 16 Apr 2024 08:37:29 +0200

On Mon, Apr 15, 2024 at 09:14:03PM +0200, David Hildenbrand wrote:
> > > +retry:
> > > +		rc = walk_page_range_vma(vma, addr, vma->vm_end,
> > > +					 &find_zeropage_ops, &addr);
> > > +		if (rc <= 0)
> > > +			continue;
> > 
> > So in case an error is returned for the last vma, __s390_unshare_zeropages()
> > finishes with that error. By contrast, the error for a non-last vma would
> > be ignored?
> 
> Right, it looks a bit off. walk_page_range_vma() shouldn't fail
> unless find_zeropage_pte_entry() would fail -- which would also be
> very unexpected.
> 
> To handle it cleanly in case we would ever get a weird zeropage where we
> don't expect it, we should probably just exit early.
> 
> Something like the following (not compiled, addressing the comment below):

> @@ -2618,7 +2618,8 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
>  	struct vm_area_struct *vma;
>  	VMA_ITERATOR(vmi, mm, 0);
>  	unsigned long addr;
> -	int rc;
> +	vm_fault_t rc;
> +	int zero_page;

I would name the vm_fault_t variable "fault" (as at the other handle_mm_fault()
call sites) and leave rc as is:

	vm_fault_t fault;
	int rc;

>  	for_each_vma(vmi, vma) {
>  		/*
> @@ -2631,9 +2632,11 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
>  		addr = vma->vm_start;
>  retry:
> -		rc = walk_page_range_vma(vma, addr, vma->vm_end,
> -					 &find_zeropage_ops, &addr);
> -		if (rc <= 0)
> +		zero_page = walk_page_range_vma(vma, addr, vma->vm_end,
> +						&find_zeropage_ops, &addr);
> +		if (zero_page < 0)
> +			return zero_page;
> +		else if (!zero_page)
>  			continue;
>  		/* addr was updated by find_zeropage_pte_entry() */
> @@ -2656,7 +2659,7 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
>  		goto retry;
>  	}
> -	return rc;
> +	return 0;
>  }
>  static int __s390_disable_cow_sharing(struct mm_struct *mm)
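
To make sure we mean the same control flow, here is a userspace model of the
loop with the "fault"/"rc" naming applied to the hunk above. It is only a
sketch: struct mock_vma, mock_walk(), mock_fault() and
unshare_zeropages_model() are hypothetical stand-ins, the VM_FAULT_OOM value
is a placeholder, and nothing here is the actual kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel definitions. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM 0x1u	/* placeholder value for this sketch */

struct mock_vma {
	unsigned long start, end;
	unsigned long zeropage_at;	/* address of a zeropage, 0 if none */
	int walk_error;			/* negative errno for the walk, or 0 */
	vm_fault_t fault_result;	/* what the mocked fault handler reports */
};

/* Models the walk: <0 error, 0 nothing found, 1 found (*addr updated). */
static int mock_walk(struct mock_vma *vma, unsigned long from,
		     unsigned long *addr)
{
	if (vma->walk_error)
		return vma->walk_error;
	if (vma->zeropage_at && vma->zeropage_at >= from &&
	    vma->zeropage_at < vma->end) {
		*addr = vma->zeropage_at;
		return 1;
	}
	return 0;
}

/* Models the unshare fault: unless OOM is reported, the zeropage is gone. */
static vm_fault_t mock_fault(struct mock_vma *vma, unsigned long addr)
{
	(void)addr;
	if (!(vma->fault_result & VM_FAULT_OOM))
		vma->zeropage_at = 0;
	return vma->fault_result;
}

static int unshare_zeropages_model(struct mock_vma *vmas, size_t n)
{
	unsigned long addr;
	vm_fault_t fault;
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		struct mock_vma *vma = &vmas[i];

		addr = vma->start;
retry:
		rc = mock_walk(vma, addr, &addr);
		if (rc < 0)
			return rc;	/* exit early, also for a non-last vma */
		if (!rc)
			continue;	/* no zeropage left in this vma */
		/* addr was updated by the walk */
		fault = mock_fault(vma, addr);
		if (fault & VM_FAULT_OOM)
			return -ENOMEM;
		goto retry;		/* always retry the lookup */
	}
	return 0;
}
```

With this shape a walk error on the first of two vmas is returned right away
instead of being overwritten by the result for the second one.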

...

> > > +		/* addr was updated by find_zeropage_pte_entry() */
> > > +		rc = handle_mm_fault(vma, addr,
> > > +				     FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
> > > +				     NULL);
> > > +		if (rc & VM_FAULT_OOM)
> > > +			return -ENOMEM;
> > 
> > > Heiko pointed out that the type of rc is inconsistent with the vm_fault_t returned by
> 
> Right, let's use another variable for that.
> 
> > > handle_mm_fault(). While fixing it up, I became concerned whether it is
> > > fine to continue if any other error is encountered (including possible future
> > > VM_FAULT_xxxx)?
> 
> Such future changes would similarly break break_ksm(). Staring at it, I do wonder
> if break_ksm() should be handling VM_FAULT_HWPOISON ... very likely we should
> handle it and fail -- we might get an MC while copying from the source page.
> 
> VM_FAULT_HWPOISON on the shared zeropage would imply a lot of trouble, so
> I'm not concerned about that for the case here, but handling it in the future
> would be cleaner.
> 
> Note that we always retry the lookup, so we won't just skip a zeropage on unexpected
> errors.
> 
> We could piggy-back on vm_fault_to_errno(). We could use
> vm_fault_to_errno(rc, FOLL_HWPOISON), and only continue (retry) if the rc is 0 or
> -EFAULT, otherwise fail with the returned error.
> 
> But I'd do that as a follow up, and also use it in break_ksm() in the same fashion.
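
For the follow-up, my reading of the vm_fault_to_errno() idea is roughly the
following. This is a userspace sketch only: the VM_FAULT_* and FOLL_HWPOISON
values are placeholders, the vm_fault_to_errno() body is a copy of the kernel
helper's logic as I understand it, and unshare_fault_to_rc() is a hypothetical
name that exists nowhere in the tree.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; fallback for other platforms */
#endif

/* Placeholder definitions for this sketch, not the kernel headers. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM		0x01u
#define VM_FAULT_SIGBUS		0x02u
#define VM_FAULT_HWPOISON	0x10u
#define VM_FAULT_HWPOISON_LARGE	0x20u
#define VM_FAULT_SIGSEGV	0x40u
#define FOLL_HWPOISON		0x100

/* Userspace copy of the helper's logic, assumed to match the kernel one. */
static int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
{
	if (vm_fault & VM_FAULT_OOM)
		return -ENOMEM;
	if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
		return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
	if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
		return -EFAULT;
	return 0;
}

/*
 * Hypothetical helper: returns 0 when the caller should retry the lookup
 * (no error, or -EFAULT), otherwise the negative errno to fail with.
 */
static int unshare_fault_to_rc(vm_fault_t fault)
{
	int rc = vm_fault_to_errno(fault, FOLL_HWPOISON);

	if (rc == 0 || rc == -EFAULT)
		return 0;
	return rc;
}
```

In the loop, a non-zero result from such a helper would be returned directly,
and zero would lead to the usual retry of the walk.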

@Christian, do you agree with this suggestion?

Thanks!