On Tue, Feb 07, 2023 at 12:31:12PM +0100, Claudio Imbrenda wrote: > On Mon, 30 Jan 2023 14:24:40 -0400 > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > On Mon, Jan 30, 2023 at 07:21:04PM +0100, Claudio Imbrenda wrote: > > > On Thu, 26 Jan 2023 12:35:37 -0400 > > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > > > On Thu, Jan 26, 2023 at 04:39:02PM +0100, Claudio Imbrenda wrote: > > > > > > > > > I can tell you that the original goal of that function is to make sure > > > > > that there are no extra references. in particular, we want to prevent > > > > > I/O of any kind to be ongoing while the page becomes secure. (the I/O > > > > > will fail and, depending on which device it was, the whole system might > > > > > end up in a rather unhappy state) > > > > > > > > Sure, but if there is concurrent IO you just try again right? It > > > > doesn't wait for refs to drop for instance. > > > > > > > > So make the lock_page work the same way: > > > > > > the more I look at this, the less I understand why I wrote the code > > > like that. I do have a comment, though > > > > > > > > > > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c > > > > index 9f18a4af9c1319..847ee50b8672c6 100644 > > > > --- a/arch/s390/kernel/uv.c > > > > +++ b/arch/s390/kernel/uv.c > > > > @@ -193,20 +193,11 @@ static int expected_page_refs(struct page *page) > > > > } > > > > > > > > static int make_secure_pte(pte_t *ptep, unsigned long addr, > > > > - struct page *exp_page, struct uv_cb_header *uvcb) > > > > + struct page *page, struct uv_cb_header *uvcb) > > > > { > > > > pte_t entry = READ_ONCE(*ptep); > > > > - struct page *page; > > > > int expected, cc = 0; > > > > > > > > - if (!pte_present(entry)) > > > > - return -ENXIO; > > > > - if (pte_val(entry) & _PAGE_INVALID) > > > > - return -ENXIO; > > > > - > > > > - page = pte_page(entry); > > > > - if (page != exp_page) > > > > - return -ENXIO; > > > > if (PageWriteback(page)) > > > > return -EAGAIN; > > > > expected = expected_page_refs(page); > > > > @@ -304,17 +295,25 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb) > > > > goto out; > > > > > > > > rc = -ENXIO; > > > > - page = follow_page(vma, uaddr, FOLL_WRITE); > > > > - if (IS_ERR_OR_NULL(page)) > > > > - goto out; > > > > - > > > > - lock_page(page); > > > > ptep = get_locked_pte(gmap->mm, uaddr, &ptelock); > > > > + > > > > + if (!pte_present(entry)) > > > > + goto out_unlock_pte; > > > > + if (pte_val(entry) & _PAGE_INVALID) > > > > + goto out_unlock_pte; > > > > > > I guess we also need to make sure the page was writable? > > > FOLL_WRITE made sure of that > > > > > > so I guess something like: > > > > > > if (!pte_write(entry)) > > > goto out_unlock_pte; > > > > Probably, that looks like an existing race that it wasn't re-checked :\ > > > > Jason > > how do we want to proceed with this? > > I can write a patch based on the above and see if anything breaks, but > it will take some thinking and testing and reviewing before I'll be > comfortable with it. I would be happy if you did that I think the code is clearly racey as written now Jason