Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 6 Feb 2012, Roland Dreier wrote:
> Sorry for the slow reply, I got caught in other business...

Negative problem!

> 
> On Mon, Jan 30, 2012 at 12:34 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > The hardest part about implementing that is deciding what snappy
> > name to give the FOLL_ flag.
> 
> Yes... FOLL_SOFT_COW ?  FOLL_READONLY_COW ?
> (plus a good comment explaining it I guess)

I don't grok either of those - surely not READONLY_COW.

FOLL_PREPARE is the closest I've got.  I'd have said FOLL_TOUCH
if that weren't already taken.  FOLL_WRITE_IF_ABLE?

> 
> >> I don't think we want to do the "force" semantics or deal with the
> >> VM_MAYWRITE possiblity -- the access we give the hardware on
> >> behalf of userspace should just match the access that userspace
> >> actually has.  It seems that if we don't try to get pages for writing
> >> when VM_WRITE isn't set, we don't need force anymore.
> >
> > I suspect you never needed or wanted the weird force behaviour on
> > shared maywrite, but that you did need the force COW behaviour on
> > private currently-unwritable maywrite.  You (or your forebears)
> > defined that interface to use the force flag, I'm guessing it was
> > for a reason; now you want to change it not to use the force flag,
> > and it sounds good, but I'm afraid you'll discover down the line
> > what the force flag was for.
> 
> Actually I think I understand why the original code passed !write
> as the force parameter.
> 
> If the user is registering memory with read-only access, there are
> two common cases.  Possibly the underlying memory really has
> a read-only mapping, but probably more often it is just an ordinary
> buffer allocated in userspace with malloc() or the like.
> 
> In the second case, it's quite likely we have a read/write mapping
> of anonymous pages.  We'll expose it read-only for RDMA but the
> userspace process will write data into the memory via ordinary CPU
> access.  However, if we do ibv_reg_mr() before initializing the memory
> it's quite possible that the mapping actually points to the zero page,
> waiting for a CPU write to trigger a COW.
> 
> So in the second case, doing GUP without the write flag will leave
> the COW untriggered, and we'll end up mapping the zero page to
> the hardware, and RDMA won't read the data that userspace actually
> writes.  So (without GUP extension as we're discussing in this thread)
> we're forced to pass write==1 to GUP, even if we expect hardware
> to only do reads.
> 
> But if we pass write==1, then GUP on the first case (mapping that
> is genuinely read-only) will fail, unless we pass force==1 too.  But
> this should only succeed if we're going to only access the memory
> read-only, so we should set force to !writable-access-by-rdma.
> 
> Which I think explains why the code is the way it is.  But clearly
> we could do better if we had a better way of telling GUP our real
> intentions -- ie the FOLL_READONLY_COW flag.

You've persuaded me.  Yes, you have been using force because that was
the only tool available at the time, to get close to the sensible
behaviour you are now asking for.

> 
> > Can you, for example, enforce the permissions set up by the user?
> > I mean, if they do the ibv_reg_mr() on a private readonly area,
> > so __get_user_pages with the FOLL_APPROPRIATELY flag will fault
> > in ZERO_PAGEs, can you enforce that RDMA will never spray data
> > into those pages?
> 
> Yes, the access flags passed into ibv_reg_mr() are enforced by
> the RDMA hardware, so if no write access is request, no write
> access is possible.

Okay, if you enforce the agreed permissions in hardware, that's fine.

> 
> And presumably if we do GUP with write==1, force==0 that will
> fail on a read-only mapping?

That's right.

Hugh

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]