On 24/08/2021 18:30, Alyssa Rosenzweig wrote: > When locking memory, the base address is rounded down to the nearest > page. The current code does not adjust the size in this case, > truncating the lock region: > > Input: [----size----] > Round: [----size----] > > To fix the truncation, extend the lock region by the amount rounded off. > > Input: [----size----] > Round: [-------size------] > > This bug is difficult to hit under current assumptions: since the size > of the lock region is stored as a ceil(log2), the truncation must cause > us to cross a power-of-two boundary. This is possible, for example if > the caller tries to lock 65535 bytes starting at iova 0xCAFE0010. The > existing code rounds down the iova to 0xCAFE0000 and rounds up the lock > region to 65536 bytes, locking until 0xCAFF0000. This fails to lock the > last 15 bytes. > > In practice, the current callers pass PAGE_SIZE aligned inputs, avoiding > the bug. Therefore this doesn't need to be backported. Still, that's a > happy accident and not a precondition of lock_region, so we let's do the > right thing to future proof. Actually it's worse than that due to the hardware behaviour, the spec states (for LOCKADDR_BASE): > Only the upper bits of the address are used. The address is aligned to a > multiple of the region size, so a variable number of low-order bits are > ignored, depending on the selected region size. It is recommended that software > ensures that these low bits in the address are cleared, to avoid confusion. It appears that indeed this has caused confusion ;) So for a simple request like locking from 0xCAFE0000 - 0xCB010000 (size = 0x30000) the region width gets rounded up (to 0x40000) which causes the start address to be effectively rounded down (by the hardware) to 0xCAFC0000 and we fail to lock 0xCB000000-0xCB010000. Interestingly (unless my reading of this is wrong) that means to lock 0xFFFF0000-0x100010000 (i.e. crossing the 4GB boundary) requires locking *at least* 0x00000000-0x200000000 (i.e. locking the first 8GB). This appears to be broken in kbase (which actually does zero out the low bits of the address) - I've raised a bug internally so hopefully someone will tell me if I've read the spec completely wrong here. Steve > Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@xxxxxxxxxxxxx> > Reported-by: Steven Price <steven.price@xxxxxxx> > --- > drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c > index dfe5f1d29763..14be32497ec3 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c > @@ -63,6 +63,9 @@ static void lock_region(struct panfrost_device *pfdev, u32 as_nr, > u8 region_width; > u64 region = iova & PAGE_MASK; > > + /* After rounding the address down, extend the size to lock the end. */ > + size += (region - iova); > + > /* The size is encoded as ceil(log2) minus(1), which may be calculated > * with fls. The size must be clamped to hardware bounds. > */ >