On 22/07/2019 10:50, Robin Murphy wrote: > On 19/07/2019 23:07, Rob Herring wrote: >> On Fri, Jul 19, 2019 at 4:39 AM Steven Price <steven.price@xxxxxxx> >> wrote: >>> >>> On 18/07/2019 18:03, Rob Herring wrote: >>>> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price@xxxxxxx> >>>> wrote: >>>>> >>>>> On 17/07/2019 19:33, Rob Herring wrote: >>>>>> Executable buffers have an alignment restriction that they can't >>>>>> cross >>>>>> 16MB boundary as the GPU program counter is 24-bits. This >>>>>> restriction is >>>>>> currently not handled and we just get lucky. As current userspace >>>>> >>>>> Actually it depends which GPU you are using - some do have a bigger >>>>> program counter - so it might be the choice of GPU to test on rather >>>>> than just luck. But kbase always assumes the worst case (24 bit) as in >>>>> practise that's enough. >>>>> >>>>>> assumes all BOs are executable, that has to remain the default. So >>>>>> add a >>>>>> new PANFROST_BO_NOEXEC flag to allow userspace to indicate which >>>>>> BOs are >>>>>> not executable. >>>>>> >>>>>> There is also a restriction that executable buffers cannot start >>>>>> or end >>>>>> on a 4GB boundary. This is mostly avoided as there is only 4GB of >>>>>> space >>>>>> currently and the beginning is already blocked out for NULL ptr >>>>>> detection. Add support to handle this restriction fully regardless of >>>>>> the current constraints. >>>>>> >>>>>> For existing userspace, all created BOs remain executable, but the >>>>>> GPU >>>>>> VA alignment will be increased to the size of the BO. This shouldn't >>>>>> matter as there is plenty of GPU VA space. >>>>>> >>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx> >>>>>> Cc: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> >>>>>> Cc: Robin Murphy <robin.murphy@xxxxxxx> >>>>>> Cc: Steven Price <steven.price@xxxxxxx> >>>>>> Cc: Alyssa Rosenzweig <alyssa@xxxxxxxxxxxxx> >>>>>> Signed-off-by: Rob Herring <robh@xxxxxxxxxx> >>>>>> --- >>>>>> drivers/gpu/drm/panfrost/panfrost_drv.c | 20 +++++++++++++++++++- >>>>>> drivers/gpu/drm/panfrost/panfrost_gem.c | 18 ++++++++++++++++-- >>>>>> drivers/gpu/drm/panfrost/panfrost_gem.h | 3 ++- >>>>>> drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 +++ >>>>>> include/uapi/drm/panfrost_drm.h | 2 ++ >>>>>> 5 files changed, 42 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c >>>>>> b/drivers/gpu/drm/panfrost/panfrost_drv.c >>>>>> index d354b92964d5..b91e991bc6a3 100644 >>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c >>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c >>>>>> @@ -49,7 +49,8 @@ static int panfrost_ioctl_create_bo(struct >>>>>> drm_device *dev, void *data, >>>>>> struct panfrost_gem_object *bo; >>>>>> struct drm_panfrost_create_bo *args = data; >>>>>> >>>>>> - if (!args->size || args->flags || args->pad) >>>>>> + if (!args->size || args->pad || >>>>>> + (args->flags & ~PANFROST_BO_NOEXEC)) >>>>>> return -EINVAL; >>>>>> >>>>>> bo = panfrost_gem_create_with_handle(file, dev, args->size, >>>>>> args->flags, >>>>>> @@ -367,6 +368,22 @@ static struct drm_driver panfrost_drm_driver = { >>>>>> .gem_prime_mmap = drm_gem_prime_mmap, >>>>>> }; >>>>>> >>>>>> +#define PFN_4G_MASK ((SZ_4G - 1) >> PAGE_SHIFT) >>>>>> + >>>>>> +static void panfrost_drm_mm_color_adjust(const struct drm_mm_node >>>>>> *node, >>>>>> + unsigned long color, >>>>>> + u64 *start, u64 *end) >>>>>> +{ >>>>>> + /* Executable buffers can't start or end on a 4GB boundary */ >>>>>> + if (!color) { >>>>>> + if ((*start & PFN_4G_MASK) == 0) >>>>>> + (*start)++; >>>>>> + >>>>>> + if ((*end & PFN_4G_MASK) == 0) >>>>>> + (*end)--; >>>>>> + } >>>>>> +} >>>>> >>>>> Unless I'm mistaken this won't actually provide the guarantee if the >>>>> memory region is >4GB (which admittedly it isn't at the moment). For >>>>> example a 8GB region would have the beginning/end trimmed off, but >>>>> there's another 4GB in the middle which could be allocated next to. >>>> >>>> Humm, other than not allowing sizes greater than 4G-(2*PAGE_SIZE), I'm >>>> not sure how we solve that. I guess avoiding sizes of (n*4G - >>>> PAGE_SIZE) would avoid the problem. Seems like almost 4GB executable >>>> buffer should be enough for anyone(TM). >>> >>> I was thinking of something like: >>> >>> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) { >>> if ((*start & PFN_4G_MASK) + >>> (SZ_16M >> PAGE_SHIFT) < PFN_4G_MASK) >>> *end = (*start & ~PFN_4G_MASK) + >>> (SZ_4GB - SZ_16M) >> PAGE_SHIFT; >>> else >>> *start = (*end & ~PFN_4G_MASK) + (SZ_16M) >> >>> PAGE_SHIFT; >>> } >>> >>> So split the region depending on where we can find a free 16MB region >>> which doesn't cross 4GB. >> >> Here's what I ended up with. It's slightly different in that the start >> and end don't get 16MB aligned. The code already takes care of the >> alignment which also is not necessarily 16MB, but 'align = size' as >> that's sufficient to not cross a 16MB boundary. >> >> #define PFN_4G (SZ_4G >> PAGE_SHIFT) >> #define PFN_4G_MASK (PFN_4G - 1) >> #define PFN_16M (SZ_16M >> PAGE_SHIFT) >> >> static void panfrost_drm_mm_color_adjust(const struct drm_mm_node *node, >> unsigned long color, >> u64 *start, u64 *end) >> { >> /* Executable buffers can't start or end on a 4GB boundary */ >> if (!(color & PANFROST_BO_NOEXEC)) { >> if ((*start & PFN_4G_MASK) == 0) >> (*start)++; >> >> if ((*end & PFN_4G_MASK) == 0) >> (*end)--; >> >> /* Ensure start and end are in the same 4GB range */ >> if ((*end & ~PFN_4G_MASK) != (*start & ~PFN_4G_MASK)) { >> if ((*start & PFN_4G_MASK) + PFN_16M < >> PFN_4G_MASK) >> *end = (*start & ~PFN_4G_MASK) + >> PFN_4G - 1; >> else >> *start = (*end & ~PFN_4G_MASK) + 1; >> >> } >> } >> } > > If you're happy with the additional restriction that a buffer can't > cross a 4GB boundary (which does seem to be required for certain kinds > of buffers anyway), then I don't think it needs to be anywhere near that > complex: > > if (!(*start & PFN_4G_MASK)) > *start++ > *end = min(*end, ALIGN(*start, PFN_4G) - 1); What happens if *start is very near the end of a 4GB boundary? In that case *end - *start might not be as big as 16MB and we could end up failing the allocation IIUC. >>> >>> But like you say: 4GB should be enough for anyone ;) >>> >>>>> Also a minor issue, but we might want to consider having some >>>>> constants >>>>> for 'color' - it's not obvious from this code that color==no_exec. >>>> >>>> Yeah, I was just going worry about that when we had a second bit to >>>> pass. >>> >>> One other 'minor' issue I noticed as I was writing the above. PAGE_SIZE >>> is of course the CPU page size - we could have the situation of CPU and >>> GPU page size being different (e.g. CONFIG_ARM64_64K_PAGES). I'm not >>> sure whether we want to support this or not (kbase doesn't). Also in >>> theory some GPUs do support 64K (AS_TRANSCFG_ADRMODE_AARCH64_64K) - but >>> kbase has never used this so I don't know if it works... ;) >> >> Shhh! I have thought about that, but was ignoring for now. > > In general, I don't think that should be too great a concern - we > already handle it for IOMMUs, provided the minimum IOMMU page size is no > larger than PAGE_SIZE, which should always be the case here too. Looking > at how panfrost_mmu_map() is implemented, and given that we're already > trying to handle stuff at 2MB granularity where possible, I don't see > any obvious reasons for 64K pages not to work already (assuming there > aren't any 4K assumptions baked into the userspace side). I might have > to give it a try... I'd be interested to know if it works. I know that kbase incorrectly uses PAGE_SIZE in several places (in particular assuming it is the size of the page tables). And I was aware I was in danger of slipping into the same mindset here. User space shouldn't care too much - other than the size of buffers allocated being rounded up to the CPU's page size. At least the Panfrost user/kernel ABI has sizes in bytes not pages (unlike kbase). Steve _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel