On 04/09/2023 17:25, Robin Murphy wrote: > On 2023-09-04 17:16, Boris Brezillon wrote: >> On Mon, 4 Sep 2023 16:22:19 +0100 >> Steven Price <steven.price@xxxxxxx> wrote: >> >>> On 04/09/2023 10:26, Boris Brezillon wrote: >>>> On Mon, 4 Sep 2023 08:42:08 +0100 >>>> Steven Price <steven.price@xxxxxxx> wrote: >>>> >>>>> On 01/09/2023 17:10, Boris Brezillon wrote: >>>>>> On Wed, 9 Aug 2023 18:53:15 +0200 >>>>>> Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote: >>>>>> >>>>>>> +/** >>>>>>> + * DOC: MMIO regions exposed to userspace. >>>>>>> + * >>>>>>> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET >>>>>>> + * >>>>>>> + * File offset for all MMIO regions being exposed to userspace. >>>>>>> Don't use >>>>>>> + * this value directly, use DRM_PANTHOR_USER_<name>_OFFSET >>>>>>> values instead. >>>>>>> + * >>>>>>> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET >>>>>>> + * >>>>>>> + * File offset for the LATEST_FLUSH_ID register. The Userspace >>>>>>> driver controls >>>>>>> + * GPU cache flushling through CS instructions, but the flush >>>>>>> reduction >>>>>>> + * mechanism requires a flush_id. This flush_id could be queried >>>>>>> with an >>>>>>> + * ioctl, but Arm provides a well-isolated register page >>>>>>> containing only this >>>>>>> + * read-only register, so let's expose this page through a >>>>>>> static mmap offset >>>>>>> + * and allow direct mapping of this MMIO region so we can avoid the >>>>>>> + * user <-> kernel round-trip. >>>>>>> + */ >>>>>>> +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) >>>>>> >>>>>> I'm playing with a 32-bit kernel/userspace, and this is problematic, >>>>>> because vm_pgoff is limited to 32-bit there, meaning we can only >>>>>> map up >>>>>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let >>>>>> userspace set the mmio range? >>>>> >>>>> Hmm, I was rather hoping we could ignore 32 bit these days ;) But >>>>> while >>>>> I can't see why anyone would be running a 32 bit kernel, I guess 32 >>>>> bit >>>>> user space is likely to still be needed. >>>> >>>> Uh, I just hit a new problem with 32-bit kernels: the io-pgtable >>>> interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning >>>> the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any >>>> chance you could advise me on what to do here? >>>> >>>> 1. assume this limitation is here for a good reason, and limit the GPU >>>> VA space to 32-bits on 32-bit kernels >>>> >>>> or >>>> >>>> 2. update the interface to make iova an u64 >>> >>> I'm not sure I can answer the question from a technical perspective, >>> hopefully Robin will be able to. >> >> Had a quick chat with Robin, and he's recommending going for #1 too. >> >>> >>> But why do we care about 32-bit kernels on a platform which is new >>> enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? >> >> Apparently the memory you save by switching to a 32-bit kernel matters >> to some people. To clarify, the CPU is aarch64, but they want to use it >> in 32-bit mode. >> >>> >>> Given the other limitations present in a 32-bit kernel I'd be tempted to >>> say '1' just for simplicity. Especially since apparently we've lived >>> with this for panfrost which presumably has the same limitation (even >>> though all Bifrost/Midgard GPUs have at least 33 bits of VA space). >> >> Well, Panfrost is simpler in that you don't have this kernel VA range, >> and, IIRC, we are using the old format that naturally limits the GPU VA >> space to 4G. > > FWIW the legacy pagetable format itself should be fine going up to > however many bits the GPU supports, however there were various ISA > limitations around crossing 4GB boundaries, and the easiest way to avoid > having to think about those was to just not use more than 4GB of VA at > all (minus chunks at the ends for similar weird ISA reasons). Exactly right. The legacy pagetable format supports the full range of VA. Indeed kbase used the legacy format for a long time. However the ISA has special handling for addresses where bits 31:12 are all 0 or all 1, so we have to avoid executable code landing on these regions. kbase has a modified version of 'unmapped_area_topdown'[1] to handle these additional restrictions. Steve [1] kbase_unmapped_area_topdown() https://android.googlesource.com/kernel/google-modules/gpu/+/refs/tags/android-12.0.0_r0.42/mali_kbase/thirdparty/mali_kbase_mmap.c#97