On 27/06/2019 17:38, Rob Herring wrote:
> On Thu, Jun 27, 2019 at 4:57 AM Steven Price <steven.price@xxxxxxx> wrote:
>>
>> Sorry for the slow response, I've been on holiday for a few weeks.
>
> Welcome back.

Thanks!

>> On 20/06/2019 06:50, Tomeu Vizoso wrote:
>>> On Mon, 17 Jun 2019 at 16:56, Rob Herring <robh@xxxxxxxxxx> wrote:
>>>>
>>>> On Sun, Jun 16, 2019 at 11:15 PM Tomeu Vizoso
>>>> <tomeu.vizoso@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, 14 Jun 2019 at 23:22, Rob Herring <robh@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Wed, Jun 12, 2019 at 6:55 AM Tomeu Vizoso
>>>>>> <tomeu@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, 10 Jun 2019 at 19:06, Rob Herring <robh@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> The midgard/bifrost GPUs need to allocate GPU memory which is
>>>>>>>> allocated on GPU page faults and not pinned in memory. The
>>>>>>>> vendor driver calls this functionality GROW_ON_GPF.
>>>>>>>>
>>>>>>>> This implementation assumes that BOs allocated with the
>>>>>>>> PANFROST_BO_NOMAP flag are never mmapped or exported. Both of
>>>>>>>> those may actually work, but I'm unsure if there's some
>>>>>>>> interaction there. It would cause the whole object to be pinned
>>>>>>>> in memory which would defeat the point of this.
>>
>> Although in normal usage user space will never care about the
>> contents of growable memory, it can be useful to be able to access
>> it for debugging (although it's not critical to have that working
>> immediately). In particular it allows submitting the jobs in a job
>> chain separately. Exporting I can't see a use case for.
>>
>> So personally I'd prefer not using a "NOMAP" flag to mean "grow on
>> fault".
>
> NOMAP means 'no gpu map on alloc'. The CPU mapping part is just a
> limitation in the implementation which could be handled if needed.

Ah, well my confusion might be another indication it's not a great
name ;)

> NOPIN? It's not really 'growing' either as the total/max size is
> fixed. Not sure if that's the same for kbase. Maybe faults happen to
> be sequential in addresses and it grows in that sense.

It depends what you understand by pinning. To me pinning means that
the memory cannot be swapped out - which isn't the feature at the API
level (e.g. we might introduce support for swapping when the GPU isn't
using the memory).

In kbase we call it "growing" because the amount of memory allocated
can increase - and indeed it grows in a similar way to a stack on a
CPU.

> Maybe just saying what the buffer is used for (HEAP) would be best?

That seems like a good name to me. User space doesn't really care how
the kernel manages the memory - it just wants to communicate that this
is temporary heap memory for the GPU to use. (I've put a sketch of
what the flags could look like further down.)

> Speaking of alloc flags, Alyssa also mentioned we need a way to align
> shader buffers. My suggestion there is an executable flag. That way we
> can also set pages to XN. Though maybe alignment requirements need to
> be explicit?

kbase mostly handles this with an executable flag, so yes that seems a
reasonable way of handling it. Note, however, that there are a bunch
of wacky optimisation ideas that have been considered which require
particular alignment constraints. In particular kbase ended up with
BASE_MEM_TILER_ALIGN_TOP[1], which is somewhat of a hack to specify
the odd alignment requirement without adding extra fields to the
ioctl.

[1] https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_base_kernel.h#L197

One other thing that I don't think is well supported in panfrost at
the moment is that some units don't actually store the full VA
address. The most notable one is the PC - this is either 32 bit or 24
bit depending on the GPU (although kbase always assumes 24 bit). This
means that shader code must be aligned so that it doesn't cross a
24-bit boundary. kbase also has BASE_MEM_GPU_VA_SAME_4GB_PAGE for the
same idea but restricted to a 32 bit size.

There's also a nasty limitation for executable memory - it can't start
(or end) on a 4GB boundary, see the code here which avoids picking
those addresses:

https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_mem.c#L279
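To make those constraints concrete, here's a rough sketch of the check
an allocator would need to apply when picking a VA range for
executable memory. This is purely illustrative - none of these names
come from kbase or panfrost:

  #include <stdbool.h>
  #include <stdint.h>

  #define PC_REGION (1ULL << 24) /* PC may only hold 24 bits */
  #define SZ_4GB    (1ULL << 32)

  static bool exec_va_ok(uint64_t start, uint64_t size)
  {
          uint64_t last = start + size - 1;

          /* Shader code must not cross a 24-bit boundary, since the
           * PC only stores the low 24 bits of the address. */
          if ((start / PC_REGION) != (last / PC_REGION))
                  return false;

          /* Executable memory must not start or end on a 4GB
           * boundary (cf. the mali_kbase_mem.c code above). */
          if ((start % SZ_4GB) == 0 || ((last + 1) % SZ_4GB) == 0)
                  return false;

          return true;
  }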
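And the flags sketch promised above - roughly what the allocation
flags could look like in panfrost_drm.h if we go with HEAP plus an
executable/XN flag. These names and values are hypothetical, not the
current uapi:

  #include <drm/panfrost_drm.h> /* struct drm_panfrost_create_bo */

  /* Hypothetical values for drm_panfrost_create_bo.flags: */
  #define PANFROST_BO_NOEXEC (1 << 0) /* never executed: map pages XN */
  #define PANFROST_BO_HEAP   (1 << 1) /* heap: commit pages on GPU
                                       * fault, up to 'size', instead
                                       * of pinning everything at
                                       * allocation time */

  /* User space would then allocate a tiler heap along these lines: */
  struct drm_panfrost_create_bo create = {
          .size  = 8 * 1024 * 1024, /* maximum the heap may grow to */
          .flags = PANFROST_BO_HEAP | PANFROST_BO_NOEXEC,
  };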
Finally, kbase has kbase_ioctl_mem_alias which allows creating aliases
of existing mappings with appropriate strides between them. This is an
optimisation for rendering to multiple render targets efficiently and
is only needed for some GPUs. But I think we can leave that one for
now.

[...]

>> It would certainly seem reasonable that the contents of NOMAP memory
>> can be thrown away when the job chain has been completed. But there
>> is a potential performance improvement in not immediately
>> unmapping/freeing the memory but leaving it, on the assumption that
>> a similar job will be submitted later requiring roughly the same
>> amount of memory.
>>
>> Arm's blob/kernel have various mechanisms for freeing memory, either
>> after a period of being idle (in the blob) or when a shrinker is
>> called (in kbase). The idea is that the heap memory is grown once to
>> whatever the content needs and then the same buffer (or a small set
>> of buffers) is reused repeatedly. kbase has a mechanism called
>> "ephemeral" (or evictable) memory, which normally remains mapped on
>> the GPU but under memory pressure can be freed (and later faulted in
>> with empty pages if accessed again). A pinning mechanism is used to
>> ensure that this doesn't happen in the middle of a job chain which
>> uses the buffer. This mechanism is referred to as "JIT" (Just In
>> Time allocation) in places.
>
> That's a bit simpler than what I assumed JIT was.

Ah, well I have simplified it a bit in that description :)

There are effectively two features. Ephemeral memory is the DONT_NEED
flag, which enables the memory to be freed under memory pressure when
it's not in use. JIT is then built on top of that and provides a
mechanism for the kernel to allocate ephemeral memory regions "just in
time", immediately before the jobs are sent to the GPU. This offloads
the decision about how many memory regions are needed to the kernel,
in the hope that the kernel can dynamically choose the trade-off
between allocating lots of buffers (which gives maximum flexibility in
terms of job scheduling) and saving memory by immediately running the
fragment job so that the heap buffers can be recycled.

All I can say is that it's a locking nightmare (shrinkers can be
called in some very annoying contexts). It's also not clear that the
kernel is in a better position to make the memory/performance
trade-off decision than user space is.

> So there's 2 different cases of memory not pinned on alloc. The first
> is the heap memory which is just faulted on demand (i.e. during jobs)
> and the 2nd is the JIT which is pinned some time between alloc and a
> job submit. Is that correct? Is that 2 different allocation flags or
> 1 flag with 2 different ways to get pages pinned?

Yes, that's correct. "Heap memory"[2] is just GROW_ON_GPF memory
allocated by user space. JIT memory is allocated by a 'soft-job'
(BASE_JD_REQ_SOFT_JIT_ALLOC) that user space inserts before the real
GPU jobs. This soft-job is responsible for allocating (or reusing) a
buffer (which is internally marked as GROW_ON_GPF) and ensuring it's
pinned (removing any DONT_NEED flag). After the GPU jobs have run,
there's another soft-job (BASE_JD_REQ_SOFT_JIT_FREE) which returns the
buffer to a pool and sets the DONT_NEED flag on it.

[2] We don't really have a term for this internally, it's just
"growable memory".

So both types are "grow on fault"; the difference is that the
user-allocated "heap memory" will not be discarded or automatically
reused by the kernel, whereas JIT memory is under the control of the
kernel after the soft-job frees it and so can be recycled/freed at any
time.
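If it helps, that lifecycle can be summarised as a little state
machine. Again this is purely illustrative - none of these identifiers
exist in kbase, only the BASE_JD_REQ_SOFT_JIT_* job types and the
DONT_NEED/GROW_ON_GPF concepts are real:

  #include <assert.h>

  enum jit_state {
          JIT_FREE_POOL, /* DONT_NEED set: shrinker may take pages */
          JIT_PINNED,    /* JIT_ALLOC ran: in use by a job chain   */
  };

  struct jit_region {
          enum jit_state state;
          unsigned long committed_pages; /* grows on GPU fault */
  };

  /* BASE_JD_REQ_SOFT_JIT_ALLOC: grab a region from the pool and pin
   * it (clear DONT_NEED) before the real GPU jobs run. */
  static void soft_jit_alloc(struct jit_region *r)
  {
          r->state = JIT_PINNED;
  }

  /* GPU page fault during a job: commit another backing page
   * (GROW_ON_GPF). Only valid while the region is pinned. */
  static void gpu_page_fault(struct jit_region *r)
  {
          assert(r->state == JIT_PINNED);
          r->committed_pages++;
  }

  /* BASE_JD_REQ_SOFT_JIT_FREE: return the region to the pool and set
   * DONT_NEED; the pages stay around in case the region is reused. */
  static void soft_jit_free(struct jit_region *r)
  {
          r->state = JIT_FREE_POOL;
  }

  /* Under memory pressure the shrinker may reclaim pooled regions; a
   * later JIT_ALLOC then faults empty pages back in. */
  static void shrinker_reclaim(struct jit_region *r)
  {
          if (r->state == JIT_FREE_POOL)
                  r->committed_pages = 0;
  }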
>>> I could very well be missing something that is needed by Arm's
>>> blob and not by Panfrost atm, but I don't see in kbase any
>>> mechanism for the kernel to know when the GPU is done with a page,
>>> other than the job that mapped it having finished.
>>
>> Much of the memory management is done by the user space blob. The
>> kernel driver usually doesn't actually know what memory a job will
>> access. There are exceptions though, in particular: ephemeral memory
>> (through JIT) and imported memory.
>
> Presumably that's a difference. We have a complete list of BOs for
> each job.

Yes - that's something I've repeatedly wished the blob driver had.
However, it was an early design decision that the driver wouldn't need
to track which memory regions would be used. This meant that for the
exceptions there has to be explicit tracking of the regions, which
unfortunately means imported memory ends up being quite 'special'.

Steve

> Rob