On 24/07/14 00:46, Bridgman, John wrote:
>
>> -----Original Message-----
>> From: dri-devel [mailto:dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Jesse Barnes
>> Sent: Wednesday, July 23, 2014 5:00 PM
>> To: dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>>
>> On Mon, 21 Jul 2014 19:05:46 +0200 daniel at ffwll.ch (Daniel Vetter) wrote:
>>
>>> On Mon, Jul 21, 2014 at 11:58:52AM -0400, Jerome Glisse wrote:
>>>> On Mon, Jul 21, 2014 at 05:25:11PM +0200, Daniel Vetter wrote:
>>>>> On Mon, Jul 21, 2014 at 03:39:09PM +0200, Christian König wrote:
>>>>>> On 21.07.2014 14:36, Oded Gabbay wrote:
>>>>>>> On 20/07/14 20:46, Jerome Glisse wrote:
>>
>> [snip!!]
>
> My BlackBerry thumb thanks you ;)
>
>>>>>>
>>>>>> The main questions here are whether it is avoidable to pin down the
>>>>>> memory, and whether the memory is pinned down at driver load, by
>>>>>> request from userspace, or by anything else.
>>>>>>
>>>>>> As far as I can see only the "mqd per userspace queue" might be a
>>>>>> bit questionable; everything else sounds reasonable.
>>>>>
>>>>> Aside, i915 perspective again (i.e. how we solved this): when
>>>>> scheduling away from contexts we unpin them and put them into the
>>>>> lru. And in the shrinker we have a last-ditch callback to switch to
>>>>> a default context (since you can't ever have no context once you've
>>>>> started), which means we can evict any context object if it's
>>>>> getting in the way.
>>>>
>>>> So Intel hardware reports through some interrupt or some channel
>>>> when it is not using a context? I.e. the kernel side gets a
>>>> notification when some user context is done executing?
>>>
>>> Yes, as long as we do the scheduling with the cpu we get interrupts
>>> for context switches. The mechanic is already published in the
>>> execlist patches currently floating around. We get a special context
>>> switch interrupt.
>>>
>>> But we have this unpin logic already in the current code, where we
>>> switch contexts through in-line cs commands from the kernel. There we
>>> obviously use the normal batch completion events.
>>
>> Yeah, and we can continue that going forward. And of course if your hw
>> can do page faulting, you don't need to pin the normal data buffers.
>>
>> Usually there are some special buffers that need to be pinned for
>> longer periods though, anytime the context could be active. Sounds
>> like in this case it's the userland queues, which makes some sense.
>> But maybe for smaller systems the size limit could be clamped to
>> something smaller than 128M. Or tie it into the rlimit somehow, just
>> like we do for mlock() stuff.
>>
> Yeah, even the queues are in pageable memory; it's just a ~256 byte
> structure per queue (the Memory Queue Descriptor) that describes the
> queue to hardware, plus a couple of pages for each process using HSA to
> hold things like doorbells. Current thinking is to limit the number of
> processes using HSA to ~256 and the number of queues per process to
> ~1024 by default in the initial code, although my guess is that we
> could take the queues-per-process default limit even lower.
>
So, my mistake: struct cik_mqd is actually 604 bytes, and it is
allocated on a 256-byte boundary. I had in mind to reserve 64MB of GART
by default, which translates to 512 queues per process with 128
processes. We would add two kernel module parameters,
max-queues-per-process and max-processes (with defaults of 512 and 128,
as I said), to give the system admin better control.
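
As a rough sketch of what I have in mind (a sketch only; the parameter
and variable names below are illustrative, not the final patch):

/*
 * Illustrative sketch, not the actual amdkfd code.  Two module
 * parameters bound how much GART we reserve for MQDs at driver load:
 * 64MB / (128 processes * 512 queues) leaves roughly 1KB per queue
 * slot, comfortably above the 604-byte struct cik_mqd once it is
 * padded to a 256-byte boundary.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>

static int max_num_of_processes = 128;
module_param(max_num_of_processes, int, 0444);
MODULE_PARM_DESC(max_num_of_processes,
		 "Maximum number of processes that can use HSA (default 128)");

static int max_num_of_queues_per_process = 512;
module_param(max_num_of_queues_per_process, int, 0444);
MODULE_PARM_DESC(max_num_of_queues_per_process,
		 "Maximum number of queues per HSA process (default 512)");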
	Oded

>>>> The issue with radeon hardware AFAICT is that the hardware does not
>>>> report anything about the userspace context running, i.e. you do
>>>> not get a notification when a context is not in use. Well, AFAICT.
>>>> Maybe the hardware does provide that.
>>>
>>> I'm not sure whether we can do the same trick with the hw scheduler.
>>> But then unpinning hw contexts will drain the pipeline anyway, so I
>>> guess we can just stop feeding the hw scheduler until it runs dry.
>>> And then unpin and evict.
>>
>> Yeah, we should have an idea which contexts have been fed to the
>> scheduler, at least with kernel based submission. With userspace
>> submission we'll be in a tougher spot... but as you say we can always
>> idle things and unpin everything under pressure. That's a really big
>> hammer to apply though.
>>
>>>> Like, the VMID is a limited resource, so you have to dynamically
>>>> bind them, so maybe we can only allocate a pinned buffer for each
>>>> VMID and then, when binding a PASID to a VMID, also copy the pinned
>>>> buffer back to the pasid's unpinned copy.
>>>
>>> Yeah, pasid assignment will be fun. Not sure whether Jesse's patches
>>> will do this already. We _do_ already have fun with ctx id
>>> assignments though, since we move them around (and the hw id is the
>>> ggtt address afaik). So we need to remap them already. Not sure on
>>> the details for pasid mapping; iirc it's a separate field somewhere
>>> in the context struct. Jesse knows the details.
>>
>> The PASID space is a bit bigger, 20 bits iirc. So we probably won't
>> run out quickly or often. But when we do, I thought we could apply
>> the same trick Linux uses for ASID management on SPARC and ia64 (iirc
>> on sparc anyway, maybe MIPS too): "allocate" a PASID every time you
>> need one, but don't tie it to the process at all; just use it as a
>> counter that lets you know when you need to do a full TLB flush, then
>> start the allocation process over. This lets you minimize TLB
>> flushing and gracefully handle oversubscription.
>
> IIRC we have a 9-bit limit for PASID on current hardware, although
> that will go up in future.
>
>> My current code doesn't bother though; context creation will fail if
>> we run out of PASIDs on a given device.
>>
>> --
>> Jesse Barnes, Intel Open Source Technology Center
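
As an aside, the generation-counter trick Jesse refers to looks roughly
like the sketch below (using the 9-bit PASID limit John mentions). It
is only meant to make the idea concrete; the names, the lock and the
flush hook are placeholders, not code from any of the patches.

/*
 * Sketch of a generation-counter PASID allocator, along the lines of
 * the SPARC/MIPS ASID scheme.  The low bits of a 64-bit counter hold
 * the hardware PASID, the high bits a "generation".  A context's
 * cached value is only valid while its generation matches the
 * counter's; when the PASID field wraps we start a new generation and
 * do a full TLB flush, which implicitly invalidates every PASID handed
 * out so far.
 */
#include <linux/types.h>
#include <linux/spinlock.h>

#define PASID_BITS	9	/* 9-bit limit on current hw, per John */
#define PASID_MASK	((1ULL << PASID_BITS) - 1)

static DEFINE_SPINLOCK(pasid_lock);
static u64 pasid_counter = 1ULL << PASID_BITS;	/* generation 1, pasid 0 */

static void flush_all_gpu_tlbs(void)
{
	/* Placeholder: invalidate the TLB entries of every VM here. */
}

/*
 * ctx_tag caches the counter value (generation | pasid) the context
 * last got.  Returns a hardware PASID that is valid right now.
 */
static u32 ensure_pasid(u64 *ctx_tag)
{
	u32 pasid;

	spin_lock(&pasid_lock);
	if ((*ctx_tag >> PASID_BITS) != (pasid_counter >> PASID_BITS)) {
		if ((++pasid_counter & PASID_MASK) == 0) {
			/* PASID space exhausted: new generation, full
			 * flush, and skip PASID 0 in case the hw
			 * reserves it. */
			flush_all_gpu_tlbs();
			++pasid_counter;
		}
		*ctx_tag = pasid_counter;
	}
	pasid = *ctx_tag & PASID_MASK;
	spin_unlock(&pasid_lock);

	return pasid;
}

The nice property is that nothing ever has to be freed explicitly:
oversubscription just costs one TLB flush per trip through the 512-entry
PASID space instead of failing context creation.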