On 18 July 2017 at 03:02, Christian König <deathsimple at vodafone.de> wrote:
> On 17.07.2017 at 05:36, Dave Airlie wrote:
>>>
>>> I can take a look at it, I just won't have time until next week
>>> most likely.
>>
>> I've taken a look, and it's seemingly more complicated than I'd
>> want to land in Mesa before 17.2 ships, so I'd really prefer to
>> just push the new libdrm_amdgpu API from this patch. If I have to
>> port all the current radv code to the new API, I'll most definitely
>> get something wrong.
>>
>> Adding the new API so far looks like
>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>>
>> with
>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
>> being the API, and whether it should take a uint32_t context id or
>> a context handle is left as an open question in the last patch in
>> the series.
>
> I would stick with the context handle; as far as I can see there
> isn't any value in using the uint32_t for this.
>
> We just want to be able to send arbitrary chunks down into the
> kernel without libdrm_amdgpu involvement and/or the associated
> overhead of the extra loop and the semaphore handling.
>
> So your "amdgpu/cs: add new raw cs submission interface just taking
> chunks" patch looks fine to me as far as I can tell.
>
> As far as I can see, the "amdgpu: refactor semaphore handling" patch
> is actually incorrect. We must hold the mutex while sending the CS
> down to the kernel, otherwise "context->last_seq" won't be accurate.
>
>> However, hooking this into radv or radeonsi will take rewriting a
>> lot of code that is probably a bit more fragile than I'd like for
>> this sort of surgery at this point.
>
> Again, I can move the existing Mesa stuff over if you like.
>
>> I actually suspect that if we do want to proceed with this type of
>> interface, we might be better off doing it all in common Mesa code,
>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API
>> I've written here is mostly already doing.
>
> I want to stick with the other interfaces for now. No need to make
> it more complicated than it already is.
>
> Only the CS stuff is the most performance-critical thing we have
> right now.

As I suspected, this plan is full of traps.

So with the raw CS API I posted (using amdgpu_bo_list_handle instead),
I ran into two places where the abstraction cuts me:

  CC       winsys/amdgpu/radv_amdgpu_cs.lo
winsys/amdgpu/radv_amdgpu_cs.c: In function 'radv_amdgpu_cs_submit':
winsys/amdgpu/radv_amdgpu_cs.c:1173:63: error: dereferencing pointer to incomplete type 'struct amdgpu_bo'
    chunk_data[i].fence_data.handle = request->fence_info.handle->handle;
                                                                 ^~
winsys/amdgpu/radv_amdgpu_cs.c:1193:31: error: dereferencing pointer to incomplete type 'struct amdgpu_context'
    dep->ctx_id = info->context->id;

In order to build the user fence chunk, I need the actual kernel BO
handle, not the amdgpu-wrapped one, and we don't have an accessor
method for that.

In order to build the dependency chunks, I need a context id.

Now I suppose I can add chunk-creation helpers to libdrm, but it does
seem like it breaks the future-proof interface if we can't access the
details of a bunch of the objects we want to pass through to the
kernel API.
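
To illustrate what I mean by chunk-creation helpers, here is a rough
sketch of how they could look. They would have to live inside
libdrm_amdgpu, where the private struct amdgpu_bo / struct
amdgpu_context members are visible; the names, signatures and error
handling below are hypothetical, just a sketch, not existing
libdrm_amdgpu API:

/* amdgpu_cs.c -- sketch of hypothetical chunk-creation helpers */
#include <errno.h>
#include <stdint.h>
#include "amdgpu.h"
#include "amdgpu_drm.h"
#include "amdgpu_internal.h"   /* private struct amdgpu_bo / amdgpu_context */

/* Translate the wrapped fence info into the raw user fence chunk data,
 * so the caller never needs the kernel BO handle directly.
 * (Assumes fence_info->offset counts 64-bit fence slots.) */
int amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
                                       struct drm_amdgpu_cs_chunk_data *data)
{
        if (!fence_info || !fence_info->handle || !data)
                return -EINVAL;

        data->fence_data.handle = fence_info->handle->handle;
        data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
        return 0;
}

/* Translate a wrapped fence (context handle + ip/ring + sequence number)
 * into a raw dependency chunk entry, hiding the private context id. */
int amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
                                 struct drm_amdgpu_cs_chunk_dep *dep)
{
        if (!fence || !fence->context || !dep)
                return -EINVAL;

        dep->ip_type = fence->ip_type;
        dep->ip_instance = fence->ip_instance;
        dep->ring = fence->ring;
        dep->ctx_id = fence->context->id;
        dep->handle = fence->fence;
        return 0;
}

That would keep struct amdgpu_bo and struct amdgpu_context opaque to
radv, but every new chunk type then needs another helper, which is
what worries me about the future-proofing.

Dave.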