On Wed, Jun 23, 2021 at 4:02 PM Christian König <christian.koenig@xxxxxxx> wrote: > > Am 23.06.21 um 15:49 schrieb Daniel Vetter: > > On Wed, Jun 23, 2021 at 3:44 PM Christian König > > <christian.koenig@xxxxxxx> wrote: > >> Am 23.06.21 um 15:38 schrieb Bas Nieuwenhuizen: > >>> On Wed, Jun 23, 2021 at 2:59 PM Christian König > >>> <christian.koenig@xxxxxxx> wrote: > >>>> Am 23.06.21 um 14:18 schrieb Daniel Vetter: > >>>>> On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen > >>>>> <bas@xxxxxxxxxxxxxxxxxxx> wrote: > >>>>>> On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote: > >>>>>>> WARNING: Absolutely untested beyond "gcc isn't dying in agony". > >>>>>>> > >>>>>>> Implicit fencing done properly needs to treat the implicit fencing > >>>>>>> slots like a funny kind of IPC mailbox. In other words it needs to be > >>>>>>> explicitly. This is the only way it will mesh well with explicit > >>>>>>> fencing userspace like vk, and it's also the bare minimum required to > >>>>>>> be able to manage anything else that wants to use the same buffer on > >>>>>>> multiple engines in parallel, and still be able to share it through > >>>>>>> implicit sync. > >>>>>>> > >>>>>>> amdgpu completely lacks such an uapi. Fix this. > >>>>>>> > >>>>>>> Luckily the concept of ignoring implicit fences exists already, and > >>>>>>> takes care of all the complexities of making sure that non-optional > >>>>>>> fences (like bo moves) are not ignored. This support was added in > >>>>>>> > >>>>>>> commit 177ae09b5d699a5ebd1cafcee78889db968abf54 > >>>>>>> Author: Andres Rodriguez <andresx7@xxxxxxxxx> > >>>>>>> Date: Fri Sep 15 20:44:06 2017 -0400 > >>>>>>> > >>>>>>> drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2 > >>>>>>> > >>>>>>> Unfortuantely it's the wrong semantics, because it's a bo flag and > >>>>>>> disables implicit sync on an allocated buffer completely. > >>>>>>> > >>>>>>> We _do_ want implicit sync, but control it explicitly. For this we > >>>>>>> need a flag on the drm_file, so that a given userspace (like vulkan) > >>>>>>> can manage the implicit sync slots explicitly. The other side of the > >>>>>>> pipeline (compositor, other process or just different stage in a media > >>>>>>> pipeline in the same process) can then either do the same, or fully > >>>>>>> participate in the implicit sync as implemented by the kernel by > >>>>>>> default. > >>>>>>> > >>>>>>> By building on the existing flag for buffers we avoid any issues with > >>>>>>> opening up additional security concerns - anything this new flag here > >>>>>>> allows is already. > >>>>>>> > >>>>>>> All drivers which supports this concept of a userspace-specific > >>>>>>> opt-out of implicit sync have a flag in their CS ioctl, but in reality > >>>>>>> that turned out to be a bit too inflexible. See the discussion below, > >>>>>>> let's try to do a bit better for amdgpu. > >>>>>>> > >>>>>>> This alone only allows us to completely avoid any stalls due to > >>>>>>> implicit sync, it does not yet allow us to use implicit sync as a > >>>>>>> strange form of IPC for sync_file. > >>>>>>> > >>>>>>> For that we need two more pieces: > >>>>>>> > >>>>>>> - a way to get the current implicit sync fences out of a buffer. Could > >>>>>>> be done in a driver ioctl, but everyone needs this, and generally a > >>>>>>> dma-buf is involved anyway to establish the sharing. So an ioctl on > >>>>>>> the dma-buf makes a ton more sense: > >>>>>>> > >>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-4-jason%40jlekstrand.net%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7C83dbdd0a1eb8442cbf7108d9364db51e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637600529684040802%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fbdwtutEj93anZp6Pshs277QoMTHZxIy0Yl54T95rCw%3D&reserved=0 > >>>>>>> > >>>>>>> Current drivers in upstream solves this by having the opt-out flag > >>>>>>> on their CS ioctl. This has the downside that very often the CS > >>>>>>> which must actually stall for the implicit fence is run a while > >>>>>>> after the implicit fence point was logically sampled per the api > >>>>>>> spec (vk passes an explicit syncobj around for that afaiui), and so > >>>>>>> results in oversync. Converting the implicit sync fences into a > >>>>>>> snap-shot sync_file is actually accurate. > >>>>>>> > >>>>>>> - Simillar we need to be able to set the exclusive implicit fence. > >>>>>>> Current drivers again do this with a CS ioctl flag, with again the > >>>>>>> same problems that the time the CS happens additional dependencies > >>>>>>> have been added. An explicit ioctl to only insert a sync_file (while > >>>>>>> respecting the rules for how exclusive and shared fence slots must > >>>>>>> be update in struct dma_resv) is much better. This is proposed here: > >>>>>>> > >>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-5-jason%40jlekstrand.net%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7C83dbdd0a1eb8442cbf7108d9364db51e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637600529684040802%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=vv%2BREnWorjoTOwrD1jH1GHVQcjPy1oesaophsz056aI%3D&reserved=0 > >>>>>>> > >>>>>>> These three pieces together allow userspace to fully control implicit > >>>>>>> fencing and remove all unecessary stall points due to them. > >>>>>>> > >>>>>>> Well, as much as the implicit fencing model fundamentally allows: > >>>>>>> There is only one set of fences, you can only choose to sync against > >>>>>>> only writers (exclusive slot), or everyone. Hence suballocating > >>>>>>> multiple buffers or anything else like this is fundamentally not > >>>>>>> possible, and can only be fixed by a proper explicit fencing model. > >>>>>>> > >>>>>>> Aside from that caveat this model gets implicit fencing as closely to > >>>>>>> explicit fencing semantics as possible: > >>>>>>> > >>>>>>> On the actual implementation I opted for a simple setparam ioctl, no > >>>>>>> locking (just atomic reads/writes) for simplicity. There is a nice > >>>>>>> flag parameter in the VM ioctl which we could use, except: > >>>>>>> - it's not checked, so userspace likely passes garbage > >>>>>>> - there's already a comment that userspace _does_ pass garbage in the > >>>>>>> priority field > >>>>>>> So yeah unfortunately this flag parameter for setting vm flags is > >>>>>>> useless, and we need to hack up a new one. > >>>>>>> > >>>>>>> v2: Explain why a new SETPARAM (Jason) > >>>>>>> > >>>>>>> v3: Bas noticed I forgot to hook up the dependency-side shortcut. We > >>>>>>> need both, or this doesn't do much. > >>>>>>> > >>>>>>> v4: Rebase over the amdgpu patch to always set the implicit sync > >>>>>>> fences. > >>>>>> So I think there is still a case missing in this implementation. > >>>>>> Consider these 3 cases > >>>>>> > >>>>>> (format: a->b: b waits on a. Yes, I know arrows are hard) > >>>>>> > >>>>>> explicit->explicit: This doesn't wait now, which is good > >>>>>> Implicit->explicit: This doesn't wait now, which is good > >>>>>> explicit->implicit : This still waits as the explicit submission still > >>>>>> adds shared fences and most things that set an exclusive fence for > >>>>>> implicit sync will hence wait on it. > >>>>>> > >>>>>> This is probably good enough for what radv needs now but also sounds > >>>>>> like a risk wrt baking in new uapi behavior that we don't want to be > >>>>>> the end result. > >>>>>> > >>>>>> Within AMDGPU this is probably solvable in two ways: > >>>>>> > >>>>>> 1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences. > >>>>> I'm not sure that works. I think the right fix is that radeonsi also > >>>>> switches to this model, with maybe a per-bo CS flag to set indicate > >>>>> write access, to cut down on the number of ioctls that are needed > >>>>> otherwise on shared buffers. This per-bo flag would essentially select > >>>>> between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis. > >>>> Yeah, but I'm still not entirely sure why that approach isn't sufficient? > >>>> > >>>> Problem with the per context or per vm flag is that you then don't get > >>>> any implicit synchronization any more when another process starts using > >>>> the buffer. > >>> That is exactly what I want for Vulkan :) > >> Yeah, but as far as I know this is not something we can do. > >> > >> See we have use cases like screen capture and debug which rely on that > >> behavior. > > They will keep working, if (and only if) the vulkan side sets the > > winsys fences correctly. Also, everything else in vulkan aside from > > winsys is explicitly not synced at all, you have to import drm syncobj > > timeline on the gl side. > > > >> The only thing we can do is to say on a per buffer flag that a buffer > >> should not participate in implicit sync at all. > > Nah, this doesn't work. Because it's not a global decision, is a local > > decision for the rendered. Vulkan wants to control implicit sync > > explicitly, and the kernel can't force more synchronization. If a > > buffer is shared as a winsys buffer between vulkan client and gl using > > compositor, then you _have_ to use implicit sync on it. But vk needs > > to set the fences directly (and if the app gets it wrong, you get > > misrendering, but that is the specified behavour of vulkan). > > Yeah, but that's exactly what we tried to avoid. > > Mhm, when we attach the flag to the process/VM then this would break the > use case of VA-API and Vulkan in the same process. > > But I think if you attach the flag to the context that should indeed > work fine. Yeah that's a question I have, whether the drm_file is shared within one process among everything, or whether radeonsi/libva/vk each have their own. If each have their own drm_file, then we should be fine, otherwise we need to figure out another place to put this (worst case as a CS extension that vk just sets on every submit). Also yes this risks that a vk app which was violationing the winsys spec will now break, which is why I think we should do this sooner than later. Otherwise the list of w/a we might need to apply in vk userspace will become very long :-( At least since this is purely opt-in from userspace, we only need to have the w/a list in userspace, where mesa has the infrastructure for that already. -Daniel > > Christian. > > > -Daniel > > > >> Regards, > >> Christian. > >> > >>>>> The current amdgpu uapi just doesn't allow any other model without an > >>>>> explicit opt-in. So current implicit sync userspace just has to > >>>>> oversync, there's not much choice. > >>>>> > >>>>>> 2) Have an EXPLICIT fence owner that is used for explicit submissions > >>>>>> that is ignored by AMDGPU_SYNC_NE_OWNER. > >>>>>> > >>>>>> But this doesn't solve cross-driver interactions here. > >>>>> Yeah cross-driver is still entirely unsolved, because > >>>>> amdgpu_bo_explicit_sync() on the bo didn't solve that either. > >>>> Hui? You have lost me. Why is that still unsolved? > >>> The part we're trying to solve with this patch is Vulkan should not > >>> participate in any implicit sync at all wrt submissions (and then > >>> handle the implicit sync for WSI explicitly using the fence > >>> import/export stuff that Jason wrote). As long we add shared fences to > >>> the dma_resv we participate in implicit sync (at the level of an > >>> implicit sync read) still, at least from the perspective of later jobs > >>> waiting on these fences. > >>> > >>>> Regards, > >>>> Christian. > >>>> > >>>>> -Daniel > >>>>> > >>>>>>> Cc: mesa-dev@xxxxxxxxxxxxxxxxxxxxx > >>>>>>> Cc: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> > >>>>>>> Cc: Dave Airlie <airlied@xxxxxxxxx> > >>>>>>> Cc: Rob Clark <robdclark@xxxxxxxxxxxx> > >>>>>>> Cc: Kristian H. Kristensen <hoegsberg@xxxxxxxxxx> > >>>>>>> Cc: Michel Dänzer <michel@xxxxxxxxxxx> > >>>>>>> Cc: Daniel Stone <daniels@xxxxxxxxxxxxx> > >>>>>>> Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx> > >>>>>>> Cc: "Christian König" <christian.koenig@xxxxxxx> > >>>>>>> Cc: Alex Deucher <alexander.deucher@xxxxxxx> > >>>>>>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > >>>>>>> Cc: Deepak R Varma <mh12gx2825@xxxxxxxxx> > >>>>>>> Cc: Chen Li <chenli@xxxxxxxxxxxxx> > >>>>>>> Cc: Kevin Wang <kevin1.wang@xxxxxxx> > >>>>>>> Cc: Dennis Li <Dennis.Li@xxxxxxx> > >>>>>>> Cc: Luben Tuikov <luben.tuikov@xxxxxxx> > >>>>>>> Cc: linaro-mm-sig@xxxxxxxxxxxxxxxx > >>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 7 +++++-- > >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++ > >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 6 ++++++ > >>>>>>> include/uapi/drm/amdgpu_drm.h | 10 ++++++++++ > >>>>>>> 4 files changed, 42 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >>>>>>> index 65df34c17264..c5386d13eb4a 100644 > >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >>>>>>> @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, > >>>>>>> struct amdgpu_bo *gds; > >>>>>>> struct amdgpu_bo *gws; > >>>>>>> struct amdgpu_bo *oa; > >>>>>>> + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); > >>>>>>> int r; > >>>>>>> > >>>>>>> INIT_LIST_HEAD(&p->validated); > >>>>>>> @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, > >>>>>>> > >>>>>>> e->bo_va = amdgpu_vm_bo_find(vm, bo); > >>>>>>> > >>>>>>> - if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) { > >>>>>>> + if (bo->tbo.base.dma_buf && > >>>>>>> + !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) { > >>>>>>> e->chain = dma_fence_chain_alloc(); > >>>>>>> if (!e->chain) { > >>>>>>> r = -ENOMEM; > >>>>>>> @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) > >>>>>>> { > >>>>>>> struct amdgpu_fpriv *fpriv = p->filp->driver_priv; > >>>>>>> struct amdgpu_bo_list_entry *e; > >>>>>>> + bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync); > >>>>>>> int r; > >>>>>>> > >>>>>>> list_for_each_entry(e, &p->validated, tv.head) { > >>>>>>> @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) > >>>>>>> struct dma_resv *resv = bo->tbo.base.resv; > >>>>>>> enum amdgpu_sync_mode sync_mode; > >>>>>>> > >>>>>>> - sync_mode = amdgpu_bo_explicit_sync(bo) ? > >>>>>>> + sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ? > >>>>>>> AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER; > >>>>>>> r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, > >>>>>>> &fpriv->vm); > >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>>> index c080ba15ae77..f982626b5328 100644 > >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>>>>>> @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv) > >>>>>>> return 0; > >>>>>>> } > >>>>>>> > >>>>>>> +int amdgpu_setparam_ioctl(struct drm_device *dev, void *data, > >>>>>>> + struct drm_file *filp) > >>>>>>> +{ > >>>>>>> + struct drm_amdgpu_setparam *setparam = data; > >>>>>>> + struct amdgpu_fpriv *fpriv = filp->driver_priv; > >>>>>>> + > >>>>>>> + switch (setparam->param) { > >>>>>>> + case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC: > >>>>>>> + if (setparam->value) > >>>>>>> + WRITE_ONCE(fpriv->vm.no_implicit_sync, true); > >>>>>>> + else > >>>>>>> + WRITE_ONCE(fpriv->vm.no_implicit_sync, false); > >>>>>>> + break; > >>>>>>> + default: > >>>>>>> + return -EINVAL; > >>>>>>> + } > >>>>>>> + > >>>>>>> + return 0; > >>>>>>> +} > >>>>>>> + > >>>>>>> const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > >>>>>>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = { > >>>>>>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> + DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), > >>>>>>> }; > >>>>>>> > >>>>>>> static const struct drm_driver amdgpu_kms_driver = { > >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > >>>>>>> index ddb85a85cbba..0e8c440c6303 100644 > >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h > >>>>>>> @@ -321,6 +321,12 @@ struct amdgpu_vm { > >>>>>>> bool bulk_moveable; > >>>>>>> /* Flag to indicate if VM is used for compute */ > >>>>>>> bool is_compute_context; > >>>>>>> + /* > >>>>>>> + * Flag to indicate whether implicit sync should always be skipped on > >>>>>>> + * this context. We do not care about races at all, userspace is allowed > >>>>>>> + * to shoot itself with implicit sync to its fullest liking. > >>>>>>> + */ > >>>>>>> + bool no_implicit_sync; > >>>>>>> }; > >>>>>>> > >>>>>>> struct amdgpu_vm_manager { > >>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > >>>>>>> index 0cbd1540aeac..9eae245c14d6 100644 > >>>>>>> --- a/include/uapi/drm/amdgpu_drm.h > >>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h > >>>>>>> @@ -54,6 +54,7 @@ extern "C" { > >>>>>>> #define DRM_AMDGPU_VM 0x13 > >>>>>>> #define DRM_AMDGPU_FENCE_TO_HANDLE 0x14 > >>>>>>> #define DRM_AMDGPU_SCHED 0x15 > >>>>>>> +#define DRM_AMDGPU_SETPARAM 0x16 > >>>>>>> > >>>>>>> #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create) > >>>>>>> #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap) > >>>>>>> @@ -71,6 +72,7 @@ extern "C" { > >>>>>>> #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm) > >>>>>>> #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle) > >>>>>>> #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched) > >>>>>>> +#define DRM_IOCTL_AMDGPU_SETPARAM DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam) > >>>>>>> > >>>>>>> /** > >>>>>>> * DOC: memory domains > >>>>>>> @@ -306,6 +308,14 @@ union drm_amdgpu_sched { > >>>>>>> struct drm_amdgpu_sched_in in; > >>>>>>> }; > >>>>>>> > >>>>>>> +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC 1 > >>>>>>> + > >>>>>>> +struct drm_amdgpu_setparam { > >>>>>>> + /* AMDGPU_SETPARAM_* */ > >>>>>>> + __u32 param; > >>>>>>> + __u32 value; > >>>>>>> +}; > >>>>>>> + > >>>>>>> /* > >>>>>>> * This is not a reliable API and you should expect it to fail for any > >>>>>>> * number of reasons and have fallback path that do not use userptr to > >>>>>>> -- > >>>>>>> 2.32.0.rc2 > >>>>>>> > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch