On Sat, Nov 14, 2020 at 12:10 PM Jonathan Marek <jonathan@xxxxxxxx> wrote:
>
> On 11/14/20 2:39 PM, Rob Clark wrote:
> > On Sat, Nov 14, 2020 at 10:58 AM Jonathan Marek <jonathan@xxxxxxxx> wrote:
> >>
> >> On 11/14/20 1:46 PM, Rob Clark wrote:
> >>> On Sat, Nov 14, 2020 at 8:24 AM Christoph Hellwig <hch@xxxxxx> wrote:
> >>>>
> >>>> On Sat, Nov 14, 2020 at 10:17:12AM -0500, Jonathan Marek wrote:
> >>>>> +void msm_gem_sync_cache(struct drm_gem_object *obj, uint32_t flags,
> >>>>> +		size_t range_start, size_t range_end)
> >>>>> +{
> >>>>> +	struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >>>>> +	struct device *dev = msm_obj->base.dev->dev;
> >>>>> +
> >>>>> +	/* exit early if get_pages() hasn't been called yet */
> >>>>> +	if (!msm_obj->pages)
> >>>>> +		return;
> >>>>> +
> >>>>> +	/* TODO: sync only the specified range */
> >>>>> +
> >>>>> +	if (flags & MSM_GEM_SYNC_FOR_DEVICE) {
> >>>>> +		dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
> >>>>> +				msm_obj->sgt->nents, DMA_TO_DEVICE);
> >>>>> +	}
> >>>>> +
> >>>>> +	if (flags & MSM_GEM_SYNC_FOR_CPU) {
> >>>>> +		dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
> >>>>> +				msm_obj->sgt->nents, DMA_FROM_DEVICE);
> >>>>> +	}
> >>>>
> >>>> Splitting this helper from the only caller is rather strange, especially
> >>>> with the two unused arguments. And I think the way this is specified
> >>>> to take a range, but ignoring it is actively dangerous. User space will
> >>>> rely on it syncing everything sooner or later and then you are stuck.
> >>>> So just define a sync all primitive for now, and if you really need a
> >>>> range sync and have actually implemented it add a new ioctl for that.
> >>>
> >>> We do already have a split of ioctl "layer" which enforces valid ioctl
> >>> params, etc, and gem (or other) module code which is called by the
> >>> ioctl func. So I think it is fine to keep this split here. (Also, I
> >>> think at some point there will be a uring type of ioctl alternative
> >>> which would re-use the same gem func.)
> >>>
> >>> But I do agree that the range should be respected or added later..
> >>> drm_ioctl() dispatch is well prepared for extending ioctls.
> >>>
> >>> And I assume there should be some validation that the range is aligned
> >>> to cache-line? Or can we flush a partial cache line?
> >>>
> >>
> >> The range is intended to be "sync at least this range", so that
> >> userspace doesn't have to worry about details like that.
> >>
> >
> > I don't think userspace can *not* worry about details like that.
> > Consider a case where the cpu and gpu are simultaneously accessing
> > different parts of a buffer (for ex, sub-allocation). There needs to
> > be cache-line separation between the two.
> >
>
> Right.. and it also seems like we can't get away with just
> flushing/invalidating the whole thing.
>
> qcom's vulkan driver has nonCoherentAtomSize=1, and it looks like
> dma_sync_single_for_cpu() does deal in some way with the partial cache
> line case, although I'm not sure that means we can have a
> nonCoherentAtomSize=1.
>

Flushing/invalidating the whole thing could be a useful first step, or
at least I can think of some uses for it. But if it isn't useful for
how vk sees the world, then maybe we should just implement the range
properly from the get-go. (And I *think* we should require the range
to be aligned to cache-line boundaries.. from a kernel uabi PoV it is
always easier to loosen restrictions later than the other way around.)

BR,
-R
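
P.S. To make the two range policies discussed above concrete, here is a
minimal sketch in plain C. It is illustrative only and not taken from the
patch: the 64-byte line size, the EXAMPLE_CACHE_LINE macro and both
example_* helpers are hypothetical names, and a real implementation would
have to use the actual cache-line size of the platform.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration: 64-byte cache lines. */
#define EXAMPLE_CACHE_LINE ((size_t)64)

/*
 * Hypothetical strict policy: reject any range whose start or end does
 * not sit on a cache-line boundary (the "validate in the ioctl layer"
 * option).
 */
static bool example_range_is_aligned(size_t start, size_t end)
{
	return start < end &&
	       (start % EXAMPLE_CACHE_LINE) == 0 &&
	       (end % EXAMPLE_CACHE_LINE) == 0;
}

/*
 * Hypothetical "sync at least this range" policy: widen the range
 * outward to whole cache lines. Overflow of *end near SIZE_MAX is not
 * handled in this sketch.
 */
static void example_range_widen(size_t *start, size_t *end)
{
	size_t tail = *end % EXAMPLE_CACHE_LINE;

	*start -= *start % EXAMPLE_CACHE_LINE;
	if (tail)
		*end += EXAMPLE_CACHE_LINE - tail;
}
```

With these assumptions, a request for bytes [70, 100) would be rejected by
the strict check and widened to [64, 128) by the second helper. The
widened range is where the sub-allocation concern comes from: it covers
bytes the caller never asked about, which may belong to whoever else is
using the rest of that cache line.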