On 09/12/ , Felix Kuehling wrote: > On 2023-09-11 22:52, Lang Yu wrote: > > On 09/11/ , Harish Kasiviswanathan wrote: > > > Heavy-weight TLB flush is required after unmap on all GPUs for > > > correctness and security. > > > > > > Signed-off-by: Harish Kasiviswanathan<Harish.Kasiviswanathan@xxxxxxx> > > > --- > > > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > index b315311dfe2a..b9950074aee0 100644 > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > @@ -1466,8 +1466,7 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type); > > > static inline bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) > > > { > > > - return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || > > > - KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || > > > + return KFD_GC_VERSION(dev) > IP_VERSION(9, 4, 2) || > > > (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && dev->sdma_fw_version >= 18) || > > > KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 0); > > > } > > 1, If TLB_FLUSH_HEAVYWEIGHT is required after unmap on all GPUs > > as described in commmit message, why we have this whitelist > > instead of a blacklist? > > That was a bug that this patch is fixing. There were specific GPUs and > firmware versions where the TLB flush after unmap was causing intermittent > problems in specific tests. This should have always been a blacklist. > > > > > > 2, kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT) is also called > > in svm_range_unmap_from_gpus(). Why not add this whitelist there? > > There was a patch that used kfd_flush_tlb_after_unmap in the SVM code. But > you reverted that patch, probably because it caused more problems than it > solved. SVM really must flush TLBs the way it does because it is so tightly > integrated with Linux's virtual memory management and because with XNACK, > memory can be unmapped while GPU work is in progress without preemting > queues (implicitly flushing TLBs and caches): > > commit 515d7cebc2e2d2b4f0a276d26f3b790a83cdfe06 > Author: Lang Yu<Lang.Yu@xxxxxxx> > Date: Wed Apr 20 10:24:31 2022 +0800 > > Revert "drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too" > This reverts commit 36bf93216ecbe399c40c5e0486f0f0e3a4afa69e. > It causes SVM regressions on Vega10 with XNACK-ON. Just revert it > at the moment. > ./kfdtest --gtest_filter=KFDSVMRangeTest.MigratePolicyTest > Signed-off-by: Lang Yu<Lang.Yu@xxxxxxx> > Reviewed-by: Philip Yang<Philip.Yang@xxxxxxx> > Signed-off-by: Alex Deucher<alexander.deucher@xxxxxxx> > > Regards, > Felix Yes, that's because kfd_flush_tlb_after_unmap() return false for Vega10(gfx901). kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT) is called unconditionally in SVM for ASICs > IP_VERSION(9, 0, 0) and works well. So why not relax the condition to KFD_GC_VERSION(dev) > IP_VERSION(9, 0, 0) ? Regards, Lang > > > > > Regards, > > Lang > > > > > -- > > > 2.34.1 > > >