Hi Yi,

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
>Sent: Thursday, December 12, 2024 5:29 PM
>Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>cache_tag_flush_range_np()
>
>On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
>>> Sent: Thursday, December 12, 2024 3:45 PM
>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>> cache_tag_flush_range_np()
>>>
>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>> When setting up a mapping on a paging domain before the domain is
>>>> attached to any device, a NULL pointer dereference triggers as below:
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>> #PF: supervisor read access in kernel mode
>>>> #PF: error_code(0x0000) - not-present page
>>>> RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>> ...
>>>> Call Trace:
>>>>  <TASK>
>>>>  ? __die+0x20/0x70
>>>>  ? page_fault_oops+0x80/0x150
>>>>  ? do_user_addr_fault+0x5f/0x670
>>>>  ? pfn_to_dma_pte+0xca/0x280
>>>>  ? exc_page_fault+0x78/0x170
>>>>  ? asm_exc_page_fault+0x22/0x30
>>>>  ? cache_tag_flush_range_np+0x114/0x1f0
>>>>  intel_iommu_iotlb_sync_map+0x16/0x20
>>>>  iommu_map+0x59/0xd0
>>>>  batch_to_domain+0x154/0x1e0
>>>>  iopt_area_fill_domains+0x106/0x300
>>>>  iopt_map_pages+0x1bc/0x290
>>>>  iopt_map_user_pages+0xe8/0x1e0
>>>>  ? xas_load+0x9/0xb0
>>>>  iommufd_ioas_map+0xc9/0x1c0
>>>>  iommufd_fops_ioctl+0xff/0x1b0
>>>>  __x64_sys_ioctl+0x87/0xc0
>>>>  do_syscall_64+0x50/0x110
>>>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> The qi_batch structure is allocated when the domain is attached to a
>>>> device for the first time. When a mapping is set up before that,
>>>> qi_batch is dereferenced to do a batched flush, triggering the above
>>>> issue.
>>>>
>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>>> if no device is attached.
>>>>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@xxxxxxxxx>
>>>> ---
>>>>  drivers/iommu/intel/cache.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>> --- a/drivers/iommu/intel/cache.c
>>>> +++ b/drivers/iommu/intel/cache.c
>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>
>>>>  static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>  {
>>>> -	if (!iommu || !batch->index)
>>>> +	if (!iommu || !batch || !batch->index)
>>>
>>> This is the same issue as in the below link. :) And we should fix it by
>>> allocating the qi_batch for parent domains. The nested parent domain is
>>> not going to be attached to a device at all. It acts more as a
>>> background domain, so this fix will miss future cache flushes. It would
>>> have bigger problems. :)
>>>
>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@xxxxxxxxx/
>>
>> Ah, just saw this 😊
>> This patch tries to fix an issue where a mapping is set up on a paging
>> domain before it's attached to a device; your patch fixed an issue where
>> the nested parent domain's qi_batch is not allocated even if the nested
>> domain is attached to a device. I think they are different issues?
>
>Oops. I see. I think your case is allocating a hwpt based on an IOAS that
>already has mappings. When the hwpt is added to it, all the mappings of
>this IOAS would be pushed to the hwpt. But the hwpt has not been attached
>yet, so it hits this issue. I remember there is an immediate_attach bool
>to let iommufd_hwpt_paging_alloc() do an attach before calling
>iopt_table_add_domain(), which pushes the IOAS mappings to the domain.
>
>One possible fix is to set immediate_attach to true.
>But I doubt if it will be agreed since it was introduced due to some gap
>on the ARM side. If that gap has been resolved, this behavior would go
>away as well.
>
>So back to this issue: with this change, the flush would be skipped. It
>looks OK to me to skip the cache flush for the map path. And we should
>not expect any unmap on this domain since there is no device attached
>(the parent domain is an exception), hence there is nothing to be flushed
>even if there is an unmap in the domain's IOAS. So it appears to be an
>acceptable fix. @Baolu, your opinion?

Hold on, it looks like I was wrong in analyzing the related code in
qi_batch_flush_descs(). The iommu pointer should always be NULL in my
suspected case, so qi_batch_flush_descs() would return early without
issue.

I reproduced the backtrace while playing with iommufd QEMU nesting. I
think your previous comment is correct; I misunderstood the root cause.
It's indeed due to a nested parent domain, not a paging domain. Please
ignore this patch.

Thanks
Zhenzhong