On 2024/12/12 19:00, Yi Liu wrote:
On 2024/12/12 18:01, Duan, Zhenzhong wrote:
Hi Yi,
-----Original Message-----
From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
Sent: Thursday, December 12, 2024 5:29 PM
Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
On 2024/12/12 16:19, Duan, Zhenzhong wrote:
-----Original Message-----
From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
Sent: Thursday, December 12, 2024 3:45 PM
Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
On 2024/12/12 15:24, Zhenzhong Duan wrote:
When setting up a mapping on a paging domain before the domain is attached to any device, a NULL pointer dereference triggers as below:
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
...
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x80/0x150
? do_user_addr_fault+0x5f/0x670
? pfn_to_dma_pte+0xca/0x280
? exc_page_fault+0x78/0x170
? asm_exc_page_fault+0x22/0x30
? cache_tag_flush_range_np+0x114/0x1f0
intel_iommu_iotlb_sync_map+0x16/0x20
iommu_map+0x59/0xd0
batch_to_domain+0x154/0x1e0
iopt_area_fill_domains+0x106/0x300
iopt_map_pages+0x1bc/0x290
iopt_map_user_pages+0xe8/0x1e0
? xas_load+0x9/0xb0
iommufd_ioas_map+0xc9/0x1c0
iommufd_fops_ioctl+0xff/0x1b0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x50/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The qi_batch structure is allocated when the domain is attached to a device for the first time. When a mapping is set up before that, qi_batch is dereferenced to do a batched flush, triggering the above issue.
Fix it by checking the qi_batch pointer and bypassing batched flushing if no device is attached.
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@xxxxxxxxx>
---
drivers/iommu/intel/cache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index e5b89f728ad3..bb9dae9a7fba 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
 static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
 {
-	if (!iommu || !batch->index)
+	if (!iommu || !batch || !batch->index)
This is the same issue as in the link below. 🙂 We should fix it by allocating the qi_batch for parent domains instead. A nested parent domain is not going to be attached to a device at all; it acts more as a background domain, so this fix would miss future cache flushes and lead to bigger problems. 🙂
https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@xxxxxxxxx/
Ah, just saw this 😊
This patch tries to fix an issue where a mapping is set up on a paging domain before it's attached to a device; your patch fixed an issue where the nested parent domain's qi_batch is not allocated even though a nested domain is attached to a device. I think they are different issues?
Oops, I see. I think your case is allocating a hwpt based on an IOAS that already has mappings. When the hwpt is added, all the mappings of this IOAS are pushed to the hwpt. But the hwpt has not been attached yet, so it hits this issue. I remember there is an immediate_attach bool that lets iommufd_hwpt_paging_alloc() do an attach before calling iopt_table_add_domain(), which pushes the IOAS mappings to the domain.
One possible fix is to set immediate_attach to true. But I doubt that would be agreed on, since it was introduced due to a gap on the ARM side. If that gap has been resolved, this behavior would go away as well.
So back to this issue: with this change, the flush would be skipped. It looks OK to me to skip the cache flush on the map path. And we should not expect any unmap on this domain since there is no device attached (a parent domain is an exception), hence nothing needs to be flushed even if there is an unmap in the domain's IOAS. So it appears to be an acceptable fix. @Baolu, your opinion?
Hold on, it looks like I was wrong in my analysis of the related code in qi_batch_flush_descs(). The iommu should always be NULL in my suspected case, so qi_batch_flush_descs() will return early without issue.
I reproduced the backtrace when playing with iommufd QEMU nesting. I think your previous comment is correct; I misunderstood the root cause. It's indeed due to the nesting parent domain, not the paging domain. Please ignore this patch.
Great. I also tried allocating a hwpt with an IOAS that already has a bunch of mappings; it works, as the iommu is NULL.
@Baolu, you may ignore this patch as it's already fixed.
Okay, thank you!