Re: [BUG REPORT]net: page_pool: kernel crash at iommu_get_dma_domain+0xc/0x20

On 06/08/2024 10:51 am, Jesper Dangaard Brouer wrote:
[...]
The iommu_group is released whether the page_pool is still using it or not, so once page_pool_return_page() is called after that (why does this occur when the device is reloaded and packets are transmitted?), this crash will happen.
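
For reference, iommu_get_dma_domain() is essentially a one-line dereference of dev->iommu_group (paraphrased from drivers/iommu/iommu.c; check your tree), which is why the fault lands there once the group has been freed:

struct iommu_domain *iommu_get_dma_domain(struct device *dev)
{
        /* faults once dev->iommu_group has been freed out from under us */
        return dev->iommu_group->default_domain;
}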

I tried the following patch, but it doesn't work :(


The idea of taking a refcnt on the IOMMU to avoid dev->iommu_group getting freed makes sense to me.

The question is whether iommu_group_get() and iommu_group_put() are the correct APIs to use in this case.
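
If they are, note that the usual pattern is to keep the pointer returned by iommu_group_get() and hand that same pointer back to iommu_group_put(), rather than re-reading dev->iommu_group at put time. A minimal sketch (pool->iommu_grp is a hypothetical new member, not an existing page_pool field):

        /* At pool init: take and cache the group reference; this may be
         * NULL if the device has no IOMMU group. */
        pool->iommu_grp = iommu_group_get(pool->p.dev);

        /* At final pool teardown: drop exactly the reference we took.
         * iommu_group_put() accepts NULL. */
        iommu_group_put(pool->iommu_grp);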

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f4444b4e39e6..d03a87407ca8 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -21,6 +21,7 @@
  #include <linux/poison.h>
  #include <linux/ethtool.h>
  #include <linux/netdevice.h>
+#include <linux/iommu.h>


The page_pool already has a system workqueue that waits for inflight "packet" pages, and it calls the struct device APIs get_device() and put_device().

Why didn't the patch add the IOMMU refcounting next to those struct device calls? Like this:


$ git diff
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2abe6e919224..686ff1d31aff 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -265,8 +265,10 @@ static int page_pool_init(struct page_pool *pool,
        /* Driver calling page_pool_create() also call page_pool_destroy() */
         refcount_set(&pool->user_cnt, 1);

-       if (pool->dma_map)
+       if (pool->dma_map) {
+               iommu_group_get(pool->p.dev);
                 get_device(pool->p.dev);
+       }

         return 0;
  }
@@ -275,8 +277,10 @@ static void page_pool_uninit(struct page_pool *pool)
  {
         ptr_ring_cleanup(&pool->ring, NULL);

-       if (pool->dma_map)
+       if (pool->dma_map) {
+               iommu_group_put(pool->p.dev->iommu_group);
                 put_device(pool->p.dev);
+       }


--Jesper

  #include <trace/events/page_pool.h>
@@ -306,6 +307,9 @@ page_pool_create_percpu(const struct page_pool_params *params, int cpuid)
         if (err)
                 goto err_uninit;

+       if (pool->dma_map)
+               iommu_group_get(pool->p.dev);
+
         return pool;

  err_uninit:
@@ -974,8 +978,11 @@ static int page_pool_release(struct page_pool *pool)

         page_pool_scrub(pool);
         inflight = page_pool_inflight(pool, true);
-       if (!inflight)
+       if (!inflight) {
                 __page_pool_destroy(pool);
+               if (pool->dma_map)
+                       iommu_group_put(pool->p.dev->iommu_group);
+       }

         return inflight;
  }
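
(For context: page_pool_release() only frees the pool once the inflight count reaches zero. Simplified from net/core/page_pool.c -- the real helper also takes a 'strict' argument and emits tracepoints:)

static s32 page_pool_inflight(const struct page_pool *pool)
{
        /* inflight = pages handed out minus pages returned; while it is
         * non-zero, the pool -- and everything its DMA unmap path will
         * touch -- has to stay alive */
        u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
        u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);

        return (s32)(hold_cnt - release_cnt);
}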


The page_pool bumps the refcnt via get_device() + put_device() on the DMA 'struct device' to keep it from going away, but I guess there is also some IOMMU state that we need to make sure doesn't go away (until all inflight pages are returned)???


Now do the one where there is no IOMMU, and dma_unmap_page() corrupts random unrelated memory because the mapped DMA address was relative to dev->dma_range_map, which has since become NULL.

In other words, no: hacking around one particular IOMMU API symptom does not solve the fundamental lifecycle problem that you have here.

Thanks,
Robin.

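To make the non-IOMMU hazard concrete: with dma-direct, the DMA address handed to the device is offset according to dev->dma_range_map, and the unmap path applies the inverse translation. A self-contained toy model (not the real dma-direct code) of what happens when the map disappears between map and unmap:

#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for dev->dma_range_map: a single offset, or NULL. */
struct toy_dev {
        const int64_t *dma_offset;
};

static uint64_t toy_map(struct toy_dev *dev, uint64_t phys)
{
        return phys + (dev->dma_offset ? *dev->dma_offset : 0);
}

static uint64_t toy_unmap(struct toy_dev *dev, uint64_t dma)
{
        return dma - (dev->dma_offset ? *dev->dma_offset : 0);
}

int main(void)
{
        int64_t off = 0x80000000;
        struct toy_dev dev = { .dma_offset = &off };

        uint64_t dma = toy_map(&dev, 0x1000); /* mapped while offset valid */
        dev.dma_offset = NULL;                /* device torn down */

        /* unmap now recovers 0x80001000 instead of 0x1000, so any cache
         * maintenance done for the unmap hits unrelated memory */
        printf("mapped phys 0x1000, unmap computed phys %#llx\n",
               (unsigned long long)toy_unmap(&dev, dma));
        return 0;
}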

[ 4407.212119] process_one_work+0x164/0x3e0
[ 4407.216116]  worker_thread+0x310/0x420
[ 4407.219851]  kthread+0x120/0x130
[ 4407.223066]  ret_from_fork+0x10/0x20
[ 4407.226630] Code: ffffc318 aa1e03e9 d503201f f9416c00 (f9405400)
[ 4407.232697] ---[ end trace 0000000000000000 ]---


The hns3 driver uses the page pool like this, calling it just once when the
driver initializes:

static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
{
        struct page_pool_params pp_params = {
                .flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
                         PP_FLAG_DMA_SYNC_DEV,
                .order = hns3_page_order(ring),
                .pool_size = ring->desc_num * hns3_buf_size(ring) /
                                (PAGE_SIZE << hns3_page_order(ring)),
                .nid = dev_to_node(ring_to_dev(ring)),
                .dev = ring_to_dev(ring),
                .dma_dir = DMA_FROM_DEVICE,
                .offset = 0,
                .max_len = PAGE_SIZE << hns3_page_order(ring),
        };

        ring->page_pool = page_pool_create(&pp_params);
        if (IS_ERR(ring->page_pool)) {
                dev_warn(ring_to_dev(ring), "page pool creation failed: %ld\n",
                         PTR_ERR(ring->page_pool));
                ring->page_pool = NULL;
        }
}

And it calls page_pool_destroy(ring->page_pool) when the driver is uninitialized.
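
Putting the driver lifetime next to the crash, the sequence we suspect is (a sketch of our reading, unverified):

/*
 *   teardown path                        traffic path
 *   -------------                        ------------
 *   page_pool_destroy()
 *     pages still inflight, so the
 *     release work keeps rescheduling
 *   VF disabled, device released,
 *     dev->iommu_group freed
 *                                        skb freed much later
 *                                          page_pool_return_page()
 *                                          dma_unmap_page_attrs()
 *                                          iommu_get_dma_domain()
 *                                            <- dereferences the freed
 *                                               group: this crash
 */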


We use two devices with the net ports connected directly, and the steps of the
test case are as below:

1. enable a VF of '7d:00.0':
   echo 1 > /sys/class/net/eno1/device/sriov_numvfs

2. use iperf to produce some flows (the problem happens on the side
   which runs 'iperf -s')

3. bring the VF down and up with ifconfig

4. kill iperf

5. disable the VF:
   echo 0 > /sys/class/net/eno1/device/sriov_numvfs

6. run steps 1-5 with the other port, bd:00.0

7. repeat steps 1-6


And when running this test case, we found another related message (I
replaced the pr_warn() with dev_warn()):

pci 0000:7d:01.0: page_pool_release_retry() stalled pool shutdown: id 949, 98 inflight 1449 sec
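
That message comes from page_pool's deferred release worker, which keeps rescheduling itself for as long as pages remain inflight; roughly (simplified from net/core/page_pool.c):

static void page_pool_release_retry(struct work_struct *wq)
{
        struct delayed_work *dwq = to_delayed_work(wq);
        struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
        int inflight;

        inflight = page_pool_release(pool);
        if (!inflight)
                return;

        /* periodic warning of a stalled pool shutdown */
        if (time_after_eq(jiffies, pool->defer_warn)) {
                int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;

                pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
                        __func__, pool->user.id, inflight, sec);
                pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
        }

        /* still not ready to be disconnected, retry later */
        schedule_delayed_work(&pool->release_dw, DEFER_TIME);
}

So 'id 949, 98 inflight 1449 sec' means pool 949 still had 98 pages unreturned 1449 seconds after page_pool_destroy() -- at that point the pages look like they are never coming back, not just slow to drain.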


Even after we stop the traffic, stop the test case, and disable the VF, this
message is still being printed.

We have to run the test case for about two hours to reproduce the problem.
Is there any advice on how to solve or debug it?






