Re: [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight pages

On 2024/11/21 21:44, Robin Murphy wrote:
> On 21/11/2024 8:04 am, Yunsheng Lin wrote:
>> On 2024/11/21 0:17, Robin Murphy wrote:
>>> On 20/11/2024 10:34 am, Yunsheng Lin wrote:
>>>> Skip dma sync operation for inflight pages before the
>>>> page_pool_destroy() returns to the driver as DMA API
>>>> expects to be called with a valid device bound to a
>>>> driver as mentioned in [1].
>>>>
>>>> After page_pool_destroy() is called, the page is not
>>>> expected to be recycled back to pool->alloc cache and
>>>> dma sync operation is not needed when the page is not
>>>> recyclable or pool->ring is full, so only skip the dma
>>>> sync operation for the inflight pages by clearing the
>>>> pool->dma_sync under protection of rcu lock when page
>>>> is recycled to pool->ring to ensure that there is no
>>>> dma sync operation called after page_pool_destroy() is
>>>> returned.
>>>
>>> Something feels off here - either this is a micro-optimisation which I wouldn't really expect to be meaningful, or it means patch #2 doesn't actually do what it claims. If it really is possible to attempt to dma_sync a page *after* page_pool_inflight_unmap() has already reclaimed and unmapped it, that represents yet another DMA API lifecycle issue, which as well as being even more obviously incorrect usage-wise, could also still lead to the same crash (if the device is non-coherent).
>>
>> A page_pool owned page mostly goes through the steps below:
>> 1. If its pool is empty, page_pool calls the buddy allocator API to
>>     allocate a page and calls the DMA mapping and sync_for_device APIs
>>     for it. Otherwise it reuses a page in the pool.
>>
>> 2. Driver calls the page_pool API to allocate the page, and passes the page
>>     to the network stack after the packet is dma'ed into the page and the
>>     sync_for_cpu API is called.
>>
>> 3. Network stack is done with the page and calls the page_pool API to free it.
>>
>> 4. If the page is not recyclable, page_pool does the dma unmapping and
>>     releases the page back to the buddy allocator. Otherwise it does the
>>     sync_for_device and puts the page back in its pool; the page might go
>>     through step 1 again if the driver calls the page_pool allocate API.
>>
>> The calling of the dma mapping and dma sync APIs is controlled by
>> pool->dma_map and pool->dma_sync respectively. The previous patch only
>> clears pool->dma_map after doing the dma unmapping. This patch ensures that
>> there is no dma_sync for the recycle case of step 4 by clearing pool->dma_sync.
> 
> But *why* does it want to ensure that? Is there some possible race where one thread can attempt to sync and recycle a page while another thread is attempting to unmap and free it, such that you can't guarantee the correctness of dma_sync calls after page_pool_inflight_unmap() has started, and skipping them is a workaround for that? If so, then frankly I think that would want solving properly, but at the very least this change would need to come before patch #2.

The race window is shown below. page_pool_destroy() and
page_pool_put_page() can be called concurrently. Patch 2 only uses
a spinlock to synchronise page_pool_inflight_unmap() with
page_pool_return_page() called by page_pool_put_page(), to avoid
concurrent dma unmapping. There is no synchronization between
page_pool_destroy() and page_pool_dma_sync_for_device() called
by page_pool_put_page():
            CPU0                           CPU1
             .                               .
     page_pool_destroy()          page_pool_put_page()
             .                               .
     synchronize_rcu()                       .
             .                               .
   page_pool_inflight_unmap()                .
             .                               .
             .                    __page_pool_put_page()
             .                               .
             .               page_pool_dma_sync_for_device()
             .                               .

After this patch, page_pool_dma_sync_for_device() is called under the
rcu read lock, pool->dma_sync is cleared before synchronize_rcu(), and
page_pool_inflight_unmap() is called after synchronize_rcu(). This
ensures that page_pool_dma_sync_for_device() will not call the dma sync
API after synchronize_rcu() returns:

            CPU0                           CPU1
             .                               .
     page_pool_destroy()          page_pool_put_page()
             .                               .
    pool->dma_sync = false                   .
             .                               .
     synchronize_rcu()                       .
             .                               .
   page_pool_inflight_unmap()                .
             .                               .
             .                  page_pool_recycle_in_ring()
             .                               .
             .                        rcu_read_lock()
             .               page_pool_dma_sync_for_device()
             .                        rcu_read_unlock()
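
To make the ordering above concrete, here is a rough userspace model of
it. It is only a sketch, not the patch itself: all model_* names are made
up for illustration, the reader counter is a crude stand-in for
rcu_read_lock()/rcu_read_unlock()/synchronize_rcu(), and the "mapped"
flag stands in for the state changed by page_pool_inflight_unmap().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct model_pool {
	atomic_bool dma_sync;          /* stands in for pool->dma_sync */
	atomic_bool mapped;            /* true until the inflight unmap has run */
	atomic_int  readers;           /* crude stand-in for RCU read sections */
	atomic_int  syncs_done;        /* syncs issued while still mapped */
	atomic_int  syncs_after_unmap; /* syncs issued after unmap, must stay 0 */
};

static void model_pool_init(struct model_pool *pool)
{
	atomic_init(&pool->dma_sync, true);
	atomic_init(&pool->mapped, true);
	atomic_init(&pool->readers, 0);
	atomic_init(&pool->syncs_done, 0);
	atomic_init(&pool->syncs_after_unmap, 0);
}

/* Stand-ins for rcu_read_lock() and rcu_read_unlock(). */
static void model_read_lock(struct model_pool *pool)
{
	atomic_fetch_add(&pool->readers, 1);
}

static void model_read_unlock(struct model_pool *pool)
{
	atomic_fetch_sub(&pool->readers, 1);
}

/* Stand-in for synchronize_rcu(): wait until no reader is inside. */
static void model_synchronize(struct model_pool *pool)
{
	while (atomic_load(&pool->readers) != 0)
		; /* busy-wait; real RCU does not work this way */
}

/* Recycle path: check the flag and sync inside the read-side section. */
static void model_put_page(struct model_pool *pool)
{
	model_read_lock(pool);
	if (atomic_load(&pool->dma_sync)) {
		if (atomic_load(&pool->mapped))
			atomic_fetch_add(&pool->syncs_done, 1);
		else
			atomic_fetch_add(&pool->syncs_after_unmap, 1);
	}
	model_read_unlock(pool);
}

/* Destroy path, in the same order as the diagram above. */
static void model_pool_destroy(struct model_pool *pool)
{
	atomic_store(&pool->dma_sync, false); /* 1. stop new syncs */
	model_synchronize(pool);              /* 2. wait for syncs in progress */
	atomic_store(&pool->mapped, false);   /* 3. only now unmap */
}
```

In this model, a model_put_page() call that starts before
model_pool_destroy() either finishes its sync before step 3 or sees the
cleared flag, and a call that starts afterwards always sees the cleared
flag, so syncs_after_unmap is expected to stay at 0.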

Previously patches 2 and 3 were combined as one patch; this version
splits them to make them easier to review.
I am not sure the patch order matters that much, as the fix is not
complete unless both patches are included.




