Re: [PATCH net-next 3/3] page_pool: Track DMA-mapped pages and unmap them when destroying the pool

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Solar Designer <solar@xxxxxxxxxxxx> writes:

> On Wed, Mar 19, 2025 at 07:06:57PM +0800, Yunsheng Lin wrote:
>> On 2025/3/19 4:55, Toke Høiland-Jørgensen wrote:
>> > Yunsheng Lin <linyunsheng@xxxxxxxxxx> writes:
>> >> On 2025/3/17 23:16, Toke Høiland-Jørgensen wrote:
>> >>> Yunsheng Lin <yunshenglin0825@xxxxxxxxx> writes:
>> >>>> On 3/14/2025 6:10 PM, Toke Høiland-Jørgensen wrote:
>> >>>>
>> >>>> ...
>> >>>>
>> >>>>> To avoid having to walk the entire xarray on unmap to find the page
>> >>>>> reference, we stash the ID assigned by xa_alloc() into the page
>> >>>>> structure itself, using the upper bits of the pp_magic field. This
>> >>>>> requires a couple of defines to avoid conflicting with the
>> >>>>> POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
>> >>>>> so does not affect run-time performance. The bitmap calculations in this
>> >>>>> patch gives the following number of bits for different architectures:
>> >>>>>
>> >>>>> - 24 bits on 32-bit architectures
>> >>>>> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
>> >>>>> - 32 bits on other 64-bit architectures
>> >>>>
>> >>>>  From commit c07aea3ef4d4 ("mm: add a signature in struct page"):
>> >>>> "The page->signature field is aliased to page->lru.next and
>> >>>> page->compound_head, but it can't be set by mistake because the
>> >>>> signature value is a bad pointer, and can't trigger a false positive
>> >>>> in PageTail() because the last bit is 0."
>> >>>>
>> >>>> And commit 8a5e5e02fc83 ("include/linux/poison.h: fix LIST_POISON{1,2} 
>> >>>> offset"):
>> >>>> "Poison pointer values should be small enough to find a room in
>> >>>> non-mmap'able/hardly-mmap'able space."
>> >>>>
>> >>>> So the question seems to be:
>> >>>> 1. Is stashing the ID causing page->pp_magic to be in the mmap'able/
>> >>>>     easier-mmap'able space? If yes, how can we make sure this will not
>> >>>>     cause any security problem?
>> >>>> 2. Is the masking the page->pp_magic causing a valid pionter for
>> >>>>     page->lru.next or page->compound_head to be treated as a vaild
>> >>>>     PP_SIGNATURE? which might cause page_pool to recycle a page not
>> >>>>     allocated via page_pool.
>> >>>
>> >>> Right, so my reasoning for why the defines in this patch works for this
>> >>> is as follows: in both cases we need to make sure that the ID stashed in
>> >>> that field never looks like a valid kernel pointer. For 64-bit arches
>> >>> (where CONFIG_ILLEGAL_POINTER_VALUE), we make sure of this by never
>> >>> writing to any bits that overlap with the illegal value (so that the
>> >>> PP_SIGNATURE written to the field keeps it as an illegal pointer value).
>> >>> For 32-bit arches, we make sure of this by making sure the top-most bit
>> >>> is always 0 (the -1 in the define for _PP_DMA_INDEX_BITS) in the patch,
>> >>> which puts it outside the range used for kernel pointers (AFAICT).
>> >>
>> >> Is there any season you think only kernel pointer is relevant here?
>> > 
>> > Yes. Any pointer stored in the same space as pp_magic by other users of
>> > the page will be kernel pointers (as they come from page->lru.next). The
>> > goal of PP_SIGNATURE is to be able to distinguish pages allocated by
>> > page_pool, so we don't accidentally recycle a page from somewhere else.
>> > That's the goal of the check in page_pool_page_is_pp():
>> > 
>> > (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE
>> > 
>> > To achieve this, we must ensure that the check above never returns true
>> > for any value another page user could have written into the same field
>> > (i.e., into page->lru.next). For 64-bit arches, POISON_POINTER_DELTA
>> 
>> POISON_POINTER_DELTA is defined according to CONFIG_ILLEGAL_POINTER_VALUE,
>> if CONFIG_ILLEGAL_POINTER_VALUE is not defined yet, POISON_POINTER_DELTA
>> is defined to zero.
>> 
>> It seems only the below 64-bit arches define CONFIG_ILLEGAL_POINTER_VALUE
>> through grepping:
>> a29815a333c6 core, x86: make LIST_POISON less deadly
>> 5c178472af24 riscv: define ILLEGAL_POINTER_VALUE for 64bit
>> f6853eb561fb powerpc/64: Define ILLEGAL_POINTER_VALUE for 64-bit
>> bf0c4e047324 arm64: kconfig: Move LIST_POISON to a safe value
>> 
>> The below 64-bit arches don't seems to define the above config yet:
>> MIPS64, SPARC64, System z(S390X),loongarch
>> 
>> Does ID stashing cause problem for the above arches?
>> 
>> > serves this purpose. For 32-bit arches, we can leave the top-most bits
>> > out of PP_MAGIC_MASK, to make sure that any valid pointer value will
>> > fail the check above.
>> 
>> The above mainly explained how to ensure page_pool_page_is_pp() will
>> not return false positive result from the page_pool perspective.
>> 
>> From MM/security perspective, most of the commits quoted above seem
>> to suggest that poison pointer should be in the non-mmap'able or
>> hardly-mmap'able space, otherwise userspace can arrange for those
>> pointers to actually be dereferencable, potentially turning an oops
>> to an expolit, more detailed example in the below paper, which explains
>> how to exploit a vulnerability which hardened by the 8a5e5e02fc83 commit:
>> https://www.usenix.org/system/files/conference/woot15/woot15-paper-xu.pdf
>> 
>> ID stashing seems to cause page->lru.next (aliased to page->pp_magic) to
>> be in the mmap'able space for some arches.
>
> ...
>
>> To be honest, I am not that familiar with the pointer poison mechanism.
>> But through some researching and analyzing, it makes sense to overstate
>> it a little as it seems to be security-related.
>> Cc'ed some security-related experts and ML to see if there is some
>> clarifying from them.
>
> You're correct that the pointer poison values should be in areas not
> mmap'able by userspace (at least with reasonable mmap_min_addr values).
>
> Looking at the union inside "struct page", I see pp_magic is aliased
> against multiple pointers in the union'ed anonymous structs.
>
> I'm not familiar with the uses of page->pp_magic and how likely or not
> we are to have a bug where its aliasing with pointers would be exposed
> as an attack vector, but this does look like a serious security concern.
> It looks like we would be seriously weakening the poisoning, except on
> archs where the new values with ID stashing are still not mmap'able.
>
> I just discussed the matter with my colleague at CIQ, Sultan Alsawaf,
> and he thinks the added risk is not that bad.  He wrote:
>
>> Toke's response here is fair:
>> 
>> > Right, okay, I see what you mean. So the risk is basically the
>> > following:
>> > 
>> > If some other part of the kernel ends up dereferencing the
>> > page->lru.next pointer of a page that is owned by page_pool, and which
>> > has an ID stashed into page->pp_magic, that dereference can end up being
>> > to a valid userspace mapping, which can lead to Bad Things(tm), cf the
>> > paper above.
>> > 
>> > This is mitigated by the fact that it can only happen on architectures
>> > that don't set ILLEGAL_POINTER_VALUE (which includes 32-bit arches, and
>> > the ones you listed above). In addition, this has to happen while the
>> > page is owned by page_pool, and while it is DMA-mapped - we already
>> > clear the pp_magic field when releasing the page from page_pool.
>> > 
>> > I am not sure to what extent the above is a risk we should take pains to
>> > avoid, TBH. It seems to me that for this to become a real problem, lots
>> > of other things will already have gone wrong. But happy to defer to the
>> > mm/security folks here.
>> 
>> For this to be a problem, there already needs to be a use-after-free on
>> a page, which arguably creates many other vectors for attack.
>> 
>> The lru field of struct page is already used as a generic list pointer
>> in several places in the kernel once ownership of the page is obtained.
>> Any risk of dereferencing lru.next in a use-after-free scenario would
>> technically apply to a bunch of other places in the kernel (grep for
>> page->lru).
>
> We also tried searching for existing exploitation techniques for "struct
> page" use-after-free.  We couldn't find any.  The closest (non-)match I
> found is this fine research (the same project presented differently):
>
> https://i.blackhat.com/BH-US-24/Presentations/US24-Qian-PageJack-A-Powerful-Exploit-Technique-With-Page-Level-UAF-Thursday.pdf page 33+
> https://arxiv.org/html/2401.17618v2#S4
> https://phrack.org/issues/71/13
>
> The arxiv paper includes this sentence: "To create a page-level UAF, the
> key is to cause a UAF of the struct page objects."  However, we do not
> see them actually do that, and this statement is not found in the slides
> nor in the Phrack article.  Confused.

Thank you for weighing in! I will post an updated version of this patch
with a reference to this discussion and try to summarise the above.

> Thank you for CC'ing me and the kernel-hardening list.  However, please
> do not CC the oss-security list like that, where it's against content
> guidelines.  Only properly focused new postings/threads are acceptable
> there (not CC'ing from/to other lists where only part of the content is
> on-topic, and follow-ups might not be on-topic at all).  See:
>
> https://oss-security.openwall.org/wiki/mailing-lists/oss-security#list-content-guidelines
>
> As a moderator for oss-security, I'm going to remove these messages from
> the queue now.  Please drop Cc: oss-security from any further replies.
>
> If desired, we may bring these topics to oss-security separately, with a
> proper Subject line and clear description of what we're talking about.

Noted, thanks!

-Toke






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux