Hi all,
I do not yet have a way of reproducing this reliably enough to recommend a bisection; I'll keep digging.
The page belongs to a slub when the fragment is being constructed in __skb_fill_page_desc(); see the instrumentation I used below.
When usercopy triggers, .coals shows values of 2/3 for 128/192 bytes respectively.
The question is how the RX sk_buff ends up with a data fragment in a PageSlab page.
Some network drivers use netdev_alloc_frag(), so their pages indeed come from the page_frag allocator.
Others (mellanox, intel) just alloc_page() when filling their RX descriptors.
In both cases the pages will be refcounted properly.
My suspect is the in-kernel TCP traffic on my system, which uses kernel_sendpage() for both bio pages AND slub pages.
Thanks a lot!
Anton
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a098d95..7cd744c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -40,6 +40,7 @@
#include <linux/in6.h>
#include <linux/if_packet.h>
#include <net/flow.h>
+#include <linux/slub_def.h>
/* The interface for checksum offload between the stack and networking drivers
* is as follows...
@@ -316,7 +317,8 @@ struct skb_frag_struct {
} page;
#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
__u32 page_offset;
- __u32 size;
+ __u16 size;
+ __u16 coals;
#else
__u16 page_offset;
__u16 size;
@@ -1850,9 +1852,11 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
*/
frag->page.p = page;
frag->page_offset = off;
+ frag->coals = 0;
skb_frag_size_set(frag, size);
page = compound_head(page);
+ WARN_ON(PageSlab(page) && (page->slab_cache->size < size)); // does NOT trigger
if (page_is_pfmemalloc(page))
skb->pfmemalloc = true;
}
@@ -2849,10 +2853,14 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
const struct page *page, int off)
{
if (i) {
- const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];
+ struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];
- return page == skb_frag_page(frag) &&
+ bool ret = page == skb_frag_page(frag) &&
off == frag->page_offset + skb_frag_size(frag);
+ if (unlikely(ret))
+ if (PageSlab(compound_head((struct page *)page)))
+ frag->coals++;
+ return ret;
}
return false;
}
On Fri, Jun 1, 2018 at 2:55 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
On Fri, Jun 1, 2018 at 1:58 PM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Fri, Jun 01, 2018 at 01:49:38PM -0700, Kees Cook wrote:
>> On Fri, Jun 1, 2018 at 12:02 PM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
>> > (cc-ing some interested people)
>> >
>> > On 05/31/2018 05:03 PM, Anton Eidelman wrote:
>> >> Here's a rare issue I reproduce on 4.12.10 (centos config): full log
>> >> sample below.
>>
>> Thanks for digging into this! Do you have any specific reproducer for
>> this? If so, I'd love to try a bisection, as I'm surprised this has
>> only now surfaced: hardened usercopy was introduced in 4.8 ...
>>
>> >> An innocent process (dhcpclient) is about to receive a datagram, but
>> >> during skb_copy_datagram_iter() usercopy triggers a BUG in:
>> >> usercopy.c:check_heap_object() -> slub.c:__check_heap_object(), because
>> >> the sk_buff fragment being copied crosses the 64-byte slub object boundary.
>> >>
>> >> Example __check_heap_object() context:
>> >> n=128 << usually 128, sometimes 192.
>> >> object_size=64
>> >> s->size=64
>> >> page_address(page)=0xffff880233f7c000
>> >> ptr=0xffff880233f7c540
>> >>
>> >> My take on the root cause:
>> >> When adding data to an skb, new data is appended to the current
>> >> fragment if the new chunk immediately follows the last one: by simply
>> >> increasing the frag->size, skb_frag_size_add().
>> >> See include/linux/skbuff.h:skb_can_coalesce() callers.
>>
>> Oooh, sneaky:
>> return page == skb_frag_page(frag) &&
>> off == frag->page_offset + skb_frag_size(frag);
>>
>> Originally I was thinking that slab red-zoning would get triggered
>> too, but I see the above is checking to see if these are precisely
>> neighboring allocations, I think.
>>
>> But then ... how does freeing actually work? I'm really not sure how
>> this seeming layering violation could be safe in other areas?
>
> I'm confused ... I thought skb frags came from the page_frag allocator,
> not the slab allocator. But then why would the slab hardening trigger?

Well that would certainly make more sense (well, the sense about
alloc/free). Having it overlap with a slab allocation, though, that's
quite bad. Perhaps this is a very odd use-after-free case? I.e. freed
page got allocated to slab, and when it got copied out, usercopy found
it spanned a slub object?
[ 655.602500] usercopy: kernel memory exposure attempt detected from
ffff88022a31aa00 (kmalloc-64) (192 bytes)
This wouldn't be the first time usercopy triggered due to a memory corruption...
-Kees
--
Kees Cook
Pixel Security