Re: [PATCH] kfence: check kfence canary in panic and reboot


On Thu, Apr 21, 2022 at 2:10 PM Shaobo Huang <huangshaobo6@xxxxxxxxxx> wrote:
> > From: huangshaobo <huangshaobo6@xxxxxxxxxx>
> >
> > When writing out of bounds to the red zone, the write can only be detected
> > at kfree. However, in many scenarios something happens before kfree that
> > prevents this out-of-bounds write from being detected. Therefore, it is
> > necessary to provide a method for actively detecting out-of-bounds writes
> > to the red zone, so that users can trigger the check themselves and the
> > corruption can also be detected at system reboot or panic.
> >
> >
> After having analyzed a couple of KFENCE memory corruption reports in the
> wild, I have doubts that this approach will be helpful.
>
> Note that KFENCE knows nothing about the memory access that performs the
> actual corruption.
>
> It's rather easy to investigate corruptions of short-living objects, e.g.
> those that are allocated and freed within the same function. In that case,
> one can examine the region of the code between these two events and try to
> understand what exactly caused the corruption.
>
> But for long-living objects checked at panic/reboot we'll effectively have
> only the allocation stack and will have to check all the places where the
> corrupted object was potentially used.
> Most of the time, such reports won't be actionable.

The detection mechanism of kfence is probabilistic. It is not easy to find a bug.
It would be a pity to catch a bug without reporting it, and the cost of panic
detection is not large, so panic detection is still valuable.


I am also a big fan of showing as much information as possible to help the developers debug a memory corruption.
But I am still struggling to understand how the proposed patch helps.
Assume we have some generic allocation of an skbuff, so the report looks like this:

=============================================
BUG: KFENCE: memory corruption in <frame that triggered reboot>
Corrupted memory at <end+1>
<stack trace of reboot event>

kfence-#59: <start>-<end>,size=100,cache=kmalloc-128  allocated by task 77 on cpu 0 at 28.018073s:
kmem_cache_alloc
__alloc_skb
alloc_skb_with_frags
sock_alloc_send_pskb
unix_stream_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
=============================================
 
This report tells us that, in a system that could have been running for days, a particular skbuff was corrupted by some unknown task at some unknown point in time.
How do we figure out what exactly caused this corruption?
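
To make the limitation concrete, here is a minimal userspace sketch of a red-zone check. It is not the kernel code: `struct slot`, `slot_alloc()`, `slot_check()` and `SLOT_SIZE` are hypothetical names, the object is simply placed at the start of a fixed buffer, and the address-dependent canary byte is only modeled on the pattern KFENCE uses as far as I recall it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 128	/* hypothetical; mirrors cache=kmalloc-128 above */

/* Hypothetical stand-in for one guarded allocation. */
struct slot {
	uint8_t mem[SLOT_SIZE];
	size_t size;		/* bytes the caller asked for */
};

/* Address-dependent canary byte (assumed to resemble KFENCE's pattern). */
static uint8_t canary_for(const uint8_t *addr)
{
	return (uint8_t)(0xaa ^ ((uintptr_t)addr & 0x7));
}

/* "Allocate" size bytes and fill the rest of the slot with canaries. */
static uint8_t *slot_alloc(struct slot *s, size_t size)
{
	assert(size <= SLOT_SIZE);
	s->size = size;
	for (size_t i = size; i < SLOT_SIZE; i++)
		s->mem[i] = canary_for(&s->mem[i]);
	return s->mem;
}

/*
 * Scan the red zone. Returns the offset of the first corrupted byte,
 * or -1 if the red zone is intact. Note what is missing: nothing here
 * can tell which task wrote the byte, or when.
 */
static long slot_check(const struct slot *s)
{
	for (size_t i = s->size; i < SLOT_SIZE; i++)
		if (s->mem[i] != canary_for(&s->mem[i]))
			return (long)i;
	return -1;
}
```

With a 100-byte object, a stray write to `obj[100]` makes `slot_check()` return 100, which corresponds to `<end+1>` in the report above. Whether the check runs at kfree, at panic or at reboot, that offset and the allocation stack are all it can provide.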

When we deploy KFENCE at scale, it is rarely possible for the kernel developer to get access to the host that reported the bug and try to reproduce it.
With that in mind, the report (plus the kernel source) must contain all the necessary information to address the bug, otherwise reporting it will result in wasting the developer's time.
Moreover, if we report such bugs too often, our tool loses credibility, which is hard to regain.

> > for example, if the application memory is out of bounds and written to
> > the red zone in the kfence object, the system suddenly panics, and the
> > following log can be seen during system reset:
> > BUG: KFENCE: memory corruption in atomic_notifier_call_chain+0x49/0x70
[...]

thanks,
ShaoBo Huang


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.