On Sat, Nov 12, 2022 at 10:21:49AM +0800, Leizhen (ThunderTown) wrote: > > > On 2022/11/12 2:35, Paul E. McKenney wrote: > > On Fri, Nov 11, 2022 at 06:04:26PM +0800, Leizhen (ThunderTown) wrote: > >> > >> > >> On 2022/11/11 15:45, Paul E. McKenney wrote: > >>> On Sat, Nov 05, 2022 at 10:35:19AM +0800, Zhen Lei wrote: > >>>> The callback rhp->func becomes NULL is usually caused by use-after-free. > >>>> So the information about 'rhp' is very useful. Unfortunately, nothing is > >>>> printed at the moment. Look at the panic output below, if no vmcore is > >>>> generated, there is almost no way to analyze it except to know that the > >>>> bug exists. > >>>> > >>>> Unable to handle kernel NULL pointer dereference at virtual address 0 > >>>> ... ... > >>>> PC is at 0x0 > >>>> LR is at rcu_do_batch+0x1c0/0x3b8 > >>>> ... ... > >>>> (rcu_do_batch) from (rcu_core+0x1d4/0x284) > >>>> (rcu_core) from (__do_softirq+0x24c/0x344) > >>>> (__do_softirq) from (__irq_exit_rcu+0x64/0x108) > >>>> (__irq_exit_rcu) from (irq_exit+0x8/0x10) > >>>> (irq_exit) from (__handle_domain_irq+0x74/0x9c) > >>>> (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98) > >>>> (gic_handle_irq) from (__irq_svc+0x5c/0x94) > >>>> (__irq_svc) from (arch_cpu_idle+0x20/0x3c) > >>>> (arch_cpu_idle) from (default_idle_call+0x4c/0x78) > >>>> (default_idle_call) from (do_idle+0xf8/0x150) > >>>> (do_idle) from (cpu_startup_entry+0x18/0x20) > >>>> (cpu_startup_entry) from (0xc01530) > >>>> > >>>> So add mem_dump_obj(rhp) to output some information, for example: > >>>> slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256 > >>>> > >>>> Now we know the size of the memory block and the offset of rcu_head. Then > >>>> we can check the code. It's going to be slow and tiring, but it's better > >>>> than no way to start. > >>>> > >>>> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx> > >>> > >>> I have pulled this in with the usual wordsmithing (please check!) > >>> for review and testing, thank you! > >> > >> Great! Thanks. Provides a lot of valuable debugging method information. > >> > >> In the following two lines, there are a few extra spaces after the dot. > >> I will delete it in v2. > >> > >> rhp->func to be set to NULL. This defeats the debugging prints used by > >> locate the problem. If the problem is reproducible, additional slab > > > > Please do adjust my wordsmithing as required. > > > >>> Questions include "Is 0x3 correct for functions compiled with all > >>> supported compiler options on all architectures on which the Linux > >> > >> Sorry, I found it possible that it wouldn't work on x86. Although I had > >> no problems booting up on x86 before. I ran a script today and found that > >> there were addresses that were not 4-byte aligned. > >> > >> I'll send v2 on your basis. > > > > x86 can be like that sometimes... > > > > I revert your current patch, and look forward to seeing your v2. > > > >> cat System.map | grep -E ' t | T ' | awk '{print substr($1,length($1),length($1))}' | sort | uniq -c > >> 52521 0 > >> 409 1 > >> 394 2 > >> 417 3 > >> 404 4 > >> 458 5 > >> 405 6 > >> 393 7 > >> 1205 8 > >> 457 9 > >> 442 a > >> 435 b > >> 421 c > >> 418 d > >> 421 e > >> 426 f > > > > Indeed, quite a few! Maybe the address check can be arch-specific, > > maybe using IS_ENABLED()? > > There are more precise, more effective, but relatively time-consuming > functions. Such as is_vmalloc_addr(), virt_addr_valid(). We cannot afford to slow down callback invocation very much. Even the two checks you are currently adding are a bit concerning. The problem happens when CPUs flood RCU with callbacks. Callback invocation needs to keep up with the flood of call_rcu() invocations. > >>> kernel runs?", "Is this added information useful often enough for > >>> this to be pushed to mainline?", and so on. > > > > And another question is "Should this be default?" There may be concerns > > with callback-invocation throughput during callback-flooding events. > > The overhead of the if statement is small compared with that of > memory freeing. However, there is a low probability that the callback > fails, it's also wise to set it to non-default. Very good, works for me! > >> I originally wanted to add a member in struct rcu_head and backup > >> 'func' to the previous node. This way, when the error is detected, > >> the hook function can be printed out. This will help us quickly > >> find the user of the invalid rhp. However, the size of the struct > >> page is limited and cannot be expanded. > > > > Although that information could be clobbered just as easily as could > > the ->func value, right? > > There is a low probability that the previous node and the current node > are in the same memory area. Therefore, it is relatively reliable to > back up ->func to the previous node. However, this method can be used > temporarily to locate the problem. After all, the impact is too great. Agreed. I could imagine at least some people running something like your current patches in production. I have a much harder time imagining very many people running an expanded rcu_head structure in production. Thanx, Paul > >> Further more, we can dump the contents of mem object. > >> > >> I have a problem that has not been resolved and has not reproduced. > >> The surrounding contents of 'rhp' have been dumped, as below. > >> You can highlight 00000024 and 00000030, you'll see that this is a > >> fixed 80-bytes structure. There is also a bidirectional linked list > >> in the structure. If I have mem_dump_obj(rhp) information, I can > >> narrow it down considerably. > >> > >> [20220928044206]5390: 00000024 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > >> [20220928044206]53b0: 00000000 00000000 00000000 00000000 cfa4d580 ffff4596 00000000 00000000 > >> [20220928044206]53d0: 7438f148 ffff4596 7438f148 ffff4596 00000024 00000000 0b828cfa 0f00aaf4 > >> [20220928044206]53f0: 00000000 00000000 00000000 00000000 496653c0 ffff4596 00000000 00000000 > >> [20220928044206]5410: 00000000 00000000 00000000 00000000 ae0769e0 ffff4596 ae0769e0 ffff4596 > >> [20220928044206]5430: 00000030 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > >> [20220928044206]5450: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > >> [20220928044206]5470: ae076988 ffff4596 ae076988 ffff4596 00000024 00000000 00000000 00000000 > > > > OK, I consider the "is this useful" question to be answered in the > > affirmative. > > > > Thanx, Paul > > > >>>> --- > >>>> kernel/rcu/rcu.h | 7 +++++++ > >>>> kernel/rcu/srcutiny.c | 1 + > >>>> kernel/rcu/srcutree.c | 1 + > >>>> kernel/rcu/tasks.h | 1 + > >>>> kernel/rcu/tiny.c | 1 + > >>>> kernel/rcu/tree.c | 1 + > >>>> 6 files changed, 12 insertions(+) > >>>> > >>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > >>>> index 70c79adfdc7046c..4844dec36bddb48 100644 > >>>> --- a/kernel/rcu/rcu.h > >>>> +++ b/kernel/rcu/rcu.h > >>>> @@ -10,6 +10,7 @@ > >>>> #ifndef __LINUX_RCU_H > >>>> #define __LINUX_RCU_H > >>>> > >>>> +#include <linux/mm.h> > >>>> #include <trace/events/rcu.h> > >>>> > >>>> /* > >>>> @@ -211,6 +212,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head) > >>>> } > >>>> #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ > >>>> > >>>> +static inline void debug_rcu_head_callback(struct rcu_head *rhp) > >>>> +{ > >>>> + if (unlikely(!rhp->func || (unsigned long)rhp->func & 0x3)) > >>>> + mem_dump_obj(rhp); > >>>> +} > >>>> + > >>>> extern int rcu_cpu_stall_suppress_at_boot; > >>>> > >>>> static inline bool rcu_stall_is_suppressed_at_boot(void) > >>>> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c > >>>> index 33adafdad261389..5e7f336baa06ae0 100644 > >>>> --- a/kernel/rcu/srcutiny.c > >>>> +++ b/kernel/rcu/srcutiny.c > >>>> @@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp) > >>>> while (lh) { > >>>> rhp = lh; > >>>> lh = lh->next; > >>>> + debug_rcu_head_callback(rhp); > >>>> local_bh_disable(); > >>>> rhp->func(rhp); > >>>> local_bh_enable(); > >>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > >>>> index ca4b5dcec675bac..294972e66b31863 100644 > >>>> --- a/kernel/rcu/srcutree.c > >>>> +++ b/kernel/rcu/srcutree.c > >>>> @@ -1631,6 +1631,7 @@ static void srcu_invoke_callbacks(struct work_struct *work) > >>>> rhp = rcu_cblist_dequeue(&ready_cbs); > >>>> for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) { > >>>> debug_rcu_head_unqueue(rhp); > >>>> + debug_rcu_head_callback(rhp); > >>>> local_bh_disable(); > >>>> rhp->func(rhp); > >>>> local_bh_enable(); > >>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > >>>> index b0b885e071fa8dc..b7f8c67c586cdc4 100644 > >>>> --- a/kernel/rcu/tasks.h > >>>> +++ b/kernel/rcu/tasks.h > >>>> @@ -478,6 +478,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu > >>>> raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags); > >>>> len = rcl.len; > >>>> for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) { > >>>> + debug_rcu_head_callback(rhp); > >>>> local_bh_disable(); > >>>> rhp->func(rhp); > >>>> local_bh_enable(); > >>>> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c > >>>> index bb8f7d270f01747..56e9a5d91d97ec5 100644 > >>>> --- a/kernel/rcu/tiny.c > >>>> +++ b/kernel/rcu/tiny.c > >>>> @@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head) > >>>> > >>>> trace_rcu_invoke_callback("", head); > >>>> f = head->func; > >>>> + debug_rcu_head_callback(head); > >>>> WRITE_ONCE(head->func, (rcu_callback_t)0L); > >>>> f(head); > >>>> rcu_lock_release(&rcu_callback_map); > >>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > >>>> index 93c286b98c8f03d..3b93b9df8042a84 100644 > >>>> --- a/kernel/rcu/tree.c > >>>> +++ b/kernel/rcu/tree.c > >>>> @@ -2256,6 +2256,7 @@ static void rcu_do_batch(struct rcu_data *rdp) > >>>> trace_rcu_invoke_callback(rcu_state.name, rhp); > >>>> > >>>> f = rhp->func; > >>>> + debug_rcu_head_callback(rhp); > >>>> WRITE_ONCE(rhp->func, (rcu_callback_t)0L); > >>>> f(rhp); > >>>> > >>>> -- > >>>> 2.25.1 > >>>> > >>> . > >>> > >> > >> -- > >> Regards, > >> Zhen Lei > > . > > > > -- > Regards, > Zhen Lei