Re: [PATCH v4 7/8] lockdep: Change hardirq{s_enabled,_context} to per-cpu variables

Marco Elver <elver@xxxxxxxxxx> · Wed, 24 Jun 2020 12:17:56 +0200

On Wed, 24 Jun 2020 at 11:01, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 23, 2020 at 10:24:04PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 23, 2020 at 08:12:32PM +0200, Peter Zijlstra wrote:
> > > Fair enough; I'll rip it all up and boot a KCSAN kernel, see what if
> > > anything happens.
> >
> > OK, so the below patch doesn't seem to have any nasty recursion issues
> > here. The only 'problem' is that lockdep now sees report_lock can cause
> > deadlocks.
> >
> > It is completely right about it too, but I don't suspect there's much we
> > can do about it, it's pretty much the standard printk() with scheduler
> > locks held report.
>
> So I've been getting tons and tons of this:
>
> [   60.471348] ==================================================================
> [   60.479427] BUG: KCSAN: data-race in __rcu_read_lock / __rcu_read_unlock
> [   60.486909]
> [   60.488572] write (marked) to 0xffff88840fff1cf0 of 4 bytes by interrupt on cpu 1:
> [   60.497026]  __rcu_read_lock+0x37/0x60
> [   60.501214]  cpuacct_account_field+0x1b/0x170
> [   60.506081]  task_group_account_field+0x32/0x160
> [   60.511238]  account_system_time+0xe6/0x110
> [   60.515912]  update_process_times+0x1d/0xd0
> [   60.520585]  tick_sched_timer+0xfc/0x180
> [   60.524967]  __hrtimer_run_queues+0x271/0x440
> [   60.529832]  hrtimer_interrupt+0x222/0x670
> [   60.534409]  __sysvec_apic_timer_interrupt+0xb3/0x1a0
> [   60.540052]  asm_call_on_stack+0x12/0x20
> [   60.544434]  sysvec_apic_timer_interrupt+0xba/0x130
> [   60.549882]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [   60.555621]  delay_tsc+0x7d/0xe0
> [   60.559226]  kcsan_setup_watchpoint+0x292/0x4e0
> [   60.564284]  __rcu_read_unlock+0x73/0x2c0
> [   60.568763]  __unlock_page_memcg+0xda/0xf0
> [   60.573338]  unlock_page_memcg+0x32/0x40
> [   60.577721]  page_remove_rmap+0x5c/0x200
> [   60.582104]  unmap_page_range+0x83c/0xc10
> [   60.586582]  unmap_single_vma+0xb0/0x150
> [   60.590963]  unmap_vmas+0x81/0xe0
> [   60.594663]  exit_mmap+0x135/0x2b0
> [   60.598464]  __mmput+0x21/0x150
> [   60.601970]  mmput+0x2a/0x30
> [   60.605176]  exit_mm+0x2fc/0x350
> [   60.608780]  do_exit+0x372/0xff0
> [   60.612385]  do_group_exit+0x139/0x140
> [   60.616571]  __do_sys_exit_group+0xb/0x10
> [   60.621048]  __se_sys_exit_group+0xa/0x10
> [   60.625524]  __x64_sys_exit_group+0x1b/0x20
> [   60.630189]  do_syscall_64+0x6c/0xe0
> [   60.634182]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   60.639820]
> [   60.641485] read to 0xffff88840fff1cf0 of 4 bytes by task 2430 on cpu 1:
> [   60.648969]  __rcu_read_unlock+0x73/0x2c0
> [   60.653446]  __unlock_page_memcg+0xda/0xf0
> [   60.658019]  unlock_page_memcg+0x32/0x40
> [   60.662400]  page_remove_rmap+0x5c/0x200
> [   60.666782]  unmap_page_range+0x83c/0xc10
> [   60.671259]  unmap_single_vma+0xb0/0x150
> [   60.675641]  unmap_vmas+0x81/0xe0
> [   60.679341]  exit_mmap+0x135/0x2b0
> [   60.683141]  __mmput+0x21/0x150
> [   60.686647]  mmput+0x2a/0x30
> [   60.689853]  exit_mm+0x2fc/0x350
> [   60.693458]  do_exit+0x372/0xff0
> [   60.697062]  do_group_exit+0x139/0x140
> [   60.701248]  __do_sys_exit_group+0xb/0x10
> [   60.705724]  __se_sys_exit_group+0xa/0x10
> [   60.710201]  __x64_sys_exit_group+0x1b/0x20
> [   60.714872]  do_syscall_64+0x6c/0xe0
> [   60.718864]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   60.724503]
> [   60.726156] Reported by Kernel Concurrency Sanitizer on:
> [   60.732089] CPU: 1 PID: 2430 Comm: sshd Not tainted 5.8.0-rc2-00186-gb4ee11fe08b3-dirty #303
> [   60.741510] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [   60.752957] ==================================================================
>
> And I figured a quick way to get rid of that would be something like the
> below, seeing how volatile gets auto annotated... but that doesn't seem
> to actually work.
>
> What am I missing?

There's one more in include/linux/rcupdate.h. I suggested this at some point:

    https://lore.kernel.org/lkml/20200220213317.GA35033@xxxxxxxxxx/

That avoids volatiles, as I don't think they are needed here.
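
For illustration only, here is a minimal userspace sketch of the general idea of marking individual accesses instead of declaring the variable itself volatile. It is not the patch linked above and not the kernel's code: the macros READ_ONCE_SKETCH()/WRITE_ONCE_SKETCH(), the counter nesting_depth, and the sketch_*() helpers are all hypothetical names, and __typeof__ assumes GCC or Clang.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for marked accesses; these are NOT the kernel's
 * READ_ONCE()/WRITE_ONCE() definitions. Each one performs a single
 * volatile access through a cast, so the variable can stay a plain int.
 */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Plain counter (assumed name), deliberately not declared volatile. */
static int nesting_depth;

/* Increment the counter using one marked read and one marked write. */
static void sketch_read_lock(void)
{
	WRITE_ONCE_SKETCH(nesting_depth, READ_ONCE_SKETCH(nesting_depth) + 1);
}

/* Decrement the counter; unbalanced unlocks trip the assertion. */
static void sketch_read_unlock(void)
{
	int depth = READ_ONCE_SKETCH(nesting_depth);

	assert(depth > 0);
	WRITE_ONCE_SKETCH(nesting_depth, depth - 1);
}

/* Marked read of the current depth. */
static int sketch_depth(void)
{
	return READ_ONCE_SKETCH(nesting_depth);
}
```

With this shape, only the accesses that are meant to race carry the marking, and any other plain access to the counter stays visible to a checker as unmarked.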

[ Still testing your other patches for KCSAN, will send another reply there. ]

Thanks,
-- Marco