Re: [PATCH] drm/i915: Protect debugfs per_file_stats with RCU lock

Guenter Roeck <linux@xxxxxxxxxxxx> · Tue, 30 Jun 2020 08:01:05 -0700

On Tue, Jun 30, 2020 at 10:14:25AM +0100, Chris Wilson wrote:
[ ... ]
> > > @@ -328,9 +334,9 @@ static void print_context_stats(struct seq_file *m,
> > >                       struct task_struct *task;
> > >                       char name[80];
> > >  
> > > -                     spin_lock(&file->table_lock);
> > > +                     rcu_read_lock();
> > >                       idr_for_each(&file->object_idr, per_file_stats, &stats);
> > > -                     spin_unlock(&file->table_lock);
> > > +                     rcu_read_unlock();
> > >  
> > For my education - is it indeed possible and valid to replace spin_lock()
> > with rcu_read_lock() to prevent list manipulation for a list used by
> > idr_for_each(), even if that list is otherwise manipulated under the
> > spinlock ?
> 
> It's a pure read of a radixtree here, and is supposed to be RCU safe:
> 
>  * idr_for_each() can be called concurrently with idr_alloc() and
>  * idr_remove() if protected by RCU.  Newly added entries may not be
>  * seen and deleted entries may be seen, but adding and removing entries
>  * will not cause other entries to be skipped, nor spurious ones to be seen.
> 
> That is the tree structure is stable.
> 
Ah, that makes sense. Thanks for the clarification.

> > Background: we are seeing a crash with the following call trace.
> > 
> > [ 1016.651593] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > ...
> > [ 1016.651693] Call Trace:
> > [ 1016.651703]  idr_for_each+0x8a/0xe8
> > [ 1016.651711]  i915_gem_object_info+0x2a3/0x3eb
> > [ 1016.651720]  seq_read+0x162/0x3ca
> > [ 1016.651727]  full_proxy_read+0x5b/0x8d
> > [ 1016.651733]  __vfs_read+0x45/0x1bb
> > [ 1016.651741]  vfs_read+0xc9/0x15e
> > [ 1016.651746]  ksys_read+0x7e/0xde
> > [ 1016.651752]  do_syscall_64+0x54/0x68
> > [ 1016.651758]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
Actually, the crash is not in idr_for_each, but in per_file_stats:

[ 1016.651637] RIP: 0010:per_file_stats+0xe/0x16e
[ 1016.651646] Code: d2 41 0f b6 8e 69 8c 00 00 48 89 df 48 c7 c6 7b 74 8c be 31 c0 e8 0c 89 cf ff eb d2 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 <8b> 06 85 c0 0f 84 4d 01 00 00 49 89 d6 48 89 f3 3d ff ff ff 7f 73
[ 1016.651651] RSP: 0018:ffffad3a01337ba0 EFLAGS: 00010293
[ 1016.651656] RAX: 0000000000000018 RBX: ffff96fe040d65e0 RCX: 0000000000000002
[ 1016.651660] RDX: ffffad3a01337c50 RSI: 0000000000000000 RDI: 00000000000001e8
[ 1016.651663] RBP: ffffad3a01337bb8 R08: 0000000000000000 R09: 00000000000001c0
[ 1016.651667] R10: 0000000000000000 R11: ffffffffbdbe5fce R12: 0000000000000000
[ 1016.651671] R13: ffffffffbdbe5fce R14: ffffad3a01337c50 R15: 0000000000000001
[ 1016.651676] FS:  00007a597e2d7480(0000) GS:ffff96ff3bb00000(0000) knlGS:0000000000000000
[ 1016.651680] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1016.651683] CR2: 0000000000000000 CR3: 0000000171fc2001 CR4: 00000000003606e0
[ 1016.651687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1016.651690] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1016.651693] Call Trace:

Sorry for the confusion. From the context it appears that the second parameter
of per_file_stats() may be NULL, though I am not entirely sure if I got that
correctly.

> Is there a reason you are using this slow debugfs in the first place?

AFAICS ChromeOS is using the information to calculate graphics memory use.

Guenter
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx