Re: vfio warning do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 27 Mar 2019 16:24:30 -0600

On Wed, 27 Mar 2019 09:38:47 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

> Hi Alex,
> 
> I am seeing a warning in the vfio driver with the 5.0 linux kernel on an 
> s390 system:
> 
> [ 4050.264464] ------------[ cut here ]------------
> [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set 
> at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
> [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 
> __might_sleep+0x76/0x90
> ....
> 
> [ 4050.264636] CPU: 12 PID: 35924 Comm: sh Tainted: G        W 
> 5.0.0-00016-g4ef69cfc84b1-dirty #12
> [ 4050.264646] Hardware name: IBM 2964 NE1 749 (LPAR)
> [ 4050.264654] Krnl PSW : 0704d00180000000 000000000017bbae 
> (__might_sleep+0x76/0x90)
> [ 4050.264668]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 
> PM:0 RI:0 EA:3
> [ 4050.264678] Krnl GPRS: 00000000361d853f 0000000080000000 
> 0000000000000072 00000000001d05e2
> [ 4050.264691]            0000000000000007 0000000000bb3c20 
> 000003ff800d7abe 000003e013823b58
> [ 4050.264700]            0000000000000002 000003ff800d7abe 
> 0000000000000000 000000000000038c
> [ 4050.264708]            0000000000dd96e4 0000000800000000 
> 000000000017bbaa 000003e013823910
> [ 4050.264724] Krnl Code: 000000000017bb9e: c0200062dc5d        larl 
> %r2,dd7458
>                            000000000017bba4: c0e5fffe5012        brasl 
> %r14,145bc8
>                           #000000000017bbaa: a7f40001            brc 
> 15,17bbac
>                           >000000000017bbae: b904004a            lgr   
> %r4,%r10
>                            000000000017bbb2: b904003b            lgr 
> %r3,%r11
>                            000000000017bbb6: b904002c            lgr 
> %r2,%r12
>                            000000000017bbba: ebaff0a00004        lmg 
> %r10,%r15,160(%r15)
>                            000000000017bbc0: c0f4fffffe8c        brcl 
> 15,17b8d8
> [ 4050.264756] Call Trace:
> [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
> [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
> [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
> [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
> [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
> [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
> [ 4050.264825]  [<00000000008e01b0>] 
> device_release_driver_internal+0x168/0x268
> [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
> [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
> [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
> [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
> [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
> [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
> [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> [ 4050.264925] 4 locks held by sh/35924:
> [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: 
> vfs_write+0x9e/0x198
> [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: 
> kernfs_fop_write+0x1cc/0x1f8
> [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: 
> kernfs_remove_self+0x12e/0x150
> [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: 
> device_release_driver_internal+0x5c/0x268
> [ 4050.264993] Last Breaking-Event-Address:
> [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
> [ 4050.265010] irq event stamp: 7039
> [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] 
> console_unlock+0x6d2/0x740
> [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] 
> console_unlock+0xd6/0x740
> [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] 
> __udelay+0xb6/0x100
> [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] 
> __udelay+0x96/0x100
> [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
> 
> 
> The problem seems to be the function vfio_dev_present which is the 
> condition to wait_event_interruptible_timeout.
> 
> wait_event_interruptible() sets the state of the current task to
> TASK_INTERRUPTIBLE, then checks the condition. And in vfio_dev_present 
> we call vfio_group_get_device which then tries to acquire the mutex 
> group->device_lock.
> 
> 
> I am not too familiar with vfio core driver code yet, and I am not sure 
> if the solution to this problem will be something like described in 
> https://lwn.net/Articles/628628/. If that is the solution we would want 
> I could take a crack at writing a patch.

Hmm, strange I've never seen this.  In any case, yes, thanks for
reporting it and it does seem like the issue discussed there.  If
you're interested in tackling it, that would be much appreciated.  Also
note the recent thread by Parav working on fixes to the mdev-core
removal and creation of devices.  If this issue is somehow unique to
mdev devices (maybe why I haven't seen it in vfio-pci?) then that
series might bring mdev more in line with vfio-pci semantics.  Thanks,

Alex