On Tue, Mar 19, 2019 at 10:32 PM <fei.yang@xxxxxxxxx> wrote: > > From: Fei Yang <fei.yang@xxxxxxxxx> > > The following kernel panic happens due to the io_data buffer gets deallocated > before the async io is completed. Add a check for the case where io_data buffer > should be deallocated by ffs_user_copy_worker. > > [ 41.663334] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 > [ 41.672099] #PF error: [normal kernel read fault] > [ 41.677356] PGD 20c974067 P4D 20c974067 PUD 20c973067 PMD 0 > [ 41.683687] Oops: 0000 [#1] PREEMPT SMP > [ 41.687976] CPU: 1 PID: 7 Comm: kworker/u8:0 Tainted: G U 5.0.0-quilt-2e5dc0ac-00790-gd8c79f2-dirty #2 > [ 41.705309] Workqueue: adb ffs_user_copy_worker > [ 41.705316] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.705318] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.705320] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.705322] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.705323] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.705324] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.705325] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.705326] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.705328] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.705329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.705330] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.705331] Call Trace: > [ 41.705338] vfree+0x50/0xb0 > [ 41.705341] ffs_user_copy_worker+0xe9/0x1c0 > [ 41.705344] process_one_work+0x19f/0x3e0 > [ 41.705348] worker_thread+0x3f/0x3b0 > [ 41.829766] kthread+0x12b/0x150 > [ 41.833371] ? process_one_work+0x3e0/0x3e0 > [ 41.838045] ? kthread_create_worker_on_cpu+0x70/0x70 > [ 41.843695] ret_from_fork+0x3a/0x50 > [ 41.847689] Modules linked in: hci_uart bluetooth ecdh_generic rfkill_gpio dwc3_pci dwc3 snd_usb_audio mei_me tpm_crb snd_usbmidi_lib xhci_pci xhci_hcd mei tpm snd_hwdep cfg80211 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_hda_core videobuf2_dma_sg crlmodule > [ 41.876880] CR2: 0000000000000048 > [ 41.880584] ---[ end trace 2bc4addff0f2e673 ]--- > [ 41.891346] RIP: 0010:__vunmap+0x2a/0xc0 > [ 41.895734] Code: 0f 1f 44 00 00 48 85 ff 0f 84 87 00 00 00 55 f7 c7 ff 0f 00 00 48 89 e5 41 55 41 89 f5 41 54 53 48 89 fb 75 71 e8 56 d7 ff ff <4c> 8b 60 48 4d 85 e4 74 76 48 89 df e8 25 ff ff ff 45 85 ed 74 46 > [ 41.916740] RSP: 0018:ffffbc3a40053df0 EFLAGS: 00010286 > [ 41.922583] RAX: 0000000000000000 RBX: ffffbc3a406f1000 RCX: 0000000000000000 > [ 41.930563] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff > [ 41.938540] RBP: ffffbc3a40053e08 R08: 000000000001fb79 R09: 0000000000000037 > [ 41.946520] R10: ffffbc3a40053b68 R11: ffffbc3a40053cad R12: fffffffffffffff2 > [ 41.954502] R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffffffffff > [ 41.962482] FS: 0000000000000000(0000) GS:ffff9e2977a80000(0000) knlGS:0000000000000000 > [ 41.971536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.977960] CR2: 0000000000000048 CR3: 000000020c994000 CR4: 00000000003406e0 > [ 41.985930] Kernel panic - not syncing: Fatal exception > [ 41.991817] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 42.009525] Rebooting in 10 seconds.. > [ 52.014376] ACPI MEMORY or I/O RESET_REG. > > Fixes: 772a7a724f6 ("usb: gadget: f_fs: Allow scatter-gather buffers") > Signed-off-by: Fei Yang <fei.yang@xxxxxxxxx> > Reviewed-by: Manu Gautam <mgautam@xxxxxxxxxxxxxx> > Tested-by: John Stultz <john.stultz@xxxxxxxxxx> > --- > V2: add tag: "Fixes: 772a7a724f6 ......", Reviewed-by and Tested-by. Hey Fei, So while this patch does resolve the issues I was seeing with mainline kernels and recent changes to adbd, Josh pointed out that it wouldn't resolve the issues I was seeing with older kernels which is slightly different (but still related to aio usage). On the older kernels I'm hitting scheduling while atomic on reboot, which seems to be due to ffs_aio_cancel() taking a spinlock then calling usb_ep_dequeue() which might sleep. It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers") which was then reverted by a9c859033f6e. Elsewhere it seems the ffs driver takes effort to drop any locks before calling usb_ep_dequeue(), so this seems like that should be addressed, but it also seems like recent change to the dwc3 driver has been made to avoid sleeping in that path (see fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer")), which may be why I'm not seeing the problem with mainline (and your patch here, of coarse). But that also doesn't clarify if its still a potential issue w/ non-dwc3 platforms. So for older kernels, do you have a suggestion of which approach is advised? Does usb_ep_dequeue need to avoid sleeping or do we need to rework the ffs_aio_cancel logic? thanks -john