>-----Original Message----- >From: linux-input-owner@xxxxxxxxxxxxxxx [mailto:linux-input- >owner@xxxxxxxxxxxxxxx] On Behalf Of Onkalo Samu.P (Nokia-D/Tampere) >Sent: 17 February, 2010 10:16 >To: dmitry.torokhov@xxxxxxxxx; oleg@xxxxxxxxxx >Cc: linux-input@xxxxxxxxxxxxxxx >Subject: RE: [PATCH] input: polldev can cause crash in case of polling >disabled > > > >>-----Original Message----- >>From: ext Dmitry Torokhov [mailto:dmitry.torokhov@xxxxxxxxx] >>Sent: 16 February, 2010 23:43 >>To: Onkalo Samu.P (Nokia-D/Tampere); oleg@xxxxxxxxxx >>Cc: linux-input@xxxxxxxxxxxxxxx >>Subject: Re: [PATCH] input: polldev can cause crash in case of polling >>disabled >> >>On Tue, Feb 16, 2010 at 07:37:49PM +0100, samu.p.onkalo@xxxxxxxxx >wrote: >>> >>> >>> >-----Original Message----- >>> >From: ext Dmitry Torokhov [mailto:dmitry.torokhov@xxxxxxxxx] >>> >Sent: 16 February, 2010 19:51 >>> >To: Onkalo Samu.P (Nokia-D/Tampere) >>> >Cc: linux-input@xxxxxxxxxxxxxxx >>> >Subject: Re: [PATCH] input: polldev can cause crash in case of >>polling >>> >disabled >>> > >>> >Hi Samu, >>> > >>> >On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote: >>> >> If polling is set to disabled value and polled input device >>> >> is opened and closed several times, address to workqueue will >>probably >>> >> change at some point. Since nothing is queued (due to polled >>disabled >>> >> state), content of the work struct contains pointer to the old and >>> >non-existent >>> >> workqueue. >>> > >>> >This I do not quite understand. The work struct as far as I can see >>does >>> >not reference workqueue at all. There is a list entry but if we do >>not >>> >poll the device that entry should be always detached from any lists. >>We >>> >properly initialize WQ entry when we create the device and it shoudl >>> >remain valid until the device is destroyed. >>> >>> 'data' entry contains a pointer to per-cpu-workqueue which in turn >>contains >>> the workqueue pointer. This 'data' entry is not ok in case of >failure. >>I can >>> Collect more data about this. >>> >>> > >>> >> When the device is closed again, cancel_delayed_work_sync >>> >> goes crazy due to pointer to nonexisting workqueue. >>> >> >>> > >>> >What kind of failure do you see? Is there a stack trace or >something? >>> >>> Kernel panic while in workqueue handling (paging fault with some >>crappy address). > >It looks like this (in ARM): >[ 88.974884] Unable to handle kernel paging request at virtual address >732e726f >[ 88.994445] pgd = c0004000 >[ 88.997192] [732e726f] *pgd=00000000 >[ 89.013610] Internal error: Oops: 5 [#1] PREEMPT >[ 89.079406] CPU: 0 Not tainted (2.6.32-09063-g2953cf4 #30) >[ 89.085296] PC is at __cancel_work_timer+0xfc/0x1a0 >[ 89.090179] LR is at __cancel_work_timer+0xec/0x1a0 >[ 89.095092] pc : [<c006f880>] lr : [<c006f870>] psr: a0000153 >[ 89.095092] sp : cf411c98 ip : cf410000 fp : cf411d1c >[ 89.106628] r10: dfba59fc r9 : dfba59f0 r8 : dfba59e4 >[ 89.111877] r7 : 00000000 r6 : 00000000 r5 : dfba59e0 r4 : >dfba59c0 >[ 89.118438] r3 : 732e726f r2 : d34a1b00 r1 : 00000001 r0 : >dfba59f0 >[ 89.125000] Flags: NzCv IRQs on FIQs off Mode SVC_32 ISA ARM >Segment user >[ 89.132263] Control: 10c5387d Table: 91110019 DAC: 00000015 >[ 89.144866] Stack: (0xcf411c98 to 0xcf412000) >[ 89.149261] 1c80: >00000002 00000000 >[ 89.157470] 1ca0: c006f824 cf410000 00000000 d34a1b00 dfa3f540 >ded44078 cf411cfc cf411cc8 >[ 89.165710] 1cc0: c00861d8 c0084a20 00000002 00000000 00000000 >c02b2428 cf410000 20000153 >[ 89.173919] 1ce0: cf411cfc c00813a4 cf410000 20000153 cf411d44 >dfba59c0 dfb618cc dfb2b864 >[ 89.182159] 1d00: df67e960 ded44078 dfa3f540 ded44078 cf411d2c >cf411d20 c006f938 c006f790 >[ 89.190368] 1d20: cf411d44 cf411d30 bf10c2fc c006f930 dfb2b0c0 >dfb618cc cf411d64 cf411d48 >[ 89.198608] 1d40: c02b2458 bf10c2e8 dfb618c0 d1ca1080 dfb6193c >df67e960 cf411d84 cf411d68 >[ 89.206817] 1d60: c02b4ab8 c02b240c 00000000 cf408200 00000010 >df67e960 cf411dc4 cf411d88 >[ 89.215057] 1d80: c00ce2e0 c02b4a38 00000000 00000000 cf408200 >df67e960 cf411db4 cf408200 >[ 89.223266] 1da0: 00000000 d3bca000 0000000c d3bca008 00000001 >00000000 cf411dd4 cf411dc8 >[ 89.231506] 1dc0: c00ce438 c00ce1bc cf411df4 cf411dd8 c00caa74 >c00ce414 00000041 00000000 >[ 89.239715] 1de0: d3bca000 0000000c cf411e1c cf411df8 c005f8dc >c00caa08 d34a1de0 d34a1b00 >[ 89.247955] 1e00: d3bca000 00000002 cf411ee0 d349b684 cf411e3c >cf411e20 c005f984 c005f85c >[ 89.256164] 1e20: 00000002 d34a1b00 cf410000 00000002 cf411e54 >cf411e40 c0060fa8 c005f950 >[ 89.264404] 1e40: cf411e5c cf411e50 cf411e74 cf411e58 c00614ec >c0060de8 c002e204 00000009 >[ 89.272613] 1e60: be95167c cf410000 cf411ec4 cf411e78 c006c218 >c0061460 00000400 dfb618f0 >[ 89.280853] 1e80: dfb618f0 20000153 cf411ea4 cf411fb0 d349b184 >d34d1680 cf411ebc c002e204 >[ 89.289062] 1ea0: cf411fb0 be95167c 00000003 c002e204 cf410000 >00000003 cf411fac cf411ec8 >[ 89.297302] 1ec0: c0030430 c006be94 d1ca1080 cf410000 cf411f3c >00000001 c02b55a8 c0073430 >[ 89.305511] 1ee0: 00000009 00000000 00000000 00000000 00000000 >c00732ec cf411ef8 cf411ef8 >[ 89.313751] 1f00: c00ccbfc c01e16e0 00000000 dfae9a10 d353cdd4 >cf408200 be951644 cf411f70 >[ 89.321960] 1f20: 00000400 00000400 cf410000 00000003 cf411f6c >cf411f40 c00cd770 c02b54c4 >[ 89.330200] 1f40: 00000000 00000000 cf410000 00000000 00000000 >cf408200 00000400 be951644 >[ 89.338409] 1f60: cf411fa4 cf411f70 c00cd8e0 c00cd6c4 cf411f8c >cf411f80 c0082b24 c0082250 >[ 89.346649] 1f80: 00000000 00012d98 00012818 be95167c 00000003 >c002e204 cf410000 00000003 >[ 89.354858] 1fa0: 00000000 cf411fb0 c002e098 c00303c0 fffffe00 >be951644 00000400 00000001 >[ 89.363098] 1fc0: 00012d98 00012818 be95167c 00000003 be951644 >00000003 00000003 be951cd4 >[ 89.371307] 1fe0: 00000002 be950a70 0000cfe4 400fbfcc 60000050 >00000003 00000000 00000000 >[ 89.379547] Backtrace: >[ 89.382019] [<c006f784>] (__cancel_work_timer+0x0/0x1a0) from >[<c006f938>] (cancel_delayed_work_sync+0x14/0x18) >[ 89.392181] [<c006f924>] (cancel_delayed_work_sync+0x0/0x18) from >[<bf10c2fc>] (input_close_polled_device+0x20/0x74 [input_p) >[ 89.404174] [<bf10c2dc>] (input_close_polled_device+0x0/0x74 >[input_polldev]) from [<c02b2458>] (input_close_device+0x58/0x7) >[ 89.415618] r5:dfb618cc r4:dfb2b0c0 >[ 89.419250] [<c02b2400>] (input_close_device+0x0/0x7c) from >[<c02b4ab8>] (evdev_release+0x8c/0xa4) >[ 89.428253] r7:df67e960 r6:dfb6193c r5:d1ca1080 r4:dfb618c0 >[ 89.433990] [<c02b4a2c>] (evdev_release+0x0/0xa4) from [<c00ce2e0>] >(__fput+0x130/0x258) >[ 89.442108] r7:df67e960 r6:00000010 r5:cf408200 r4:00000000 >[ 89.447845] [<c00ce1b0>] (__fput+0x0/0x258) from [<c00ce438>] >(fput+0x30/0x34) >[ 89.455108] [<c00ce408>] (fput+0x0/0x34) from [<c00caa74>] >(filp_close+0x78/0x84) >[ 89.462646] [<c00ca9fc>] (filp_close+0x0/0x84) from [<c005f8dc>] >(put_files_struct+0x8c/0xf4) >[ 89.471221] r7:0000000c r6:d3bca000 r5:00000000 r4:00000041 >[ 89.476928] [<c005f850>] (put_files_struct+0x0/0xf4) from >[<c005f984>] (exit_files+0x40/0x44) >[ 89.485504] [<c005f944>] (exit_files+0x0/0x44) from [<c0060fa8>] >(do_exit+0x1cc/0x678) >[ 89.493469] r7:00000002 r6:cf410000 r5:d34a1b00 r4:00000002 >[ 89.499206] [<c0060ddc>] (do_exit+0x0/0x678) from [<c00614ec>] >(do_group_exit+0x98/0xc8) >[ 89.507354] [<c0061454>] (do_group_exit+0x0/0xc8) from [<c006c218>] >(get_signal_to_deliver+0x390/0x3d8) >[ 89.516784] r7:cf410000 r6:be95167c r5:00000009 r4:c002e204 >[ 89.522521] [<c006be88>] (get_signal_to_deliver+0x0/0x3d8) from >[<c0030430>] (do_notify_resume+0x7c/0x594) >[ 89.532226] [<c00303b4>] (do_notify_resume+0x0/0x594) from >[<c002e098>] (work_pending+0x1c/0x20) >[ 89.541076] Code: e5953000 e3d33003 0a000011 e593304c (e5934000) >[ 90.065185] ---[ end trace d28022cc252c43c3 ]--- >[ 90.075866] Kernel panic - not syncing: Fatal exception > >I added traces to drivers/input/input-polldev.c >static int input_open_polled_device(struct input_dev *input) >{ > struct input_polled_dev *dev = input_get_drvdata(input); > int error; > struct cpu_workqueue_struct *cwq; > > printk("INPUT_OPEN_POLLED_DEVICE - enter\n"); > printk("polldev_wq before start_workqueue: %x\n", polldev_wq); > if (polldev_wq != NULL) > printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq); > > > error = input_polldev_start_workqueue(); > if (error) > return error; > printk("polldev_wq after start_workqueue: %x\n", polldev_wq); > if (polldev_wq != NULL) > printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq); > > if (dev->open) > dev->open(dev); > > printk("poll_interval: %x\n", dev->poll_interval ); > printk("addr of work: %x\n", &dev->work); > cwq = atomic_long_read(&dev->work.work.data) & >WORK_STRUCT_WQ_DATA_MASK; > printk("data (==cwq) at work (before queueing): %x\n", cwq); > if (cwq != NULL) > printk("wq from cwq: %x\n", cwq->wq); > > /* Only start polling if polling is enabled */ > if (dev->poll_interval > 0) > queue_delayed_work(polldev_wq, &dev->work, 0); > > cwq = atomic_long_read(&dev->work.work.data) & >WORK_STRUCT_WQ_DATA_MASK; > printk("data (==cwq) at work (after queueing): %x\n", cwq); > if (cwq != NULL) > printk("wq from cwq: %x\n", cwq->wq); > > printk("INPUT_OPEN_POLLED_DEVICE - done\n"); > return 0; >} > >First time device open: >[ 29.094879] INPUT_OPEN_POLLED_DEVICE - enter >[ 29.099182] polldev_wq before start_workqueue: 0 >[ 29.154449] polldev_wq after start_workqueue: ded3c580 >[ 29.159637] cpu_wq of wq: dc4b8480 >[ 29.255950] poll_interval: 32 >[ 29.258941] addr of work: dfbe8f20 >[ 29.262359] data (==cwq) at work (before queueing): 0 >[ 29.267669] data (==cwq) at work (after queueing): dc4b8480 <------- >CPU workqueue same as polldev_wq >[ 29.273315] wq from cwq: ded3c580 >[ 29.276641] INPUT_OPEN_POLLED_DEVICE - done > >there were other devices opened and closed between > >Second time device open: >[ 202.372680] INPUT_OPEN_POLLED_DEVICE - enter >[ 202.377258] polldev_wq before start_workqueue: dc57ba00 >[ 202.382598] cpu_wq of wq: dc4ea900 >[ 202.386077] polldev_wq after start_workqueue: dc57ba00 >[ 202.391326] cpu_wq of wq: dc4ea900 >[ 202.459259] poll_interval: 0 <-------------------------------- no >queueing because of this >[ 202.462188] addr of work: dfbe8f20 >[ 202.465637] data (==cwq) at work (before queueing): dc4b8480 <------ >----- CPU workqueue not updated. >[ 202.471435] wq from cwq: 6b6b006f >[ 202.474853] data (==cwq) at work (after queueing): dc4b8480 <----- >queueing not done -> not updated >[ 202.480468] wq from cwq: 6b6b006f <-------------------------- crap >[ 202.483886] INPUT_OPEN_POLLED_DEVICE - done > >And when cancel_delayed_work_sync is called, kernel crashes due to crap >address to per-cpu workqueue. > >Actually problem is that "data" field in the work struct points to the >non-existing per-cpu workqueue entry. >When this is cancelled at device close, kernel crashes. But what is the >root cause? In input-polldev, >workqueue can change from time to time depending on what happens at the >userspace. Work struct >can still contain references to the old workqueue. Is that illegal thing >to do? >Or should cancellation of the work cleanup the work struct so that it is >in initial state. > >It is said in the workqueue.c: >* The caller must ensure that workqueue_struct on which this work was >last > * queued can't be destroyed before this function returns. > >This is not true here. Workqueue has been destroyed since the work has >never queued to the new workqueue. >Either cancel_(delayed)_work_sync should clear the data field instead of >setting it to non-pending or >input-polldev must clear the work struct in case of no queueing. Or do >you have other proposals? > One solution is not to destroy and recreate workqueue in input-polldev based on open / close calls. This way references to workqueue stays valid. -Samu -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html