Re: hidp_get_raw_report source of 5 second delay in device removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2013-02-20 at 16:44 +0100, David Herrmann wrote:
> >
> > The first patch is critical, the second makes everything neat and tidy.
> 
> I'm confused. I think the second one is critical, not the first one,
> right? At least according to the list above. Anyway, both bugs are
> known and need to be fixed. I reviewed both of your patches and the
> second one needs major rework as I think it's the wrong fix.
> 

Well its semantics and one's point of view. The 5 second delay is a big
hit ... and can have other side effects such as making hidp_session race
against the sock structure eventually being orphaned (which leads to a
kernel OOPS). Anyway - at least we are both on the same page now.

> Some minor comments below:
> 
> > Feb 20 09:31:15 laptop2 kernel: [  334.375756] hci_chan_del: hci0 hcon e000bc00 chan e00d7dc0
> > *** the l2cap_disonn_cfm/l2cap_chan_del above triggers the hidp_session
> > *** thread, so hidp_session can now fall out its main loop.
> > *** hidp_session then does hid_destroy_device(), which through code in the power_supply
> > *** part of the kernel invokes a battery status lookup which requires a call to hidp_get_raw_report
> 
> Yeah, same procedure applies to device-unplug. The calltrace is
> totally valid and understandable. Your "atomic_inc(&hidp->terminate)"
> patch fixes this problem properly.
> 

Yep - I'll resend with your comment there taken on board.

> > Feb 20 09:31:15 laptop2 kernel: [  334.375765] hidp_session Out of main loop
> > Feb 20 09:31:15 laptop2 kernel: [  334.375767] hidp_session About to do hid_destroy_device
> > Feb 20 09:31:15 laptop2 kernel: [  334.375857] hidp_get_raw_report: Entering hidp_get_raw_report
> > Feb 20 09:31:15 laptop2 kernel: [  334.375860] __hidp_send_ctrl_message: session f3294f00 data e0115d36 size 1
> > *** hidp_get_raw_report sends a ctrl message and then waits. Meanwhile the main kernel gets
> > *** on with running the hci driver code ...
> > Feb 20 09:31:15 laptop2 kernel: [  334.376725] hci_conn_put: hcon e000bc00 orig refcnt 0
> > Feb 20 09:31:15 laptop2 kernel: [  334.376729] hci_conn_del: hci0 hcon e000bc00 handle 46
> > Feb 20 09:31:15 laptop2 kernel: [  334.376732] hci_chan_list_flush: hcon e000bc00
> > Feb 20 09:31:15 laptop2 kernel: [  334.376832] hci_dev_put: hci0 orig refcnt 12
> > Feb 20 09:31:15 laptop2 kernel: [  334.376835] hci_sock_dev_event: hdev hci0 event 4
> > Feb 20 09:31:15 laptop2 kernel: [  334.376839] hci_send_to_sock: hdev   (null) len 8
> > Feb 20 09:31:15 laptop2 kernel: [  334.376850] hci_dev_put: hci0 orig refcnt 11
> > Feb 20 09:31:15 laptop2 kernel: [  334.376854] hci_send_to_control: len 6
> > Feb 20 09:31:15 laptop2 kernel: [  334.376856] hci_sock_dev_event: hdev hci0 event 2
> > Feb 20 09:31:15 laptop2 kernel: [  334.376859] hci_send_to_sock: hdev   (null) len 8
> > Feb 20 09:31:15 laptop2 kernel: [  334.376864] hci_dev_put: hci0 orig refcnt 10
> > Feb 20 09:31:15 laptop2 kernel: [  334.376910] hci_del_sysfs: f6119000 name hci0 bus 1
> > Feb 20 09:31:15 laptop2 kernel: [  334.376994] hci_dev_put: hci0 orig refcnt 5
> > *** the 'hci' device is the sysfs tree has now just been removed - yet the various devices
> > *** dependent on it have yet to be removed (input, event, hidraw, hci_conn)
> 
> 
> This is a bug, indeed. We need to notify all child devices first so
> they can get deleted. However, delaying the HCI removal is _wrong_!
> The HCI device must be deleted here.
> But we must simply forward the delete-request to the childs before
> doing the deletion.
> 

Why 'HCI device must be deleted here'? I guess there is a reason, but it
is not obvious to me.

My suggested fix simply follows the same logic as hci_conn_put_device,
which only does sysfs removal once all users of the HCI_CONN device have
finished.


> > *** Note the 5 SECOND gap ... thats the timeout value in hidp_get_raw_report!
> > *** having timed out, the hidp_session thread gets underway again
> > *** the various devices are now removed (input, event, hidraw, hci_conn)
> > *** and life can go on
> > Feb 20 09:31:15 laptop2 kernel: [  339.372148] hidp_get_raw_report timed out!
> > Feb 20 09:31:15 laptop2 kernel: [  339.372153] power_supply hid-60:c5:47:1b:07:b8-battery: driver failed to report `capacity' property: -5
> > Feb 20 09:31:15 laptop2 kernel: [  339.376347] PM: resume of devices complete after 5491.694 msecs
> > Feb 20 09:31:15 laptop2 kernel: [  339.376362] hidp_session Done hid_destry_device
> > Feb 20 09:31:15 laptop2 kernel: [  339.376366] hci_conn_del_sysfs: conn e000bc00
> > Feb 20 09:31:15 laptop2 kernel: [  339.376392] hci_dev_put: hci0 orig refcnt 1
> > *** a new hci device is registered, and a new hci_conn, and sub-tree of input devices
> > *** will be added, which is normal and to be expected.
> > *** NOTE though whilst the new registration occurs here in this log, I have seen it
> > *** occur much earlier in other logs (different versions of kernel), leading to
> > *** various problems when a new hci_conn is added whilst the old one has yet to be deleted.
> 
> No! There is no problem with adding a new hci_conn while another one
> is still to be removed. This totally normal and _must_ be supported.
> As long as both hci_conn objects have different IDs this is correctly
> understood by user-space, too.
> 
> We fixed the ID issue two years ago and this is no longer a problem
> here. If you see a bug where a hci_conn with an equal ID is
> registered, please report it. This would be a bug then.
> 

Fair enough - I have seen a new hci_conn with an equal ID, but I have
been playing with different kernels between 3.2 and 3.8. So unless I see
it again with 3.8 or later I'll trust that particular issue is indeed
resolved.

> > The bluetooth kernel code in its current form is problematic - hence all
> > kind of bug reports by users against their distributions.
> 
> I am currently reworking the HIDP layer. The whole module is _very_
> buggy. I will base my rework on your patches so feel free to resend
> them. If you don't resend them, I will fix it myself, but this will
> mean they get not applied right away as I will probably include them
> in my rework which might take longer.
> 
> Also, please CC the bluetooth kernel maintainers when sending patches!
> This includes Marcel Holtmann, Johan Hedberg and Gustavo Padovan, I
> think.
> 
Thankyou David. I'll do the minor alter to the first patch. I'll take up
discussion on the 2nd in that email thread later.

Karl


--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux