Hi Ilia,

* Ilia, Kolominsky <iliak@xxxxxx> [2011-12-27 11:58:54 +0000]:

> Hi Marcel
>
> > Hi Ilia,
> >
> > > > > I have encountered an incorrect behavior of l2cap connection
> > > > > establishment mechanism when handling an incoming connection
> > > > > request:
> > > > >
> > > > > > ACL data: handle 1 flags 0x02 dlen 12
> > > > >     L2CAP(s): Connect req: psm 23 scid 0x0083
> > > > > < ACL data: handle 1 flags 0x00 dlen 16
> > > > >     L2CAP(s): Connect rsp: dcid 0x0040 scid 0x0083 result 0 status 0
> > > > >       Connection successful
> > > > > < HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2
> > > > >     handle 1
> > > > > < ACL data: handle 1 flags 0x00 dlen 12
> > > > >     L2CAP(s): Config req: dcid 0x0083 flags 0x00 clen 0
> > > > > > HCI Event: Mode Change (0x14) plen 6
> > > > >     status 0x00 handle 1 mode 0x00 interval 0
> > > > >     Mode: Active
> > > > > < ACL data: handle 1 flags 0x00 dlen 16
> > > > >     L2CAP(s): Connect rsp: dcid 0x0040 scid 0x0083 result 1 status 2
> > > > >       Connection pending - Authorization pending
> > > > >
> > > > > After analyzing the code, it seems to me that there is indeed a
> > > > > clear possibility that replies will egress out of order on
> > > > > multicore systems:
> > > > >
> > > > > CPU0 (Tasklet: hci_rx_task)          CPU1 (user process)
> > > >
> > > > Can you check if this also happens after the move to workqueue
> > > > processing?
> > > > The workqueue handling is quite different, then this problem might not be
> > > > there anymore.
> > >
> > > Firstly, I think workqueue should only make the matters worse -
> > > since it can be preempted ( unlike tasklets ) this can
> > > happen even on single CPU. ) e.g. resched just before send_resp label).
> > > Secondly, as with any race situations, this bug is difficult to reproduce,
> > > I saw it only a couple of times, thus I call for theoretical analysis.
> >
> > we are actually using a CPU unbound workqueue where the kernel ensures
> > that only one will be active across the set of CPUs. Both RX and TX are
> > executed from that same workqueue. So the only way this can happen is if
> > one work is scheduled from the other. However since the event processing
> > is now also run from that same workqueue, I fail to see how that could
> > happen.
>
> I am putting back the original diagram because I feel that it is
> quite relevant to the discussion:
>
> CPU0 (Tasklet: hci_rx_task)          CPU1 (user process)
>   ...                                sk = sys_accept()
>   ...                                  l2cap_sock_accept()
>   ...                                    add_wait_queue_exclusive()
>   l2cap_connect_req()                ...
>     result = L2CAP_CR_PEND;          ...
>     status = L2CAP_CS_AUTHOR_PEND;   ...
>     parent->sk_data_ready(parent, 0) ...

Move to the workqueue-based code and add a call to schedule() here, before
sending L2CAP_CR_PEND. Let's see if this issue is real.

	Gustavo
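
For anyone who wants to reason about the interleaving outside the kernel,
here is a rough userspace sketch of the experiment suggested above. It is
not kernel code and not the l2cap implementation: the two pthreads merely
stand in for the RX path and the accept()ing process, and every name in it
(struct chan, rx_path, accept_path, success_overtakes_pending, the
yield_before_pend switch) is hypothetical. It also assumes, based on the
trace rather than on the truncated diagram, that the woken accept side is
what sends the "Connection successful" response.

```c
#include <assert.h>
#include <pthread.h>

enum rsp { RSP_NONE, RSP_SUCCESS, RSP_PENDING };

/* One-shot event built from a mutex and a condition variable. */
struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int fired;
};

static void event_init(struct event *e)
{
	pthread_mutex_init(&e->lock, NULL);
	pthread_cond_init(&e->cond, NULL);
	e->fired = 0;
}

static void event_fire(struct event *e)
{
	pthread_mutex_lock(&e->lock);
	e->fired = 1;
	pthread_cond_broadcast(&e->cond);
	pthread_mutex_unlock(&e->lock);
}

static void event_wait(struct event *e)
{
	pthread_mutex_lock(&e->lock);
	while (!e->fired)
		pthread_cond_wait(&e->cond, &e->lock);
	pthread_mutex_unlock(&e->lock);
}

/* Hypothetical model of one channel; not a kernel structure. */
struct chan {
	struct event data_ready;   /* stands in for parent->sk_data_ready() */
	struct event accept_sent;  /* test hook: accept side has sent its rsp */
	pthread_mutex_t tx_lock;   /* protects sent[] and nsent */
	enum rsp sent[2];          /* responses in the order they went out */
	int nsent;
	int yield_before_pend;     /* 1 = model schedule() before the PEND send */
};

static void send_rsp(struct chan *c, enum rsp r)
{
	pthread_mutex_lock(&c->tx_lock);
	c->sent[c->nsent++] = r;
	pthread_mutex_unlock(&c->tx_lock);
}

/* Models the RX path: decide PEND, wake the acceptor, then send PEND. */
static void *rx_path(void *arg)
{
	struct chan *c = arg;

	event_fire(&c->data_ready);
	if (c->yield_before_pend)
		event_wait(&c->accept_sent); /* "scheduled out" until accept ran */
	send_rsp(c, RSP_PENDING);
	return NULL;
}

/* Models the accepting process: wake up, then send the success rsp. */
static void *accept_path(void *arg)
{
	struct chan *c = arg;

	event_wait(&c->data_ready);
	send_rsp(c, RSP_SUCCESS);
	event_fire(&c->accept_sent);
	return NULL;
}

/* Returns 1 if SUCCESS went out before PENDING, 0 otherwise. */
int success_overtakes_pending(int yield_before_pend)
{
	struct chan c;
	pthread_t rx, acc;

	event_init(&c.data_ready);
	event_init(&c.accept_sent);
	pthread_mutex_init(&c.tx_lock, NULL);
	c.sent[0] = c.sent[1] = RSP_NONE;
	c.nsent = 0;
	c.yield_before_pend = yield_before_pend;

	pthread_create(&acc, NULL, accept_path, &c);
	pthread_create(&rx, NULL, rx_path, &c);
	pthread_join(rx, NULL);
	pthread_join(acc, NULL);

	assert(c.nsent == 2);
	return c.sent[0] == RSP_SUCCESS && c.sent[1] == RSP_PENDING;
}
```

With yield_before_pend set to 1 the model always yields the order seen in
the trace (success first, pending second). With 0 the outcome depends on
thread timing, which is exactly the window under discussion. Whether the
real workqueue code can be descheduled at that point is the open question;
the sketch only shows what follows if it can.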