Hi, On 07/16/2014 11:42 AM, Oliver Neukum wrote: > > On Wed, 2014-07-16 at 10:44 +0200, Hans de Goede wrote: >> Hi, >> >> On 07/16/2014 08:45 AM, Oliver Neukum wrote: >>> On Tue, 2014-07-15 at 22:23 +0200, Julian Sikorski wrote: >>>> [ 54.017997] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: >>>> enabled, doesn't support DPO or FUA >>>> [ 54.018119] sd 6:0:0:0: uas_sense_old: urb length 26 disagrees with >>>> IU sense data length 510, using 18 bytes of sense data >>>> [ 85.060805] sd 6:0:0:0: uas_eh_abort_handler ffff8803120b7400 tag >>>> 0, >>>> inflight: IN >>>> [ 85.061835] sd 6:0:0:0: abort completed >>>> [ 85.062037] ------------[ cut here ]------------ >>>> [ 85.062191] kernel BUG at block/blk-core.c:2511! >>>> [ 85.062328] invalid opcode: 0000 [#1] SMP >>>> [ 85.062573] Modules linked in: uas usb_storage >>>> nf_conntrack_netbios_ns nf_conntrack_broadcast ccm ip6t_rpfilter >>>> ip6t_REJECT xt_conntrack bnep bluetooth ebtable_nat ebtable_broute >>>> bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 >>>> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security >>>> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 >>>> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle >>>> iptable_security iptable_raw fuse iTCO_wdt iTCO_vendor_support >>>> x86_pkg_temp_thermal coretemp snd_hda_codec_realtek kvm_intel kvm arc4 >>>> snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul iwldvm >>>> crc32_pclmul crc32c_intel mac80211 sdhci_pci uvcvideo >>>> videobuf2_vmalloc >>>> videobuf2_memops videobuf2_core snd_hda_intel ghash_clmulni_intel >>>> videodev iwlwifi snd_hda_controller >>>> [ 85.066723] media snd_hda_codec sdhci cfg80211 snd_hwdep microcode >>>> jme snd_seq snd_seq_device snd_pcm mii serio_raw mmc_core jmb38x_ms >>>> rfkill memstick mei_me mei snd_timer shpchp lpc_ich i2c_i801 mfd_core >>>> i2c_core snd soundcore nfsd auth_rpcgss nfs_acl lockd sunrpc >>>> binfmt_misc >>>> firewire_ohci firewire_core crc_itu_t wmi video >>>> [ 85.069089] CPU: 0 PID: 0 Comm: swapper/0 Not tainted >>>> 3.15.4-200.fc20.x86_64+debug #1 >>>> [ 85.069297] Hardware name: CLEVO >>>> P150HMx/P150HMx, BIOS 4.6.4 08/09/2011 >>>> [ 85.069509] task: ffffffff81c184c0 ti: ffffffff81c00000 task.ti: >>>> ffffffff81c00000 >>>> [ 85.069716] RIP: 0010:[<ffffffff813986d0>] [<ffffffff813986d0>] >>>> blk_finish_request+0xf0/0x100 >>>> [ 85.069987] RSP: 0018:ffff880321e03da0 EFLAGS: 00010002 >>>> [ 85.070122] RAX: 0000000000000286 RBX: ffff88031b73d730 RCX: >>>> 0000000000000000 >>>> [ 85.070261] RDX: ffff880321e0eb20 RSI: 00000000fffffffb RDI: >>>> ffff88031b73d730 >>>> [ 85.071768] RBP: ffff880321e03db0 R08: 0000000000000000 R09: >>>> 0000000000000001 >>>> [ 85.071907] R10: ffffffff81c184c0 R11: ffffffff81c18d80 R12: >>>> 00000000fffffffb >>>> [ 85.072046] R13: ffff8803190625b0 R14: 0000000000000286 R15: >>>> ffff8803120b7400 >>>> [ 85.072186] FS: 0000000000000000(0000) GS:ffff880321e00000(0000) >>>> knlGS:0000000000000000 >>>> [ 85.072395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 85.072531] CR2: 00007f6c210209c0 CR3: 0000000001c11000 CR4: >>>> 00000000000407f0 >>>> [ 85.072669] Stack: >>>> [ 85.072794] ffff88031b73d730 00000000fffffffb ffff880321e03de0 >>>> ffffffff81398722 >>>> [ 85.073205] ffff88031b73d730 0000000008000000 00000000fffffffb >>>> ffff88031b73d730 >>>> [ 85.073615] ffff880321e03df0 ffffffff81398750 ffff880321e03e58 >>>> ffffffff815265c1 >>>> [ 85.074024] Call Trace: >>>> [ 85.074151] <IRQ> >>>> [ 85.074210] >>>> [ 85.074387] [<ffffffff81398722>] blk_end_bidi_request+0x42/0x60 >>>> [ 85.074523] [<ffffffff81398750>] blk_end_request+0x10/0x20 >>>> [ 85.074664] [<ffffffff815265c1>] scsi_io_completion+0x111/0x6a0 >>>> [ 85.074803] [<ffffffff8151af03>] scsi_finish_command+0xb3/0x110 >>>> [ 85.074943] [<ffffffff815263c7>] scsi_softirq_done+0x137/0x160 >>>> [ 85.075082] [<ffffffff8139fe7b>] blk_done_softirq+0x9b/0xd0 >>>> [ 85.075223] [<ffffffff8109caad>] __do_softirq+0x12d/0x480 >>>> [ 85.075361] [<ffffffff8109d135>] irq_exit+0xc5/0xd0 >>>> [ 85.075499] [<ffffffff817f0598>] do_IRQ+0x58/0xf0 >>>> [ 85.075635] [<ffffffff817e4432>] common_interrupt+0x72/0x72 >>>> [ 85.075769] <EOI> >>>> [ 85.075827] >>>> [ 85.076006] [<ffffffff8162e494>] ? cpuidle_enter_state+0x54/0xd0 >>>> [ 85.076142] [<ffffffff8162e547>] cpuidle_enter+0x17/0x20 >>>> [ 85.076280] [<ffffffff810eef4e>] cpu_startup_entry+0x3ee/0x770 >>>> [ 85.076420] [<ffffffff817c914a>] rest_init+0x13a/0x140 >>>> [ 85.076557] [<ffffffff817c9015>] ? rest_init+0x5/0x140 >>>> [ 85.076695] [<ffffffff81f1c006>] start_kernel+0x48d/0x4ae >>>> [ 85.076833] [<ffffffff81f1b982>] ? repair_env_string+0x5c/0x5c >>>> [ 85.076972] [<ffffffff81f1b120>] ? early_idt_handlers+0x120/0x120 >>>> [ 85.077112] [<ffffffff81f1b5ee>] x86_64_start_reservations >>>> +0x2a/0x2c >>>> [ 85.077252] [<ffffffff81f1b73d>] x86_64_start_kernel+0x14d/0x170 >>>> [ 85.077388] Code: 89 de e8 84 18 00 00 e9 4b ff ff ff 83 7b 78 01 >>>> 0f >>>> 85 58 ff ff ff 48 8b 43 60 48 8d b8 78 02 00 00 e8 d5 69 e2 ff e9 43 >>>> ff >>>> ff ff <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 >>>> [ 85.081015] RIP [<ffffffff813986d0>] blk_finish_request+0xf0/0x100 >>>> [ 85.081206] RSP <ffff880321e03da0> >>> >>> It looks like UAS is calling scsi_done() twice. >>> >>> I may be dense, but it looks like uas_data_cmplt() will call >>> uas_try_complete() with only the UNLINK_DATA_URBS flag protecting >>> it from calling scsi_done() twice. As uas_unlink_data_urbs() uses >>> usb_unlink_urb() we are racing withthe completion handler. >> >> I don't think that is possible. You're right that usb_unlink_urb() does not >> guarantee that the completion handler has actually run when it returns, >> yet we still need the unlock + relock + protection flag because >> >> "This request is asynchronous, however the HCD might call the ->complete >> callback during unlink. Therefore when drivers call usb_unlink_urb, they >> must not hold any locks ..." >> >> So yes we have this race where the uas_data_cmplt() and uas_stat_cmplt() >> which calls uas_unlink_data_urbs() run at the same time, but uas_stat_cmplt() >> has retaken the lock (through the re-locking in uas_unlink_data_urbs()) before >> it does cmdinfo->state &= ~COMMAND_INFLIGHT, and likewise uas_data_cmplt >> does cmdinfo->state &= ~DATA_XX_URB_INFLIGHT with the lock, and they both >> call uas_try_complete with the lock held. >> >> So yes you're right that they both will call uas_try_complete, but only >> one for one of them this will not trigger: >> >> if (cmdinfo->state & (COMMAND_INFLIGHT | >> DATA_IN_URB_INFLIGHT | >> DATA_OUT_URB_INFLIGHT | >> )) >> return -EBUSY; >> >> As only for one of them all flags will be cleared, even if the unlink >> results in uas_data_cmplt() running after UNLINK_DATA_URBS has already >> been cleared. >> >> So as far as I can see the problem you think is there is not there. >> >> If you've a specific sequence of events in mind which you believe will >> be a problem please spell it out step by step, and then we'll see from >> there. > > CPU A CPU B > URB completed in hardware > uas_stat_cmplt() > uas_try_complete() If there are data-urbs which still need to comlete, then scsi_done won't get called. > scsi_done() > uas_unlink_data_urbs How do we get here, from which urb-completion handler does this path get called? It looks like a code path from urb_stat_cmplt, but if CPU a has already done that, then how will B also run it, that makes no sense, the stat urb will complete only once. > UNLINK_DATA_URBS set > usb_unlink_urb() [NOP] > UNLINK_DATA_URBS unset > uas_try_complete() > [COMMAND_INFLIGHT not set] > [DATA_IN_URB_INFLIGHT not set] > scsi_done() Regards, Hans -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html