Re: uas regression in 3.15 with Lacie Rugged USB3 and NEC uPD720200

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2014-07-16 at 10:44 +0200, Hans de Goede wrote:
> Hi,
> 
> On 07/16/2014 08:45 AM, Oliver Neukum wrote:
> > On Tue, 2014-07-15 at 22:23 +0200, Julian Sikorski wrote:
> >> [   54.017997] sd 6:0:0:0: [sdb] Write cache: enabled, read cache:
> >> enabled, doesn't support DPO or FUA
> >> [   54.018119] sd 6:0:0:0: uas_sense_old: urb length 26 disagrees with
> >> IU sense data length 510, using 18 bytes of sense data
> >> [   85.060805] sd 6:0:0:0: uas_eh_abort_handler ffff8803120b7400 tag
> >> 0,
> >> inflight: IN
> >> [   85.061835] sd 6:0:0:0: abort completed
> >> [   85.062037] ------------[ cut here ]------------
> >> [   85.062191] kernel BUG at block/blk-core.c:2511!
> >> [   85.062328] invalid opcode: 0000 [#1] SMP
> >> [   85.062573] Modules linked in: uas usb_storage
> >> nf_conntrack_netbios_ns nf_conntrack_broadcast ccm ip6t_rpfilter
> >> ip6t_REJECT xt_conntrack bnep bluetooth ebtable_nat ebtable_broute
> >> bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
> >> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
> >> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
> >> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
> >> iptable_security iptable_raw fuse iTCO_wdt iTCO_vendor_support
> >> x86_pkg_temp_thermal coretemp snd_hda_codec_realtek kvm_intel kvm arc4
> >> snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul iwldvm
> >> crc32_pclmul crc32c_intel mac80211 sdhci_pci uvcvideo
> >> videobuf2_vmalloc
> >> videobuf2_memops videobuf2_core snd_hda_intel ghash_clmulni_intel
> >> videodev iwlwifi snd_hda_controller
> >> [   85.066723]  media snd_hda_codec sdhci cfg80211 snd_hwdep microcode
> >> jme snd_seq snd_seq_device snd_pcm mii serio_raw mmc_core jmb38x_ms
> >> rfkill memstick mei_me mei snd_timer shpchp lpc_ich i2c_i801 mfd_core
> >> i2c_core snd soundcore nfsd auth_rpcgss nfs_acl lockd sunrpc
> >> binfmt_misc
> >> firewire_ohci firewire_core crc_itu_t wmi video
> >> [   85.069089] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> >> 3.15.4-200.fc20.x86_64+debug #1
> >> [   85.069297] Hardware name: CLEVO
> >> P150HMx/P150HMx, BIOS 4.6.4 08/09/2011
> >> [   85.069509] task: ffffffff81c184c0 ti: ffffffff81c00000 task.ti:
> >> ffffffff81c00000
> >> [   85.069716] RIP: 0010:[<ffffffff813986d0>]  [<ffffffff813986d0>]
> >> blk_finish_request+0xf0/0x100
> >> [   85.069987] RSP: 0018:ffff880321e03da0  EFLAGS: 00010002
> >> [   85.070122] RAX: 0000000000000286 RBX: ffff88031b73d730 RCX:
> >> 0000000000000000
> >> [   85.070261] RDX: ffff880321e0eb20 RSI: 00000000fffffffb RDI:
> >> ffff88031b73d730
> >> [   85.071768] RBP: ffff880321e03db0 R08: 0000000000000000 R09:
> >> 0000000000000001
> >> [   85.071907] R10: ffffffff81c184c0 R11: ffffffff81c18d80 R12:
> >> 00000000fffffffb
> >> [   85.072046] R13: ffff8803190625b0 R14: 0000000000000286 R15:
> >> ffff8803120b7400
> >> [   85.072186] FS:  0000000000000000(0000) GS:ffff880321e00000(0000)
> >> knlGS:0000000000000000
> >> [   85.072395] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [   85.072531] CR2: 00007f6c210209c0 CR3: 0000000001c11000 CR4:
> >> 00000000000407f0
> >> [   85.072669] Stack:
> >> [   85.072794]  ffff88031b73d730 00000000fffffffb ffff880321e03de0
> >> ffffffff81398722
> >> [   85.073205]  ffff88031b73d730 0000000008000000 00000000fffffffb
> >> ffff88031b73d730
> >> [   85.073615]  ffff880321e03df0 ffffffff81398750 ffff880321e03e58
> >> ffffffff815265c1
> >> [   85.074024] Call Trace:
> >> [   85.074151]  <IRQ>
> >> [   85.074210]
> >> [   85.074387]  [<ffffffff81398722>] blk_end_bidi_request+0x42/0x60
> >> [   85.074523]  [<ffffffff81398750>] blk_end_request+0x10/0x20
> >> [   85.074664]  [<ffffffff815265c1>] scsi_io_completion+0x111/0x6a0
> >> [   85.074803]  [<ffffffff8151af03>] scsi_finish_command+0xb3/0x110
> >> [   85.074943]  [<ffffffff815263c7>] scsi_softirq_done+0x137/0x160
> >> [   85.075082]  [<ffffffff8139fe7b>] blk_done_softirq+0x9b/0xd0
> >> [   85.075223]  [<ffffffff8109caad>] __do_softirq+0x12d/0x480
> >> [   85.075361]  [<ffffffff8109d135>] irq_exit+0xc5/0xd0
> >> [   85.075499]  [<ffffffff817f0598>] do_IRQ+0x58/0xf0
> >> [   85.075635]  [<ffffffff817e4432>] common_interrupt+0x72/0x72
> >> [   85.075769]  <EOI>
> >> [   85.075827]
> >> [   85.076006]  [<ffffffff8162e494>] ? cpuidle_enter_state+0x54/0xd0
> >> [   85.076142]  [<ffffffff8162e547>] cpuidle_enter+0x17/0x20
> >> [   85.076280]  [<ffffffff810eef4e>] cpu_startup_entry+0x3ee/0x770
> >> [   85.076420]  [<ffffffff817c914a>] rest_init+0x13a/0x140
> >> [   85.076557]  [<ffffffff817c9015>] ? rest_init+0x5/0x140
> >> [   85.076695]  [<ffffffff81f1c006>] start_kernel+0x48d/0x4ae
> >> [   85.076833]  [<ffffffff81f1b982>] ? repair_env_string+0x5c/0x5c
> >> [   85.076972]  [<ffffffff81f1b120>] ? early_idt_handlers+0x120/0x120
> >> [   85.077112]  [<ffffffff81f1b5ee>] x86_64_start_reservations
> >> +0x2a/0x2c
> >> [   85.077252]  [<ffffffff81f1b73d>] x86_64_start_kernel+0x14d/0x170
> >> [   85.077388] Code: 89 de e8 84 18 00 00 e9 4b ff ff ff 83 7b 78 01
> >> 0f
> >> 85 58 ff ff ff 48 8b 43 60 48 8d b8 78 02 00 00 e8 d5 69 e2 ff e9 43
> >> ff
> >> ff ff <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> >> [   85.081015] RIP  [<ffffffff813986d0>] blk_finish_request+0xf0/0x100
> >> [   85.081206]  RSP <ffff880321e03da0>
> > 
> > It looks like UAS is calling scsi_done() twice.
> > 
> > I may be dense, but it looks like uas_data_cmplt() will call
> > uas_try_complete() with only the UNLINK_DATA_URBS flag protecting
> > it from calling scsi_done() twice. As uas_unlink_data_urbs() uses
> > usb_unlink_urb() we are racing withthe completion handler.
> 
> I don't think that is possible. You're right that usb_unlink_urb() does not
> guarantee that the completion handler has actually run when it returns,
> yet we still need the unlock + relock + protection flag because
> 
> "This request is asynchronous, however the HCD might call the ->complete
> callback during unlink. Therefore when drivers call usb_unlink_urb, they
> must not hold any locks ..."
> 
> So yes we have this race where the uas_data_cmplt() and uas_stat_cmplt()
> which calls uas_unlink_data_urbs() run at the same time, but uas_stat_cmplt()
> has retaken the lock (through the re-locking in uas_unlink_data_urbs()) before
> it does cmdinfo->state &= ~COMMAND_INFLIGHT, and likewise uas_data_cmplt
> does cmdinfo->state &= ~DATA_XX_URB_INFLIGHT with the lock, and they both
> call uas_try_complete with the lock held.
> 
> So yes you're right that they both will call uas_try_complete, but only
> one for one of them this will not trigger:
> 
>         if (cmdinfo->state & (COMMAND_INFLIGHT |
>                               DATA_IN_URB_INFLIGHT |
>                               DATA_OUT_URB_INFLIGHT |
>                               	))
>                 return -EBUSY;
> 
> As only for one of them all flags will be cleared, even if the unlink
> results in uas_data_cmplt() running after UNLINK_DATA_URBS has already
> been cleared.
> 
> So as far as I can see the problem you think is there is not there.
> 
> If you've a specific sequence of events in mind which you believe will
> be a problem please spell it out step by step, and then we'll see from
> there.

CPU A					CPU B
URB completed in hardware
uas_stat_cmplt()
uas_try_complete()
scsi_done()
				uas_unlink_data_urbs
				UNLINK_DATA_URBS set
				usb_unlink_urb() [NOP]
				UNLINK_DATA_URBS unset
				uas_try_complete()
				[COMMAND_INFLIGHT not set]
				[DATA_IN_URB_INFLIGHT not set]
				scsi_done()

	Regards
		Oliver




--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux