Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

More than ever, I am convinced that it is actually a hardware problem, but
I am curious for the opinions of both of you on whether the "system"
(meaning, I guess, the combination of usb-storage driver and raid) is
really doing the best with what it has.

My last effort was to switch to a different computer. When I did, the
dmesg log (unfortunately not preserved, although I should be able to
recreate it) showed that one of the flash drives had bad blocks. Some part
of the system eventually decided it was a "dead device" (I believe dmesg
indicated that the SCSI subsystem said so). The device (it happened to be
/dev/sdc) was peremptorily dropped from the system. This appears to be
what hung the RAID system.

(Why these messages never appeared on the other computer is beyond me;
presumably there is some difference in how the actual USB controller
reports errors, but, as I said, I've never studied USB drivers or
hardware. In fact, once you get beyond UARTs you are getting
sophisticated to me.)

I've built an array of five known-good devices and so far it works
swimmingly (at least on the hardware that was better at error reporting).

So it seems to me that there is probably nothing actually wrong with the
drivers or their interactions, and it leaves me asking only whether there
should be some sort of improvement in error reporting/recovery up to
userland.

If I am right and the scsi system was marking a device as dead, shouldn't
the userland read against the md device get an error instead of an
indefinite hang?
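
In the meantime, a userland check before starting a long copy might at
least catch a member that has already been dropped. This is only a rough
sketch, assuming that a dropped device either disappears from
/sys/block or reports something other than "running" in its
device/state file. The names check_members and SYSFS_ROOT are my own
invention for illustration, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named block device is still
# present under sysfs and, where a SCSI state file exists, what it says.
# SYSFS_ROOT is an assumed override for testing; it defaults to /sys.
check_members() {
    root="${SYSFS_ROOT:-/sys}"
    rc=0
    for dev in "$@"; do
        if [ ! -d "$root/block/$dev" ]; then
            # The device node is gone from sysfs entirely.
            echo "$dev: missing"
            rc=1
        elif [ -r "$root/block/$dev/device/state" ]; then
            state=$(cat "$root/block/$dev/device/state")
            echo "$dev: $state"
            # Anything other than "running" is treated as a failure.
            [ "$state" = "running" ] || rc=1
        else
            echo "$dev: present (no state file)"
        fi
    done
    return $rc
}
```

Something like "check_members sda sdb sdc sdd sde || echo 'member problem'"
before the cp would then flag the bad drive. It cannot rescue a read that
is already hung.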

Beyond this question which I leave to you (although I'd love to hear your
answers/thoughts), I think we can safely say that the problem was hardware
(even if hard to find). If either of you would like, I'd be happy to find
time this week to recreate the error on my "better" PC and send that
along.

As for rolling a custom kernel with more message buffer, well, I'm going
to be getting into a new device driver in the coming months, so a custom
debug kernel is definitely in my future, but I'm not sure when.
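
For my own reference, here is what the CONFIG_LOG_BUF_SHIFT values
suggested in the quoted message below amount to. A small sketch, assuming
the log buffer is 2^CONFIG_LOG_BUF_SHIFT bytes; log_buf_kib is just a
name I made up.

```shell
#!/bin/sh
# Hypothetical helper: size in KiB of a kernel log buffer for a given
# CONFIG_LOG_BUF_SHIFT, assuming the buffer is 2^shift bytes.
log_buf_kib() {
    echo $(( (1 << $1) / 1024 ))
}

# The current value (17), the suggested value (19) and the stated max (21).
for s in 17 19 21; do
    echo "CONFIG_LOG_BUF_SHIFT=$s -> $(log_buf_kib "$s") KiB"
done
```

So going from 17 to 19 would quadruple the buffer, from 128 KiB to
512 KiB.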

I must say, the kernel has become a much more complex beastie since 2.2.x!
(Although it also appears to be improved and somewhat more organized --
but definitely MUCH larger!)

Thank you both so much! I wouldn't even have diagnosed my hardware problem
without your prompts. I'm very grateful. Let me know if you'd like those
dmesg logs or if you'd just like to let it go!

-- 
Michael Schwarz

> On Sunday March 18, mschwarz@xxxxxxxxxxxxx wrote:
>> cp -rv /mnt/* fs2d2/
>>
>> At this point, the process hangs. So I ran:
>>
>> echo t > /proc/sysrq-trigger
>> dmesg > dmesg-5-hungread.log
>
> Unfortunately (as you say) the whole trace doesn't fit.
> Could you try compiling the kernel with a larger value for
> CONFIG_LOG_BUF_SHIFT ??  It looks like you have 17.  21 is the max.
> 19 should probably be sufficient.
>
> Two things look a bit odd.
> 1/ hald-addon-st (process 3974) seems to be hung doing a
>   'test_unit_ready' after a media-changed signal.  Any idea why?
>   Could you try killing off hald while running the test?
>
> 2/ one usb-storage thread (3667) appears to be waiting for
>   IO to complete (though that is just a guess really).
>
> Maybe usb-storage is waiting for the hald test-unit-ready?
>
> But I'm a bit out of my depth here, so I'll leave it to the USB
> experts.
>
> NeilBrown
>
>  =======================
> hald-addon-st D EF9FBD00  2812  3974   2935          3977  3966 (NOTLB)
>        ef9fbd14 00000086 00000002 ef9fbd00 ef9fbcfc 00000000 00000000 ed4fcbe4
>        c04dc5cc 00000086 0000000a ed407770 c06fb480 18f88700 00000206 00000000
>        ed40787c c1c8c480 00000000 ebe7adc0 001d605d db30e9c8 00000096 ffffffff
> Call Trace:
>  [<c04dc5cc>] elv_next_request+0xfe/0x1ac
>  [<c061e701>] wait_for_completion+0x73/0x98
>  [<c04226ab>] default_wake_function+0x0/0xc
>  [<c04df415>] blk_execute_rq+0xcf/0xe5
>  [<c04de74f>] blk_end_sync_rq+0x0/0x23
>  [<c04dbdf0>] elv_set_request+0x14/0x22
>  [<c04decda>] get_request+0x205/0x2b2
>  [<c04df4e7>] get_request_wait+0x26/0x16c
>  [<f8de1116>] scsi_execute+0xc6/0xd9 [scsi_mod]
>  [<f8de11e0>] scsi_execute_req+0xb7/0xd5 [scsi_mod]
>  [<f8de1241>] scsi_test_unit_ready+0x43/0x80 [scsi_mod]
>  [<f8d726a5>] sd_media_changed+0x60/0xb5 [sd_mod]
>  [<c04e8c82>] kobject_get+0xf/0x13
>  [<c0491481>] check_disk_change+0x16/0x5c
>  [<c055890a>] class_device_get+0xe/0x14
>  [<f8d72b70>] sd_open+0x92/0x120 [sd_mod]
>  [<c04e14cc>] exact_match+0x0/0x4
>  [<c0491b65>] do_open+0x19f/0x255
>  [<c0491d8e>] blkdev_open+0x0/0x4d
>  [<c0491db3>] blkdev_open+0x25/0x4d
>  [<c0470cac>] __dentry_open+0xc3/0x17a
>  [<c0470ddd>] nameidata_to_filp+0x24/0x33
>  [<c0470e1e>] do_filp_open+0x32/0x39
>  [<c061f0e0>] do_nanosleep+0x42/0x66
>  [<c0470bdf>] get_unused_fd+0xb3/0xbd
>  [<c0470e67>] do_sys_open+0x42/0xbe
>  [<c0470f1c>] sys_open+0x1c/0x1e
>  [<c0403f64>] syscall_call+0x7/0xb
>  =======================
> usb-storage   S 00000010  3048  3667      7          3669  3666 (L-TLB)
>        ebcaee78 00000046 f88459c0 00000010 ebc6b7dc f6de08e4 c0587c0e 00000010
>        00000000 c06fb480 0000000a ed5f2bb0 d80fa9b0 e8b0e880 00000205 00000000
>        ed5f2cbc c1c8c480 00000000 ebe7a9c0 001d5d31 00000205 00000000 ffffffff
> Call Trace:
>  [<c0587c0e>] usb_hcd_submit_urb+0x6cd/0x773
>  [<c061ecc2>] schedule_timeout+0x13/0x8d
>  [<c061e925>] wait_for_completion_interruptible_timeout+0x99/0xd5
>  [<c04226ab>] default_wake_function+0x0/0xc
>  [<f8db090c>] usb_stor_msg_common+0xc9/0xe8 [usb_storage]
>  [<f8db0d5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
>  [<f8db12a9>] usb_stor_Bulk_transport+0xcb/0x221 [usb_storage]
>  [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
>  [<f8db1414>] usb_stor_invoke_transport+0x15/0x259 [usb_storage]
>  [<c061fa40>] __down_interruptible+0xde/0xf0
>  [<c04226ab>] default_wake_function+0x0/0xc
>  [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
>  [<f8db214a>] usb_stor_control_thread+0x128/0x1a3 [usb_storage]
>  [<c0420a03>] complete+0x39/0x48
>  [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
>  [<c043779f>] kthread+0xb0/0xd9
>  [<c04376ef>] kthread+0x0/0xd9
>  [<c0404b33>] kernel_thread_helper+0x7/0x10
>  =======================
>

