Well, good thing I left it running longer. It triggered multiple times over
the past week. So I guess it's not fixed with the simple cleanup patches that
had been posted previously.

I really need some help with scsi debugging to get valid log data out of this,
in order to isolate the issue.

-L

On Tue, May 25, 2010 at 2:58 PM, Leif Sawyer <ak.hepcat+scsi@xxxxxxxxx> wrote:
> On Wed, May 19, 2010 at 5:23 AM, Leif Sawyer <ak.hepcat+scsi@xxxxxxxxx> wrote:
>> looks like the 75% mark might have been too high of an estimate.
>> Whipped up a quick logger to show me when i was failing:
>>
>> <user.info<14>>May 18 17:08:01 websniff-6036a5 logger: disk: /data at 59% utilization
>> <user.info<14>>May 18 17:09:01 websniff-6036a5 logger: disk: /data at 59% utilization
>> <user.info<14>>May 18 17:10:01 websniff-6036a5 logger: disk: /data at 60% utilization
>> <user.info<14>>May 18 17:11:01 websniff-6036a5 logger: disk: /data at 60% utilization
>> [22563.204037] INFO: task flush-8:16:2430 blocked for more than 120 seconds.
>> [22563.224392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [22563.248117] INFO: task dumpcap:4004 blocked for more than 120 seconds.
>> [22563.267662] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [22563.291359] INFO: task df:14714 blocked for more than 120 seconds.
>> [22563.309874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [22563.333593] INFO: task websniff.cgi:14717 blocked for more than 120 seconds.
>> [22563.354690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [22682.229526] end_request: I/O error, dev sdb, sector 169922743
>> [22682.247345] Buffer I/O error on device sdb1, logical block 21240335
>> [22682.266781] end_request: I/O error, dev sdb, sector 170131519
>> [22682.284445] Buffer I/O error on device sdb1, logical block 21266432
>> [....... repeats until......]
>> [22682.782577] sd 3:0:1:0: rejecting I/O to offline device
>> [22682.798907] sd 3:0:1:0: rejecting I/O to offline device
>>
>> And from here on out, the device is no longer recognized by the system
>> until a reboot.
>>
>> I need some help with scsi debugging in order to provide more useful
>> information.
>>
>> I do have a 512mb logfile (text) with lots of scsi dump card state
>> logs and such, though.
>>
>
> Okay, so on a whim, I applied some patches that were recently posted here
> that I thought might have an impact on my particular system (anything
> generic scsi, or adaptec-related)
>
> My system has been up since yesterday with those patches applied, and my disk
> has been churning at the 100% utilized (with between 600Mb and 75Mb free at
> any given time) with tshark continuously rolling over new capture files for
> over 6h. (which it never did before)
>
> the following patches were applied which were not cosmetic or debug related:
>
> lct_data->tid assignment
> io_dev->iop assignment
> usg use after kfree
> gdth goto out_free_ccb_phys instead of out_free_coal_stat
>
> If there's interest, i'll back out the patches one at a time and see which
> one(s) cause/bring-back the most instability.
>
> --
> "It's pronounced Layf...you know, like Leif Garrett? Don't you watch
> 'I Love the 70's'? What kind of retro lover are you, anyway?"
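
For anyone wanting to reproduce the "disk: /data at NN% utilization" lines
quoted above: the original logger script was not posted, so the following is
only a minimal sketch of how such a logger might look, assuming Python on a
Unix host. The function names and the "disk-logger" syslog ident are
hypothetical. The percentage is computed as used/total rounded down, which
may differ by a point or so from what df reports.

```python
import shutil
import syslog


def utilization_percent(total: int, used: int) -> int:
    """Return used/total as a whole percentage, rounded down."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used * 100 // total


def format_message(mount: str, percent: int) -> str:
    """Build a message in the same shape as the quoted log lines."""
    return f"disk: {mount} at {percent}% utilization"


def log_disk_utilization(mount: str = "/data") -> str:
    """Send the current utilization of `mount` to syslog (user.info)."""
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
    message = format_message(mount, utilization_percent(usage.total, usage.used))
    # The ident is an assumption; the quoted lines carry the tag "logger".
    syslog.openlog("disk-logger", facility=syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, message)
    return message
```

Calling log_disk_utilization("/data") once a minute (from cron, say) would
give the same cadence as the timestamps in the quoted log, so the last
utilization figure before a hang can be read straight out of syslog.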