On Wed, May 19, 2010 at 5:23 AM, Leif Sawyer <ak.hepcat+scsi@xxxxxxxxx> wrote:
> looks like the 75% mark might have been too high of an estimate.
> Whipped up a quick logger to show me when i was failing:
>
> <user.info<14>>May 18 17:08:01 websniff-6036a5 logger: disk: /data at 59% utilization
> <user.info<14>>May 18 17:09:01 websniff-6036a5 logger: disk: /data at 59% utilization
> <user.info<14>>May 18 17:10:01 websniff-6036a5 logger: disk: /data at 60% utilization
> <user.info<14>>May 18 17:11:01 websniff-6036a5 logger: disk: /data at 60% utilization
> [22563.204037] INFO: task flush-8:16:2430 blocked for more than 120 seconds.
> [22563.224392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.248117] INFO: task dumpcap:4004 blocked for more than 120 seconds.
> [22563.267662] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.291359] INFO: task df:14714 blocked for more than 120 seconds.
> [22563.309874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.333593] INFO: task websniff.cgi:14717 blocked for more than 120 seconds.
> [22563.354690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22682.229526] end_request: I/O error, dev sdb, sector 169922743
> [22682.247345] Buffer I/O error on device sdb1, logical block 21240335
> [22682.266781] end_request: I/O error, dev sdb, sector 170131519
> [22682.284445] Buffer I/O error on device sdb1, logical block 21266432
> [....... repeats until......]
> [22682.782577] sd 3:0:1:0: rejecting I/O to offline device
> [22682.798907] sd 3:0:1:0: rejecting I/O to offline device
>
> And from here on out, the device is no longer recognized by the system
> until a reboot.
>
> I need some help with scsi debugging in order to provide more useful
> information.
>
> I do have a 512mb logfile (text) with lots of scsi dump card state
> logs and such, though.
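The logger mentioned in the quoted message was not posted. For anyone who wants to reproduce the monitoring, here is a minimal sketch of something equivalent. It assumes Python 3 is available on the box and that the script is run once a minute (e.g. from cron). The function names are made up for illustration, and the percentage is a plain used/total figure, so it may differ slightly from what df reports.

```python
import shutil
import syslog


def utilization_percent(used: int, total: int) -> int:
    """Return used/total as a whole-number percentage, rounded down."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used * 100 // total


def format_message(mount: str, percent: int) -> str:
    """Build a message in the same shape as the quoted log lines."""
    return f"disk: {mount} at {percent}% utilization"


def log_disk_utilization(mount: str = "/data") -> str:
    """Measure one mount point and send the result to syslog at user.info.

    The default mount point and the "logger" ident are taken from the
    quoted log lines; everything else here is an assumption.
    """
    usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
    message = format_message(mount, utilization_percent(usage.used, usage.total))
    syslog.openlog(ident="logger", facility=syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, message)
    return message
```

Calling log_disk_utilization("/data") once per minute should produce one syslog entry per call, so the last entry before a hang shows roughly how full the disk was when I/O stalled.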
Okay, so on a whim, I applied some patches that were recently posted
here that I thought might have an impact on my particular system
(anything generic SCSI or Adaptec-related).

My system has been up since yesterday with those patches applied, and
my disk has been churning at 100% utilization (with between 600MB and
75MB free at any given time), with tshark continuously rolling over new
capture files for over 6 hours, something it never managed before.

The following patches were applied (leaving out the ones that were
purely cosmetic or debug-related):

  lct_data->tid assignment
  io_dev->iop assignment
  usg use after kfree
  gdth goto out_free_ccb_phys instead of out_free_coal_stat

If there's interest, I'll back out the patches one at a time and see
which one(s) bring back the most instability.

-- 
"It's pronounced Layf...you know, like Leif Garrett? Don't you watch
'I Love the 70's'? What kind of retro lover are you, anyway?"