Re: Fwd: high volume of disk-writes causes disk to 'disappear'

Leif Sawyer <ak.hepcat+scsi@xxxxxxxxx> · Tue, 25 May 2010 14:58:34 -0800

On Wed, May 19, 2010 at 5:23 AM, Leif Sawyer <ak.hepcat+scsi@xxxxxxxxx> wrote:
> Looks like the 75% mark might have been too high an estimate.
> Whipped up a quick logger to show me when I was failing:
>
> <user.info<14>>May 18 17:08:01 websniff-6036a5 logger: disk: /data at 59% utilization
> <user.info<14>>May 18 17:09:01 websniff-6036a5 logger: disk: /data at 59% utilization
> <user.info<14>>May 18 17:10:01 websniff-6036a5 logger: disk: /data at 60% utilization
> <user.info<14>>May 18 17:11:01 websniff-6036a5 logger: disk: /data at 60% utilization
> [22563.204037] INFO: task flush-8:16:2430 blocked for more than 120 seconds.
> [22563.224392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.248117] INFO: task dumpcap:4004 blocked for more than 120 seconds.
> [22563.267662] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.291359] INFO: task df:14714 blocked for more than 120 seconds.
> [22563.309874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22563.333593] INFO: task websniff.cgi:14717 blocked for more than 120 seconds.
> [22563.354690] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [22682.229526] end_request: I/O error, dev sdb, sector 169922743
> [22682.247345] Buffer I/O error on device sdb1, logical block 21240335
> [22682.266781] end_request: I/O error, dev sdb, sector 170131519
> [22682.284445] Buffer I/O error on device sdb1, logical block 21266432
> [....... repeats until......]
> [22682.782577] sd 3:0:1:0: rejecting I/O to offline device
> [22682.798907] sd 3:0:1:0: rejecting I/O to offline device
>
> And from here on out, the device is no longer recognized by the system
> until a reboot.
>
> I need some help with scsi debugging in order to provide more useful
> information.
>
> I do have a 512MB logfile (text) with lots of SCSI dump card state
> logs and such, though.
>
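
For anyone wanting to reproduce the per-minute utilization lines quoted
above, here is a minimal sketch of such a logger. It is not the script
that produced those lines; the function names are made up for
illustration, and it assumes Python 3 on Linux. The <14> in the quoted
log is facility user (8) plus severity info (6), which is what the
priority below encodes. The percentage is rounded up over used plus
available space, which should be close to what df reports.

```python
import math
import shutil
import syslog


def utilization_percent(path):
    """Return used space as a whole percentage of the usable space at path."""
    usage = shutil.disk_usage(path)
    # usage.free is the space available to unprivileged users, so
    # used + free approximates the denominator df uses for Use%.
    usable = usage.used + usage.free
    if usable == 0:
        return 0
    return math.ceil(100 * usage.used / usable)


def format_message(path, percent):
    """Build a message in the same shape as the quoted log lines."""
    return f"disk: {path} at {percent}% utilization"


def log_utilization(path):
    """Send one utilization line to syslog as user.info and return it."""
    message = format_message(path, utilization_percent(path))
    syslog.syslog(syslog.LOG_USER | syslog.LOG_INFO, message)
    return message
```

Calling log_utilization("/data") once a minute (from cron, say) would
give the same kind of trail leading up to the hang.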

Okay, so on a whim, I applied some patches recently posted here that I
thought might have an impact on my particular system (anything generic
SCSI or Adaptec-related).

My system has been up since yesterday with those patches applied, and my
disk has been churning at 100% utilization (with between 75MB and 600MB
free at any given time), with tshark continuously rolling over new
capture files for over 6 hours (which it never did before).

Of the patches applied, the following were not cosmetic or debug-related:

     lct_data->tid assignment
     io_dev->iop assignment
     usg use after kfree
     gdth  goto out_free_ccb_phys  instead of  out_free_coal_stat

If there's interest, I'll back out the patches one at a time and see
which one(s) bring back the most instability.
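
The one-at-a-time back-out could be planned roughly like this. It is
only a sketch of the bookkeeping: the strings are the patch labels as
written above (not real patch filenames), leave_one_out is a
hypothetical helper, and the actual reverting, rebuilding, and load
testing still has to be done by hand for each round.

```python
# Labels copied from the list above; they are descriptions, not filenames.
PATCHES = [
    "lct_data->tid assignment",
    "io_dev->iop assignment",
    "usg use after kfree",
    "gdth goto out_free_ccb_phys instead of out_free_coal_stat",
]


def leave_one_out(patches):
    """Yield (backed_out, still_applied) for each patch in turn."""
    for i, backed_out in enumerate(patches):
        yield backed_out, patches[:i] + patches[i + 1:]


# Print one test round per patch: revert it, keep the rest applied.
for round_no, (backed_out, still_applied) in enumerate(leave_one_out(PATCHES), 1):
    print(f"round {round_no}: back out '{backed_out}'")
    for name in still_applied:
        print(f"    keep '{name}'")
```

If the hang comes back in exactly one round, that patch is the likely
fix; if it comes back in none, more than one patch may be contributing.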

-- 
"It's pronounced Layf...you know, like Leif Garrett? Don't you watch
 'I Love the 70's'? What kind of retro lover are you, anyway?"