> James, am I understanding your suggestion properly? If so can you
> explain what you meant about the libsas code -- I see that it has its
> own strategy handler but as I said before we've already stopped every
> device attached to the HBA before we ever get there.
>
> To recapitulate the problem here, we might have a whole fabric
> attached to an HBA via SAS or FC, and be doing 500K IOPS happily to 50
> devices. Then a single LUN goes wonky and all the IO stops while we
> try to recover that single device, which might take minutes.

I'm not James, but from my experience with pm8001 and libsas, your
understanding is right: when an error happens on one LUN, the SCSI core
does hold the whole SCSI host.

I think Hannes posted a good proposal a few weeks ago. It looked
reasonable, but I don't know what its status is now.

Regards,
Jack Wang