James Bottomley wrote:
I'm not sure your conclusions necessarily follow your data. What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?
It came out of my curiosity during tests of SCST (http://scst.sf.net),
when it was working with several initiators with different transports
over the same set of devices, each of them having the TAS bit set in the
control mode page. According to SAM, in this case TASK ABORTED status
can be returned at any time, similarly to QUEUE FULL, i.e. IMHO such a
command should just be retried. QUEUE FULL status is handled well, but
TASK ABORTED leads to filesystem corruption.
So this is with a soft target implementation ... so it could be an
ordering issue inside the target that's causing the filesystem
corruption on error.
The target offers no ordering guarantees for SIMPLE commands and openly
says so to the initiator via QUEUE ALGORITHM MODIFIER value 1 in the
control mode page. As we know, the initiator doesn't use ORDERED tags
(and it really doesn't use them according to the logs), so if it's an
ordering issue, it's on the initiator's side.
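For readers following along, here is a minimal sketch of how the three control mode page fields mentioned in this thread (QUEUE ALGORITHM MODIFIER, QErr and TAS) could be pulled out of a raw page. The byte offsets follow the SPC control mode page layout (page code 0x0A); the function name and the sample bytes are made up for illustration, and the input is assumed to be the page itself, without the MODE SENSE parameter header or block descriptors.

```python
# Illustrative decoder for the SCSI control mode page (page code 0x0A).
# decode_control_mode_page and the sample bytes are hypothetical.

def decode_control_mode_page(page: bytes) -> dict:
    """Extract the fields discussed in the thread from a raw control mode page."""
    if len(page) < 6:
        raise ValueError("need at least 6 bytes of the control mode page")
    if page[0] & 0x3F != 0x0A:
        raise ValueError("not a control mode page")
    return {
        # Byte 3, bits 7-4: 1 means unrestricted reordering is allowed.
        "queue_algorithm_modifier": (page[3] >> 4) & 0x0F,
        # Byte 3, bits 2-1: queue error management (QErr).
        "qerr": (page[3] >> 1) & 0x03,
        # Byte 5, bit 6: task aborted status (TAS).
        "tas": (page[5] >> 6) & 0x01,
    }

# Hypothetical page: QUEUE ALGORITHM MODIFIER = 1, QErr = 0, TAS = 1.
sample = bytes([0x0A, 0x0A, 0x00, 0x10, 0x00, 0x40, 0, 0, 0, 0, 0, 0])
fields = decode_control_mode_page(sample)
```

With the sample bytes above, the decoded values match the configuration described in the thread: unrestricted reordering and TAS set.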
If you specifically set TAS=1 you're giving up the right to know what
caused the command termination. With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error. If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.
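The difference between the two cases can be sketched roughly as follows. This is not the mid layer's code, only an illustration of the argument: TASK ABORTED carries no sense data, so the sketch fails it, while CHECK CONDITION with a unit attention can be inspected. The status and sense code values are the standard SCSI ones; the function name is hypothetical, and the choice to retry on COMMANDS CLEARED BY ANOTHER INITIATOR (ASC/ASCQ 2Fh/00h) is an assumption made for the example.

```python
# Illustrative status handling sketch; disposition() is a hypothetical name.
GOOD = 0x00
CHECK_CONDITION = 0x02
TASK_SET_FULL = 0x28   # QUEUE FULL
TASK_ABORTED = 0x40
UNIT_ATTENTION = 0x06  # sense key

def disposition(status: int, sense: bytes = b"") -> str:
    """Return 'complete', 'retry' or 'fail' for a finished command."""
    if status == GOOD:
        return "complete"
    if status == TASK_SET_FULL:
        return "retry"
    if status == TASK_ABORTED:
        # No sense data accompanies this status, so the cause is unknown.
        return "fail"
    if status == CHECK_CONDITION and len(sense) >= 14:
        # Only fixed-format sense data (response code 0x70 or 0x71) is parsed.
        if (sense[0] & 0x7F) in (0x70, 0x71):
            key = sense[2] & 0x0F
            asc, ascq = sense[12], sense[13]
            # Assumption: commands cleared by another initiator are retried.
            if key == UNIT_ATTENTION and (asc, ascq) == (0x2F, 0x00):
                return "retry"
    return "fail"

# Fixed-format sense: UNIT ATTENTION, COMMANDS CLEARED BY ANOTHER INITIATOR.
cleared = bytearray(18)
cleared[0] = 0x70
cleared[2] = UNIT_ATTENTION
cleared[7] = 0x0A
cleared[12] = 0x2F
cleared = bytes(cleared)
```

Here `disposition(TASK_ABORTED)` fails the command, whereas `disposition(CHECK_CONDITION, cleared)` can pick an action because the sense data names the cause.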
But having TAS=1 is legal, right? So it should be handled well. If
TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK
ABORTED status can only be returned with TAS=1.
Driving with your handbrake on is legal too ... that doesn't mean you
should do it ... and it certainly doesn't give you a legitimate
complaint against the manufacturer of your car for excessive brake pad
wear.
We handle TASK ABORTED as well as we can (by failing it). For better
handling set TAS=0 and we'll handle the individual cases according to
the sense codes.
So, should I take your words to mean that you think it's perfectly fine
to corrupt the file system for devices with TAS=1? Absolutely legal
devices, I repeat. Hence, in your opinion, no further investigation
should be done?
Logic wouldn't support such a conclusion.
Sorry, lately I've got too many "I won't bother, this is your problem"
style answers.
You have intertwined two issues:

     1. How should the mid layer handle TASK ABORTED. I think we've
        reached the point where returning I/O error is the best we can
        do, but if TAS=0 we could have used the sense data to do better.
     2. Should a request I/O error cause corruption in ext3 that can't
        be recovered by a journal replay. I think the answer here is
        no, so there needs to be an easily reproducible test case to
        pass to the filesystem people.
OK, I see your point. As I already wrote, I can assist only in testing here.
James
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html