Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems


James Bottomley wrote:
On Tue, 2007-11-20 at 18:04 +0300, Vladislav Bolkhovitin wrote:

James Bottomley wrote:

And please close this as invalid.  FS ordering guarantees in linux
aren't done via ordered tags.

I had a related question. I was working on the attached patch for some other testing (the patch was made against scsi-rc-fixes, but it is not stable, so do not apply it), which does the scsi_populate_tag_msg conversion from MSG_* to ISCSI_ATTR and sets the proper iscsi bits.

If I apply this patch, where I call scsi_activate_tcq on a device and do that conversion, does this require that my driver not reorder commands? I was just a little worried about some of the error handling paths where we requeue commands to the mid layer.
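
A rough sketch of the kind of MSG_* to ISCSI_ATTR conversion being described, written as a small Python model rather than kernel C. The patch itself is not reproduced in this thread, so this is not its code. The numeric values are the SCSI queue tag message codes and the iSCSI task attribute codes as commonly defined; the helper name msg_to_iscsi_attr is hypothetical.

```python
# SCSI queue tag message codes (the values behind the MSG_* constants).
MSG_SIMPLE_TAG = 0x20
MSG_HEAD_TAG = 0x21
MSG_ORDERED_TAG = 0x22

# iSCSI task attribute codes carried in the SCSI Command PDU.
ISCSI_ATTR_UNTAGGED = 0
ISCSI_ATTR_SIMPLE = 1
ISCSI_ATTR_ORDERED = 2
ISCSI_ATTR_HEAD_OF_QUEUE = 3
ISCSI_ATTR_ACA = 4

_MSG_TO_ATTR = {
    MSG_SIMPLE_TAG: ISCSI_ATTR_SIMPLE,
    MSG_HEAD_TAG: ISCSI_ATTR_HEAD_OF_QUEUE,
    MSG_ORDERED_TAG: ISCSI_ATTR_ORDERED,
}


def msg_to_iscsi_attr(tag_msg):
    """Hypothetical helper: map a queue tag message to an iSCSI attribute.

    tag_msg is None for an untagged command. Unknown messages raise
    ValueError instead of being silently mapped to some default.
    """
    if tag_msg is None:
        return ISCSI_ATTR_UNTAGGED
    try:
        return _MSG_TO_ATTR[tag_msg]
    except KeyError:
        raise ValueError("unexpected tag message 0x%02x" % tag_msg)
```

Note that setting the ORDERED attribute only tells the target how to schedule the command; it says nothing about whether the initiator side preserves order when commands are requeued, which is the concern raised above.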

Right, there's no way of guaranteeing that commands aren't reordered in
the error path (or even the queue full submission path) which is why we
don't use ordered tags to enforce barriers.

May I make your answer more precise? For non-caching and write-through caching devices, SCSI provides a way to guarantee the order of commands on the error path via the ACA and UA_INTLCK facilities, if they are supported by the device. For write-back caching devices it's different, because the cache may reorder commands after they are reported as completed to the initiator, and there is also a possibility of deferred errors.

Yes, I know this.  The problem is that because we can't rely on the
ordering guarantees in *every* situation, it's unsafe to rely on them
for barrier support (the case you most need them is the one where the
guarantees have likely failed).  Thus, linux fs on SCSI implement
barriers by waiting for completions.  The only case we could implement
flush barriers in SCSI, as they do in IDE is in the single outstanding
command case where we don't have any reordering to worry about (i.e.
queue depth of one).
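
To illustrate "barriers by waiting for completions", here is a toy Python model, not kernel code. It assumes a device that may complete any set of outstanding commands in arbitrary order, and shows that draining the queue before and after the barrier write keeps the barrier correctly placed regardless of that reordering. All names (run_with_drain_barrier, barrier_respected) are hypothetical.

```python
import random


def run_with_drain_barrier(before, barrier, after, rng):
    """Return the order in which the modelled device completes commands."""
    completed = []

    def submit_and_wait(batch):
        # Model a device that may complete outstanding commands in any
        # order: shuffle the batch, then record every completion.
        outstanding = list(batch)
        rng.shuffle(outstanding)
        completed.extend(outstanding)

    submit_and_wait(before)     # drain everything queued ahead of the barrier
    submit_and_wait([barrier])  # the barrier write is outstanding alone
    submit_and_wait(after)      # only now release the commands behind it
    return completed


def barrier_respected(order, before, barrier, after):
    """True if the barrier completed after all of `before` and before all of `after`."""
    pos = {cmd: i for i, cmd in enumerate(order)}
    return (all(pos[c] < pos[barrier] for c in before)
            and all(pos[barrier] < pos[c] for c in after))


if __name__ == "__main__":
    pre, post = ["w1", "w2", "w3"], ["w4", "w5"]
    order = run_with_drain_barrier(pre, "commit", post, random.Random(0))
    print(order, barrier_respected(order, pre, "commit", post))
```

The cost of this scheme is visible in the model: nothing from `after` can be in flight while the barrier is outstanding, so the queue is emptied at every barrier.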

...if we are going to work only with devices that have a write-back cache or do not support the ACA/UA_INTLCK facilities. It might well be possible that some hypothetical SCSI device with a write-through cache (WCE bit is 0 or set to 0), ACA/UA_INTLCK and ORDERED command support would perform considerably better with barriers by ORDERED tags than with barriers by waiting for completions and a write-back cache, especially for file systems like XFS, because with barriers by ORDERED tags it is possible to keep the SCSI transport wire pipe full, whereas it has to be drained with barriers by waiting for completions. But, since AFAIK the majority of SCSI disks don't support ACA/UA_INTLCK, I have to agree with you: there is not much point currently in implementing barriers by ORDERED tags in the SCSI ML.

So, there is no way to guarantee command order in case of errors, because Linux doesn't implement that.

BTW, there is still something wrong in the SCSI/block/FS layers' error processing. Playing with my SCSI target, I've noticed that if it returns a perfectly valid TASK ABORTED status for some SCSI command, the FS on the initiator (ext3) immediately gets corrupted, and journal replay on remount doesn't repair it; only a manual e2fsck helps. So, apparently:

1. The SCSI ML does not correctly handle all the status codes that it should.


It certainly handles TASK ABORTED.


2. The block/FS levels (sometimes) don't handle I/O errors well enough to avoid corrupting file systems.


I'm not sure your conclusions necessarily follow your data.  What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?

It was my desire/curiosity during tests of SCST (http://scst.sf.net), when it was working with several initiators with different transports over the same set of devices, each of which had the TAS bit set in the control mode page. According to SAM, in this case TASK ABORTED status can be returned at any time, similarly to QUEUE FULL, i.e. IMHO such a command should simply be retried. However, QUEUE FULL status is handled well, while TASK ABORTED leads to filesystem corruption.
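
The behaviour argued for above (retry TASK ABORTED the same way as QUEUE FULL) could be sketched as follows. This is a simplified Python model of the poster's proposal, not a description of what the Linux SCSI mid-layer actually does. The status values are the SAM status codes; the disposition strings, the RETRYABLE set, and the submit_with_retries helper are hypothetical, and real CHECK CONDITION handling would need sense data analysis, which is omitted here.

```python
# SAM status codes.
SAM_STAT_GOOD = 0x00
SAM_STAT_CHECK_CONDITION = 0x02
SAM_STAT_BUSY = 0x08
SAM_STAT_TASK_SET_FULL = 0x28  # a.k.a. QUEUE FULL
SAM_STAT_TASK_ABORTED = 0x40

# Assumption (the proposal in this message): treat TASK ABORTED as a
# transient condition, like BUSY and TASK SET FULL.
RETRYABLE = {SAM_STAT_BUSY, SAM_STAT_TASK_SET_FULL, SAM_STAT_TASK_ABORTED}


def disposition(status):
    """Classify a completion status as 'complete', 'retry' or 'error'."""
    if status == SAM_STAT_GOOD:
        return "complete"
    if status in RETRYABLE:
        return "retry"
    return "error"


def submit_with_retries(send, max_retries=5):
    """Call send() until it stops returning a retryable status.

    send is a callable returning a SAM status code. Gives up with
    'error' once max_retries retries have been used.
    """
    for _ in range(max_retries + 1):
        result = disposition(send())
        if result != "retry":
            return result
    return "error"
```

For example, a command that sees TASK ABORTED, then TASK SET FULL, then GOOD would end as "complete" under this policy instead of surfacing an I/O error to the filesystem.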

Journals can fail to recover in cases where the underlying medium is
corrupted.  If TASK ABORTED was because of QErr, what was the original
failure?

See above. No "medium" corruption happened.

Also, what was going on in the system (and what device was this ...
iSCSI I guess) ...

It doesn't matter. It happens with FC transport as well.

I assume nothing powered down, so it's not a caching
problem (and that, since you seem to be using TCQ you do have your
caches set to write through).

The target stays perfectly healthy.

I don't have time for further investigation, but if somebody prepares a patch to fix that, I'm willing to assist in testing.

We'll need a bit more data to identify an actual root cause for this
problem before anyone can prepare a patch to fix it.

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

