>>>>> "James" == James Bottomley <James.Bottomley@xxxxxxx> writes:

>> This is not an option on a mirror system, and the performance
>> gain/lose is dependent on the round trip speed. If for every digest
>> error I have an error recovery cycle, delays, and stalls. Then no it
>> is not better. Not to mention some iscsi-targets that reset and the
>> all session must be re-established.

James> Your suggestion of putting processes to sleep while I/O is
James> pending will degrade performance for everyone; that's not really
James> an acceptable tradeoff for improving one corner case.

I disagree with the notion that this is a corner case.

Originally we locked down pages completely during I/O. Then the page
lock was split, introducing the page writeback bit. The writeback bit
is set when the I/O is actually issued and cleared upon completion. So
the page contents only need to be stable during that window.

XFS and btrfs both make use of the writeback bit, waiting for it to be
cleared before reissuing I/O to the same page. ext[23] (and maybe 4)
don't. Some of this is poor conversion to the new page cache API, some
of it, I believe, is intentional.

I agree with Boaz' assertion that changing pages in flight is a bad
practice. It's been kind-of-ok in the single-disk case. But once we
get into crypto, RAID, iSCSI and DIX/DIF territory things start falling
apart.

We already buffer things in the page cache once. Having to do
multi-buffering or copy-pages-on-write and reissue I/O because
filesystems engage in dubious practices is crappy.

In my opinion ext[234] should simply be fixed. If there's a significant
performance hit on those filesystems we could make the wait conditional
on a block_device flag.

-- 
Martin K. Petersen	Oracle Linux Engineering
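
To illustrate the stable-page window described in the message above, here
is a rough user-space sketch in Python. It is not kernel code: the Page
class, the function names, and the wait_for_writeback switch are all
hypothetical, zlib.crc32 merely stands in for whatever digest the
transport uses, and "sleeping" is modeled by deferring the write until
completion. It only shows why a page modified between submission and
completion can fail an integrity check, and how waiting on the writeback
bit avoids that.

```python
import zlib


class Page:
    """Hypothetical stand-in for a page-cache page (not a kernel structure)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.writeback = False  # models the page writeback bit
        self.pending = []       # writes deferred while I/O is in flight


def submit_io(page):
    """Issue I/O: set the writeback bit and compute the digest at submit time."""
    page.writeback = True
    return zlib.crc32(bytes(page.data))


def complete_io(page, digest_at_submit):
    """Complete I/O: compare the submit-time digest with the data now in the page.

    Simplification: the data that reaches the device is assumed to be
    whatever the page holds at completion time.
    """
    ok = zlib.crc32(bytes(page.data)) == digest_at_submit
    page.writeback = False
    # Apply writes that were held back while the page was under writeback.
    for offset, chunk in page.pending:
        page.data[offset:offset + len(chunk)] = chunk
    page.pending.clear()
    return ok


def write_page(page, offset, chunk, wait_for_writeback):
    """Modify the page, optionally waiting for writeback to clear first."""
    if page.writeback and wait_for_writeback:
        # Models a writer sleeping until the writeback bit is cleared.
        page.pending.append((offset, chunk))
        return
    page.data[offset:offset + len(chunk)] = chunk


def run(wait_for_writeback):
    """Return (digest_ok, final_page_contents) for one write during I/O."""
    page = Page(b"A" * 16)
    digest = submit_io(page)
    write_page(page, 0, b"BBBB", wait_for_writeback)
    return complete_io(page, digest), bytes(page.data)


if __name__ == "__main__":
    print("modify in flight:  digest ok =", run(False)[0])
    print("wait on writeback: digest ok =", run(True)[0])
```

In both cases the new data ends up in the page; the difference in this
toy model is only whether the in-flight I/O sees a consistent buffer.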