Re: DIF/DIX updates for 2.6.32

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Thu, 27 Aug 2009 16:02:41 -0400

>>>>> "James" == James Bottomley <James.Bottomley@xxxxxxx> writes:

>> This is not an option on a mirror system, and the performance
>> gain/loss depends on the round-trip speed. If every digest error
>> triggers an error recovery cycle, delays, and stalls, then no, it
>> is not better. Not to mention some iSCSI targets that reset, so the
>> whole session must be re-established.

James> Your suggestion of putting processes to sleep while I/O is
James> pending will degrade performance for everyone; that's not really
James> an acceptable tradeoff for improving one corner case.

I disagree with the notion that this is a corner case.

Originally we locked down pages completely during I/O.  Then the page
lock was split, introducing the page writeback bit.  The writeback bit
is set when the I/O is actually issued and cleared upon completion.  So
the page contents only need to be stable during that window.
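
As a rough user-space model of that lifecycle (not kernel code: struct
toy_page and the toy_*() helpers are made-up names for illustration,
and the real page flags are bit operations rather than plain bools):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page cache page; not the kernel's struct page. */
struct toy_page {
    bool locked;     /* held while the page is being prepared for I/O */
    bool writeback;  /* set from I/O submission until completion */
};

/* Issue the write: once writeback is set the page lock can be dropped. */
static void toy_submit_io(struct toy_page *p)
{
    p->writeback = true;
    p->locked = false;
}

/* I/O completion: the contents no longer need to be stable. */
static void toy_end_io(struct toy_page *p)
{
    assert(p->writeback);  /* completing I/O that was never issued is a bug */
    p->writeback = false;
}

/* The page contents must not change while this returns true. */
static bool toy_must_be_stable(const struct toy_page *p)
{
    return p->writeback;
}
```

The point of the split is that the lock no longer has to cover the whole
I/O; only the writeback window does.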

XFS and btrfs both make use of the writeback bit, waiting for it to be
cleared before reissuing I/O to the same page.  ext[23] (and maybe 4)
don't.  Some of this is poor conversion to the new page cache API, some
of it, I believe, is intentional.

I agree with Boaz's assertion that changing pages in flight is a bad
practice.  It's been kind-of-ok in the single-disk case.  But once we
get into crypto, RAID, iSCSI and DIX/DIF territory, where a checksum,
parity or ciphertext is computed over the page at submission time,
things start falling apart.
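
A user-space sketch of the failure mode, assuming a 512-byte sector
protected by a CRC16 guard tag using the T10-DIF polynomial (0x8BB7).
The toy_*() names are hypothetical and the "in flight" modification is
simulated by flipping one bit between computing and checking the guard:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16, T10-DIF polynomial 0x8BB7, initial value 0, no reflection. */
static uint16_t toy_crc_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/*
 * Compute the guard tag at submission, optionally let the "filesystem"
 * modify the buffer while the I/O is outstanding, then verify the tag
 * the way an HBA or target would.  Returns true if the tag still matches.
 */
static bool toy_guard_survives(uint8_t *sector, size_t len, bool modify_in_flight)
{
    assert(len > 0);

    uint16_t guard = toy_crc_t10dif(sector, len);  /* at submission */

    if (modify_in_flight)
        sector[0] ^= 0x01;                         /* page redirtied mid-I/O */

    return toy_crc_t10dif(sector, len) == guard;   /* at the receiving end */
}
```

With the buffer left alone the guard verifies; with a single bit changed
after submission it does not, and the write is rejected even though
nothing is actually corrupt.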

We already buffer things in the page cache once.  Having to do
multi-buffering or copy-pages-on-write and reissue I/O because
filesystems engage in dubious practices is crappy.

In my opinion ext[234] should simply be fixed.  If there's a significant
performance hit on those filesystems we could make the wait conditional
on a block_device flag.
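
A minimal sketch of what such a conditional wait could look like,
modelled in user space.  The needs_stable_pages flag and the toy_*
names are assumptions for illustration; no such flag is claimed to
exist in struct block_device.  In the kernel the "wait" branch would
presumably end up in wait_on_page_writeback():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device flag: set for DIX/DIF, iSCSI digests, etc. */
struct toy_bdev {
    bool needs_stable_pages;
};

/* Toy page: only the writeback state matters here. */
struct toy_page {
    bool writeback;
};

/*
 * Returns true if a writer has to wait for in-flight I/O to finish
 * before it may modify the page.  Devices that do not set the flag
 * keep today's behaviour and never wait.
 */
static bool toy_must_wait(const struct toy_bdev *bdev,
                          const struct toy_page *page)
{
    assert(bdev && page);
    return bdev->needs_stable_pages && page->writeback;
}
```

That keeps the cost confined to devices that actually need stable
pages, and only to pages that are under writeback at the time.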

-- 
Martin K. Petersen	Oracle Linux Engineering