On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
>> anything. So what will be cheaper depends on how often are redirtied pages
>> under IO. This is rather rare because pages aren't flushed all that often.
>> So the effect of stable pages in not observable on throughput. But you can
>> certainly see it on max latency...
>
> I see your point. A problem with migrate though is that the page must
> be pinned by the I/O layer to prevent migration to free the page under
> I/O, or how else it could be safe to read from a freed page? And if
> the page is pinned migration won't work at all. See page_freeze_refs
> in migrate_page_move_mapping. So the pinning issue would need to be
> handled somehow. It's needed for example when there's an O_DIRECT
> read, and the I/O is going to the page, if the page is migrated in
> that case, we'd lose a part of the I/O. Differentiating how many page
> pins are ok to be ignored by migration won't be trivial but probably
> possible to do.
>
> Another way maybe would be to detect when there's too much re-dirtying
> of pages in flight in a short amount of time, and to start the bounce
> buffering and stop waiting, until the re-dirtying stops, and then you
> stop the bounce buffering. But unlike migration, it can't prevent an
> initial burst of high fault latency...

Or just change that RT program, which is (one) latency bound but (two)
does unpredictable, statistically bad things to a memory-mapped file.

Can a memory-mapped-file writer have some control over the time of
writeback, with data_sync or such, or is it purely: timer fired, kernel
sees a dirty page, starts a writeout?

What about if the application maps a portion of the file at a time, and
the kernel gets lazier about an active memory-mapped region? (That's
what Windows NT does. It will never do I/O on a mapped section unless in
OOM conditions. The application needs to map small sections and unmap
them to get I/O.
It's more of a direct I/O than mmap.)

In any case, if you are very latency sensitive, an mmap writeout is bad
for you. Not only because of this new problem, but because mmap writeout
can sync with tons of other things that are due to memory management
(as mentioned by Andrea). The best for a latency-sensitive application
is asynchronous direct I/O, by far. Only with asynchronous direct I/O
can you have any real control over your latency. (I understand they used
to have an empirically observed latency bound, but that is just luck,
not real control.)

BTW: the application mentioned would probably not want its I/O bounced
at the block layer; otherwise why would it use mmap, if not to avoid the
copy induced by buffered I/O?

All that said, a mount option for ext4 (is ext4 used?) to revert to the
old behavior is the easiest solution.

When we originally brought this up at LSF, my thought was that the block
request queue should have some flag that says need_stable_pages. If it
is set, by the likes of dm/md-raid, iscsi-with-data-signed, DIF-enabled
devices and so on, and the FS does not guarantee (or want) stable pages,
then an I/O bounce is set up. But if it is not set, then the likes of
ext4 need not bother.

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
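
On the question above about mapping a portion of the file at a time and
choosing when writeback happens: here is a minimal userspace sketch,
written in Python for brevity. It maps one small window, dirties it,
flushes it explicitly and unmaps it. The names write_window, demo and
PAGE are made up for illustration. On POSIX, mmap.flush() issues
msync(), so the application can force writeback at a moment it picks,
but this is only an assumption-laden illustration: nothing here stops
the kernel's own periodic writeback from touching the dirty pages
earlier.

```python
import mmap
import tempfile

# Mapping offsets must be a multiple of this value (the page size on Linux).
PAGE = mmap.ALLOCATIONGRANULARITY


def write_window(fd, offset, payload):
    """Map one window of the file, overwrite it, flush it, then unmap it.

    offset must be a multiple of PAGE, and the file must already be large
    enough to cover offset + len(payload).
    """
    with mmap.mmap(fd, len(payload), access=mmap.ACCESS_WRITE,
                   offset=offset) as window:
        window[:] = payload
        # Explicit writeback of this window only (msync on POSIX).
        window.flush()
    # Leaving the "with" block unmaps the window.


def demo():
    """Fill a 4-window temporary file one window at a time and read it back."""
    with tempfile.TemporaryFile() as f:
        f.truncate(4 * PAGE)  # the file must exist at full size before mapping
        for i in range(4):
            write_window(f.fileno(), i * PAGE, bytes([ord("A") + i]) * PAGE)
        f.seek(0)
        return f.read()
```

Each window is dirty only for the short time between the store and the
flush, which narrows (but does not close) the interval in which a
kernel-initiated writeout could hit a page the application is still
modifying.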