On Sun 22-01-12 13:31:38, Boaz Harrosh wrote:
> On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> > On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
> >> anything. So what will be cheaper depends on how often pages get
> >> redirtied under IO. This is rather rare because pages aren't flushed
> >> all that often, so the effect of stable pages is not observable on
> >> throughput. But you can certainly see it on max latency...
> >
> > I see your point. A problem with migrate though is that the page must
> > be pinned by the I/O layer to prevent migration from freeing the page
> > under I/O, or how else could it be safe to read from a freed page? And
> > if the page is pinned, migration won't work at all. See
> > page_freeze_refs in migrate_page_move_mapping. So the pinning issue
> > would need to be handled somehow. The pin is needed for example when
> > there's an O_DIRECT read and the I/O is going to the page; if the page
> > were migrated in that case, we'd lose a part of the I/O.
> > Differentiating how many page pins are OK to be ignored by migration
> > won't be trivial, but it's probably possible to do.
> >
> > Another way maybe would be to detect when there's too much re-dirtying
> > of pages in flight in a short amount of time, start the bounce
> > buffering and stop waiting until the re-dirtying stops, and then stop
> > the bounce buffering. But unlike migration, that can't prevent an
> > initial burst of high fault latency...
>
> Or just change that RT program that is, one, latency bound but, two,
> does unpredictable, statistically bad things to a memory mapped file.
  Right. That's what I told the RT guy as well :) But he didn't like to
hear that because it meant more coding for him.

> Can a memory-mapped-file writer have some control over the time of
> writeback with data_sync or such, or is it purely: timer fired, kernel
> sees a dirty page, starts a writeout? What about if the application
> maps a portion of the file at a time, and the kernel gets lazier about
> an actively mapped region? (That's what Windows NT does. It will never
> IO a mapped section except under OOM conditions. The application needs
> to map small sections and unmap them to IO. It's more of a direct_io
> than mmap.)
  You can always start writeback by sync_file_range() but you have no
guarantees about what writeback does. Also, if you need to redirty the
page permanently (e.g. it's the head of your transaction log), there's
simply no good time when it can be written if you also want stable
pages.
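For illustration, here is a minimal userspace sketch (my own, untested,
not from the thread; it assumes the file passed in is at least one page
long) of what that looks like: dirty a mapped page, then kick writeback
for the range at a moment of the application's choosing via
sync_file_range(2):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t len = 4096;
	char *map;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(map, 'x', len);	/* dirty the page through the mapping */

	/*
	 * Ask the kernel to start writeback for this range now rather
	 * than waiting for the flusher timer to fire. This only starts
	 * the IO: it gives no durability guarantee and no control over
	 * when the page gets written again if it is redirtied.
	 */
	if (sync_file_range(fd, 0, len, SYNC_FILE_RANGE_WRITE) < 0)
		perror("sync_file_range");

	munmap(map, len);
	close(fd);
	return 0;
}

Note this only lets you pick when writeback starts; if the page is
immediately redirtied, you are back where you started.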
> In any case, if you are very latency sensitive, an mmap writeout is
> bad for you. Not only because of this new problem, but because mmap
> writeout can sync with tons of other things that are due to memory
> management (as mentioned by Andrea). The best for a latency-sensitive
> application is asynchronous direct-IO, by far. Only with asynchronous
> direct-IO can you have any real control over your latency. (I
> understand they used to have an empirically observed latency bound,
> but that is just luck, not real control.)
>
> BTW: The application mentioned would probably not want its IO bounced
> at the block layer; otherwise why would it use mmap, if not to avoid
> the copy induced by buffered IO?
  Yeah, I'm not sure why their design was as it was.

> All that said, a mount option for ext4 (is ext4 used?) to revert to
> the old behavior is the easiest solution. When we originally brought
> this up at LSF, my thought was that the block request queue should
> have some flag that says need_stable_pages. If set by the likes of
> dm/md-raid, iscsi-with-data-signed, DIF-enabled devices and so on,
> and the FS does not guarantee/want stable pages, then an IO bounce is
> set up. But if the flag is not set, then the likes of ext4 need not
> bother.
  There's no mount option. The behavior is on unconditionally. And so
far I have not seen enough people complain to introduce something like
that; automatic logic is a different thing of course, that might be
nice to have.

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR