On Sun 22-01-12 13:31:38, Boaz Harrosh wrote:
> On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> > On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
> >> anything. So what will be cheaper depends on how often pages get
> >> redirtied under IO. This is rather rare because pages aren't flushed
> >> all that often, so the effect of stable pages is not observable on
> >> throughput. But you can certainly see it on max latency...
> >
> > I see your point. A problem with migrate though is that the page must
> > be pinned by the I/O layer to prevent migration from freeing the page
> > under I/O, or how else could it be safe to read from a freed page? And
> > if the page is pinned, migration won't work at all. See
> > page_freeze_refs in migrate_page_move_mapping. So the pinning issue
> > would need to be handled somehow. The pin is needed for example when
> > there's an O_DIRECT read and the I/O is going to the page; if the page
> > were migrated in that case, we'd lose a part of the I/O.
> > Differentiating how many page pins are OK to be ignored by migration
> > won't be trivial, but it's probably possible to do.
> >
> > Another way maybe would be to detect when there's too much re-dirtying
> > of pages in flight in a short amount of time, start the bounce
> > buffering and stop waiting until the re-dirtying stops, and then stop
> > the bounce buffering. But unlike migration, that can't prevent an
> > initial burst of high fault latency...
>
> Or just change that RT program that is, one, latency bound but, two,
> does unpredictable, statistically bad things to a memory mapped file.
  Right. That's what I told the RT guy as well :) But he didn't like to
hear that because it meant more coding for him.

> Can a memory-mapped-file writer have some control over the time of
> writeback with data_sync or such, or is it purely: timer fired, kernel
> sees a dirty page, starts a writeout? What about if the application
> maps a portion of the file at a time, and the kernel gets lazier about
> an actively mapped region? (That's what Windows NT does. It will never
> IO a mapped section except under OOM conditions. The application needs
> to map small sections and unmap them to IO. It's more of a direct_io
> than mmap.)
  You can always start writeback by sync_file_range() but you have no
guarantees about what writeback does. Also, if you need to redirty the
page permanently (e.g. it's the head of your transaction log), there's
simply no good time when it can be written if you also want stable
pages.
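For illustration, here is a minimal userspace sketch (my own, untested,
not from the thread; it assumes the file passed in is at least one page
long) of what that looks like: dirty a mapped page, then kick writeback
for the range at a moment of the application's choosing via
sync_file_range(2):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t len = 4096;
	char *map;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(map, 'x', len);	/* dirty the page through the mapping */

	/*
	 * Ask the kernel to start writeback for this range now rather
	 * than waiting for the flusher timer to fire. This only starts
	 * the IO: it gives no durability guarantee and no control over
	 * when the page gets written again if it is redirtied.
	 */
	if (sync_file_range(fd, 0, len, SYNC_FILE_RANGE_WRITE) < 0)
		perror("sync_file_range");

	munmap(map, len);
	close(fd);
	return 0;
}

Note this only lets you pick when writeback starts; if the page is
immediately redirtied, you are back where you started.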
> In any case, if you are very latency sensitive, an mmap writeout is
> bad for you. Not only because of this new problem, but because mmap
> writeout can sync with tons of other things that are due to memory
> management (as mentioned by Andrea). The best for a latency-sensitive
> application is asynchronous direct-IO, by far. Only with asynchronous
> direct-IO can you have any real control over your latency. (I
> understand they used to have an empirically observed latency bound,
> but that is just luck, not real control.)
>
> BTW: The application mentioned would probably not want its IO bounced
> at the block layer; otherwise why would it use mmap, if not to avoid
> the copy induced by buffered IO?
  Yeah, I'm not sure why their design was as it was.

> All that said, a mount option for ext4 (is ext4 used?) to revert to
> the old behavior is the easiest solution. When we originally brought
> this up at LSF, my thought was that the block request queue should
> have some flag that says need_stable_pages. If set by the likes of
> dm/md-raid, iscsi-with-data-signed, DIF-enabled devices and so on,
> and the FS does not guarantee/want stable pages, then an IO bounce is
> set up. But if the flag is not set, then the likes of ext4 need not
> bother.
  There's no mount option. The behavior is on unconditionally. And so
far I have not seen enough people complain to introduce something like
that; automatic logic is a different thing of course, that might be
nice to have.

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR