Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction

Nai Xia <nai.xia@xxxxxxxxx> · Wed, 23 Nov 2011 20:20:52 +0800

On Wed, Nov 23, 2011 at 7:39 PM, Jan Kara <jack@xxxxxxx> wrote:
> On Wed 23-11-11 06:44:23, Nai Xia wrote:
>> >> So that amounts to the following calculation that is important to the
>> >> statistical stall time for the compaction:
>> >>
>> >>      page_nr *  average_stall_window_time
>> >>
>> >> where average_stall_window_time is the window for a page between
>> >> NotUptoDate ---> UptoDate or Dirty --> Clean. And page_nr is the
>> >> number of pages in stall window for read or write.
>> >>
>> >> So for general cases,
>> >> Fact 1) may ensure that the page_nr is smaller for read, while
>> >> fact 2) may ensure the same for average_locking_window_time.
>> >  Well, page_nr really depends on the load. If the workload is only reads,
>> > clearly number of read pages is going to be higher than number of written
>> > pages. Once workload does heavy writing, I agree number of pages under
>> > writeback is likely going to be higher.
>>
>> Think about process A linearly scans 100MB mapped file pages
>> area for read, and another process B linearly writes to a same sized area.
>> If there is no readahead, the read page in stall window in memory is only
>> *one* page each time.
>  Yes, I understand this. But in a situation where there is *no* process
> writing and *hundred* processes reading, you clearly have more pages locked
> for reading than for writing. All I wanted to say is that your broad
> statement that the number of pages read from disk is lower than the number
> of pages written is not true in general. It depends on the workload.

OK, I agree with you here. I think I did not make my statement
of "general cases" very clear... I actually meant where reading is comparable to
writing. Yes, considering the variety of workloads, it's surely workload
dependent. Sorry for my vague statement :)

>
>> However, 100MB dirty pages can be hold in memory
>> waiting to be write which may stall the compaction for fallback_migrate_page().
>> Even for buffer_migrate_page() these pages are much more likely to get locked
>> by other behaviors like you said for IO submission,etc.
>>
>> I was not sure about readahead, of course,  I only theoretically
>> expected its still not
>> comparable to the totally async write behavior.
>>
>> >
>> >> I am not sure this will be the same case for all workloads,
>> >> don't know if Mel has tested large readahead workloads which
>> >> has more async read IOs and less writebacks.
>> >>
>> >> But theoretically I expect things are not that bad even for large
>> >> readahead, because readahead is triggered by the readahead TAG in
>> >> linear order, which means for a process to generating readahead IO,
>> >> its speed is still somewhat govened by the read IO speed. While
>> >> for a process writing to a file mapped memory area, it may well
>> >> exceed the speed of its backing-store writing speed.
>> >>
>> >>
>> >> Aside from that, I think the relation between page locking and
>> >> page read is not 1-to-1, in other words, there maybe quite some
>> >> transient page locking is caused by mmap and then page fault into
>> >> already good-state pages requiring no IO at all. For these
>> >> transient page lockings I think it's reasonable to have light
>> >> waiting.
>> >  Definitely there are other lockings than for read. E.g. to write a page,
>> > we lock it first, submit IO (which can actually block waiting for request
>> > to get freed), set PageWriteback, and unlock the page. And there are more
>> > transient ones like you mention above...
>>
>> Yes, you are right.
>> But I think we were talking about distinguishing page locking from page read
>> IO?
>>
>> Well, I might also want to suggest that do an early dirty test before
>> taking the lock...but, I expect page NotUpToDate is much more likely an
>> indication that we are going to block for IO on the following page lock.
>> Dirty test is not that strong. Do you agree ?
>  Yes, I agree with this.
>
>                                                                Honza
> --
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href