Re: Downsides to madvise/fadvise(willneed) for application startup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 07, 2010 at 10:54:58AM +0800, Taras Glek wrote:
> On 04/06/2010 07:24 PM, Wu Fengguang wrote:
> > Hi Taras,
> >
> > On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
> >    
> >> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
> >>      
> >>> Hello,
> >>> I am working on improving Mozilla startup times. It turns out that page
> >>> faults(caused by lack of cooperation between user/kernelspace) are the
> >>> main cause of slow startup. I need some insights from someone who
> >>> understands linux vm behavior.
> >>>        
> > How about improve Fedora (and other distros) to preload Mozilla (and
> > other apps the user run at the previous boot) with fadvise() at boot
> > time? This sounds like the most reasonable option.
> >    
> That's a slightly different usecase. I'd rather have all large apps 
> startup as efficiently as possible without any hacks. Though until we 
> get there, we'll be using all of the hacks we can.

Boot time user space readahead can do better than kernel heuristic
readahead in several ways:

- it can collect better knowledge on which files/pages will be used
  which lead to high readahead hit ratio and less cache consumption

- it can submit readahead requests for many files in parallel,
  which enables queuing (elevator, NCQ etc.) optimizations

So I won't call it dirty hack :)

> > As for the kernel readahead, I have a patchset to increase default
> > mmap read-around size from 128kb to 512kb (except for small memory
> > systems).  This should help your case as well.
> >    
> Yes. Is the current readahead really doing read-around(ie does it read 
> pages before the one being faulted)? From what I've seen, having the 

Sure. It will do read-around from current fault offset - 64kb to +64kb.

> dynamic linker read binary sections backwards causes faults.
> http://sourceware.org/bugzilla/show_bug.cgi?id=11447

There are too many data in
http://people.mozilla.com/~tglek/startup/systemtap_graphs/ld_bug/report.txt
Can you show me the relevant lines? (wondering if I can ever find such lines..)

> >    
> >>> Current Situation:
> >>> The dynamic linker mmap()s  executable and data sections of our
> >>> executable but it doesn't call madvise().
> >>> By default page faults trigger 131072byte reads. To make matters worse,
> >>> the compile-time linker + gcc lay out code in a manner that does not
> >>> correspond to how the resulting executable will be executed(ie the
> >>> layout is basically random). This means that during startup 15-40mb
> >>> binaries are read in basically random fashion. Even if one orders the
> >>> binary optimally, throughput is still suboptimal due to the puny readahead.
> >>>
> >>> IO Hints:
> >>> Fortunately when one specifies madvise(WILLNEED) pagefaults trigger 2mb
> >>> reads and a binary that tends to take 110 page faults(ie program stops
> >>> execution and waits for disk) can be reduced down to 6. This has the
> >>> potential to double application startup of large apps without any clear
> >>> downsides.
> >>>
> >>> Suse ships their glibc with a dynamic linker patch to fadvise()
> >>> dynamic libraries(not sure why they switched from doing madvise
> >>> before).
> >>>        
> > This is interesting. I wonder how SuSE implements the policy.
> > Do you have the patch or some strace output that demonstrates the
> > fadvise() call?
> >    
> glibc-2.3.90-ld.so-madvise.diff in 
> http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0: 

550 Can't open
/pub/linux/distributions/suse/pub/suse/update/10.1/rpm/src/glibc-2.4-31.12.3.src.rpm:
No such file or directory

OK I give up.

> As I recall they just fadvise the filedescriptor before accessing it.

Obviously this is a bit risky for small memory systems..

> >>> I filed a glibc bug about this at
> >>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
> >>> with his concern about wasting memory resources. What is the impact of
> >>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
> >>> pressure? Does the kernel simply start ignoring these hints?
> >>>        
> >> It will throttle based on memory pressure.  In idle situations it will
> >> eat your file cache, however, to satisfy the request.
> >>
> >> Now, the file cache should be much bigger than the amount of unneeded
> >> pages you prefault with the hint over the whole library, so I guess the
> >> benefit of prefaulting the right pages outweighs the downside of evicting
> >> some cache for unused library pages.
> >>
> >> Still, it's a workaround for deficits in the demand-paging/readahead
> >> heuristics and thus a bit ugly, I feel.  Maybe Wu can help.
> >>      
> > Program page faults are inherently random, so the straightforward
> > solution would be to increase the mmap read-around size (for desktops
> > with reasonable large memory), rather than to improve program layout
> > or readahead heuristics :)
> >    
> Program page faults may exhibit random behavior once they've started.

Right.

> During startup page-in pattern of over-engineered OO applications is 
> very predictable. Programs are laid out based on compilation units, 
> which have no relation to how they are executed. Another problem is that 
> any large old application will have lots of code that is either rarely 
> executed or completely dead. Random sprinkling of live code among mostly 
> unneeded code is a problem.

Agreed.

> I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB 
> with proper binary layout. Even if one lays out a program wrongly, the 
> worst-case pagein pattern will be pretty similar to what it is by default.

That's great. When will we enjoy your research fruits? :)

> But yes, I completely agree that it would be awesome to increase the 
> readahead size proportionally to available memory. It's a little silly 
> to be reading tens of megabytes in 128kb increments :)  You rock for 
> trying to modernize this.

Thank you. I guess the 128kb is more than ten years old..

Cheers,
Fengguang

> >    
> >>> Also, once an application is started is it reasonable to keep it
> >>> madvise(WILLNEED)ed or should the madvise flags be reset?
> >>>        
> >> It's a one-time operation that starts immediate readahead, no permanent
> >> changes are done.
> >>      
> > Right. The kernel regard WILLNEED as a readahead request from userspace.
> >
> >    
> >>> Perhaps the kernel could monitor the page-in patterns to increase the
> >>> readahead sizes? This may already happen, I've noticed that a handful of
> >>> pagefaults trigger>  131072bytes of IO, perhaps this just needs tweaking.
> >>>        
> >> CCd the man :-)
> >>      
> > Thank you :)
> >
> > Cheers,
> > Fengguang
> >    
> 
> Cheers,
> Taras

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]