Re: [RFC PATCH v2 0/1] improve vmap allocation

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Fri, 22 Mar 2019 13:47:53 -0400

On Fri, Mar 22, 2019 at 05:52:59PM +0100, Uladzislau Rezki wrote:
> On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote:
> > On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@xxxxxxxxx> wrote:
> > 
> > > Hello.
> > > 
> > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of
> > > referring you to that link, i will go through it again describing the improved
> > > allocation method and provide changes between v1 and v2 in the end.
> > > 
> > > ...
> > >
> > 
> > > Performance analysis
> > > --------------------
> > 
> > Impressive numbers.  But this is presumably a worst-case microbenchmark.
> > 
> > Are you able to describe the benefits which are observed in some
> > real-world workload which someone cares about?
> > 
> We work with Android. Google uses its own tool called UiBench to measure
> UI performance. It counts dropped or delayed frames, or as they call it,
> jank. Basically, if we deliver 59 frames per second (it should be 60),
> we get 1 jank/drop.

Agreed. Strictly speaking, "1 Jank" is not necessarily "1 frame drop". A
delayed frame is also a Jank. Just because a frame is delayed does not mean
it is dropped; there is double buffering etc. to absorb delays.
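
To make the delayed-vs-dropped distinction concrete, here is a minimal
sketch. It is not how UiBench counts jank; the names (classify_frame,
count_jank, slack_periods) and the rule "buffering absorbs up to N extra
vsync periods" are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification, NOT UiBench's actual algorithm. */
enum frame_status { FRAME_ON_TIME, FRAME_DELAYED, FRAME_DROPPED };

struct jank_stats {
	size_t on_time;
	size_t delayed;	/* late, but absorbed by buffering */
	size_t dropped;	/* too late to be shown at all */
};

/*
 * Classify one frame by how long it took to render.
 * period_us:     vsync period, about 16667 us at 60 Hz.
 * slack_periods: extra periods that buffering can absorb (assumed to be
 *                1 for simple double buffering).
 */
static enum frame_status classify_frame(uint64_t render_us,
					uint64_t period_us,
					unsigned int slack_periods)
{
	assert(period_us > 0);

	if (render_us <= period_us)
		return FRAME_ON_TIME;
	if (render_us <= period_us * (1 + (uint64_t)slack_periods))
		return FRAME_DELAYED;
	return FRAME_DROPPED;
}

/* Tally every frame in the sample into the three buckets. */
static struct jank_stats count_jank(const uint64_t *render_us, size_t n,
				    uint64_t period_us,
				    unsigned int slack_periods)
{
	struct jank_stats s = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		switch (classify_frame(render_us[i], period_us, slack_periods)) {
		case FRAME_ON_TIME:
			s.on_time++;
			break;
		case FRAME_DELAYED:
			s.delayed++;
			break;
		case FRAME_DROPPED:
			s.dropped++;
			break;
		}
	}
	return s;
}

/* Both delayed and dropped frames count as jank. */
static size_t total_jank(const struct jank_stats *s)
{
	return s->delayed + s->dropped;
}
```

With a 16667 us period and one period of slack, a frame that takes 20 ms
counts as delayed and one that takes 40 ms counts as dropped; both add to
the jank total.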

> I see that on our devices avg-jank is lower. In our case the Android
> graphics pipeline uses vmalloc allocations, which can delay delivery of
> UI content to the GPU. But such behavior depends on your platform, on the
> parts of the system that make use of it, and on whether they are
> time-critical.
> 
> The second example is an indirect impact. During analysis of audio
> glitches in high-resolution audio, the source of the drops was long
> alloc_vmap_area() allocations.
> 
> # Explanation is here
> ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> 
> # Audio 10 seconds sample is here.
> # The drop occurs at 00:09.295 you can hear it
> ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

Nice.

> > It's a lot of new code. It looks decent and I'll toss it in there for
> > further testing.  Hopefully someone will be able to find the time for a
> > detailed review.
> > 
> Thank you :)

I can try to do a review fwiw. But I am severely buried right now. I did look
at vmalloc code before for similar reasons (preempt-off related delays
causing jank / glitches etc). In any case, I'll take another look soon (in
the next 1-2 weeks).

> > Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
> > ignores that and does its own thing.  You might want to look at the
> > effects of simply deleting all that.  Is the generated code better or
> > worse or the same?  If something really needs to be inlined then use
> > __always_inline, preferably with a comment explaining why it is there.
> > 
> When the main core functions are "inlined" i see the benefit.
> At least, it is noticeable with the "test driver". But i agree that
> i should check one more time to see what can be excluded and used
> as a regular call. Thanks for the hint, it is worth going with
> __always_inline instead.

I wonder how clang behaves as far as inline hints go. That is how Android
images build their kernels.
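
For reference, the difference between the plain hint and the forced variant
looks roughly like the sketch below. This is not code from the patch:
struct range, range_size_hint(), range_fits() and find_first_fit() are
hypothetical names, and the fallback macro is only there so the snippet
builds outside the kernel tree (the kernel provides its own
__always_inline). Both gcc and clang accept the attribute.

```c
#include <stddef.h>

/* Userspace fallback; only used if the macro is not already defined. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical free range [start, end), for illustration only. */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Plain "inline": only a hint, the compiler may emit a real call. */
static inline unsigned long range_size_hint(const struct range *r)
{
	return r->end - r->start;
}

/*
 * Forced inline: assumed here to sit on the hot path of the search loop,
 * so a call per candidate range is not wanted. "align" must be a power
 * of two.
 */
static __always_inline int range_fits(const struct range *r,
				      unsigned long size,
				      unsigned long align)
{
	unsigned long start = (r->start + align - 1) & ~(align - 1);

	return start >= r->start && start <= r->end &&
	       r->end - start >= size;
}

/* Return the index of the first range that fits, or -1 if none does. */
static long find_first_fit(const struct range *free, size_t n,
			   unsigned long size, unsigned long align)
{
	for (size_t i = 0; i < n; i++) {
		if (range_size_hint(&free[i]) >= size &&
		    range_fits(&free[i], size, align))
			return (long)i;
	}
	return -1;
}
```

Building something like this with each compiler and comparing the
disassembly (for example with objdump -d) would show whether the plain
hint was honored.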

thanks,

 - Joel