Re: [RFC PATCH v2 0/1] improve vmap allocation

Uladzislau Rezki <urezki@xxxxxxxxx> · Fri, 22 Mar 2019 17:52:59 +0100

On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote:
> On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@xxxxxxxxx> wrote:
> 
> > Hello.
> > 
> > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of
> > referring you to that link, i will go through it again describing the improved
> > allocation method and provide changes between v1 and v2 in the end.
> > 
> > ...
> >
> 
> > Performance analysis
> > --------------------
> 
> Impressive numbers.  But this is presumably a worst-case microbenchmark.
> 
> Are you able to describe the benefits which are observed in some
> real-world workload which someone cares about?
> 
We work with Android. Google uses its own tool called UiBench to measure
the performance of the UI. It counts dropped or delayed frames, or as they
call it, jank. Basically, if we deliver 59 frames per second (it should be
60), then we get 1 jank/drop.

I see that avg-jank is lower on our devices. In our case the Android
graphics pipeline uses vmalloc allocations, which can delay delivery of UI
content to the GPU. But such behavior depends on your platform, on which
parts of the system make use of vmalloc, and on whether they are
time-critical or not.

The second example is an indirect impact. During analysis of glitches in
high-resolution audio, the source of the drops turned out to be long
alloc_vmap_area() allocations.

# Explanation is here
ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt

# A 10-second audio sample is here.
# The drop occurs at 00:09.295; you can hear it.
ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

>
> It's a lot of new code. It looks decent and I'll toss it in there for
> further testing.  Hopefully someone will be able to find the time for a
> detailed review.
> 
Thank you :)

> Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
> ignores that and does its own thing.  You might want to look at the
> effects of simply deleting all that.  Is the generated code better or
> worse or the same?  If something really needs to be inlined then use
> __always_inline, preferably with a comment explaining why it is there.
> 
When the main core functions are inlined I see a benefit. At least, it
is noticeable with the "test driver". But I agree that I should check
one more time to see what can be dropped and used as a regular call.
Thanks for the hint, it is worth going with __always_inline instead.
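
To illustrate the suggestion, here is a minimal sketch of forcing one hot
helper inline with a comment explaining why, while leaving its caller to
the compiler. All names in it (struct range, range_size(), first_fit())
are hypothetical and are not taken from the patch. The macro fallback is
only a userspace stand-in, assuming GCC or Clang; in the kernel
__always_inline is already provided by the compiler headers.

```c
#include <stddef.h>

/* Userspace stand-in (assumption): only define it if nothing else did. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical address range, not the kernel's struct vmap_area. */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Forced inline: called once per candidate in the search loop below,
 * so a real function call here would be paid on every iteration.
 */
static __always_inline unsigned long range_size(const struct range *r)
{
	return r->end - r->start;
}

/*
 * Regular function: runs once per lookup, so the inlining decision is
 * left to the compiler. Returns the index of the first range that can
 * hold "size" bytes, or -1 if none fits.
 */
static long first_fit(const struct range *ranges, size_t nr,
		      unsigned long size)
{
	for (size_t i = 0; i < nr; i++) {
		if (range_size(&ranges[i]) >= size)
			return (long)i;
	}

	return -1;
}
```

Whether the forced inline actually pays off would still have to be checked
by comparing the generated code and the test driver numbers with and
without it.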

--
Vlad Rezki