On Fri, Mar 22, 2019 at 05:52:59PM +0100, Uladzislau Rezki wrote: > On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote: > > On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <urezki@xxxxxxxxx> wrote: > > > > > Hello. > > > > > > This is the v2 of the https://lkml.org/lkml/2018/10/19/786 rework. Instead of > > > referring you to that link, i will go through it again describing the improved > > > allocation method and provide changes between v1 and v2 in the end. > > > > > > ... > > > > > > > > Performance analysis > > > -------------------- > > > > Impressive numbers. But this is presumably a worst-case microbenchmark. > > > > Are you able to describe the benefits which are observed in some > > real-world workload which someone cares about? > > > We work with Android. Google uses its own tool called UiBench to measure > performance of UI. It counts dropped or delayed frames, or as they call it, > jank. Basically if we deliver 59(should be 60) frames per second then we > get 1 junk/drop. Agreed. Strictly speaking, "1 Jank" is not necessarily "1 frame drop". A delayed frame is also a Jank. Just because a frame is delayed does not mean it is dropped, there is double buffering etc to absorb delays. > I see that on our devices avg-jank is lower. In our case Android graphics > pipeline uses vmalloc allocations which can lead to delays of UI content > to GPU. But such behavior depends on your platform, parts of the system > which make use of it and if they are critical to time or not. > > Second example is indirect impact. During analysis of audio glitches > in high-resolution audio the source of drops were long alloc_vmap_area() > allocations. > > # Explanation is here > ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt > > # Audio 10 seconds sample is here. > # The drop occurs at 00:09.295 you can hear it > ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav Nice. > > It's a lot of new code. I t looks decent and I'll toss it in there for > > further testing. Hopefully someone will be able to find the time for a > > detailed review. > > > Thank you :) I can try to do a review fwiw. But I am severely buried right now. I did look at vmalloc code before for similar reasons (preempt off related delays causing jank / glitches etc). Any case, I'll take another look soon (in next 1-2 weeks). > > Trivial point: the code uses "inline" a lot. Nowadays gcc cheerfully > > ignores that and does its own thing. You might want to look at the > > effects of simply deleting all that. Is the generated code better or > > worse or the same? If something really needs to be inlined then use > > __always_inline, preferably with a comment explaining why it is there. > > > When the main core functionalities are "inlined" i see the benefit. > At least, it is noticeable by the "test driver". But i agree that > i should check one more time to see what can be excluded and used > as a regular call. Thanks for the hint, it is worth to go with > __always_inline instead. I wonder how clang behaves as far as inline hints go. That is how Android images build their kernels. thanks, - Joel