On 07.08.20 09:08, Pekka Enberg wrote:
> Hi Cho and David,
>
> On Mon, Aug 3, 2020 at 10:57 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>>
>> On 03.08.20 08:10, pullip.cho@xxxxxxxxxxx wrote:
>>> From: Cho KyongHo <pullip.cho@xxxxxxxxxxx>
>>>
>>> LPDDR5 introduces a rank switch delay. If three successive DRAM
>>> accesses happen and the first two access one rank while the last
>>> one accesses the other rank, the latency of the last access will be
>>> longer than that of the second.
>>> To address this penalty, we can sort the freelist so that a specific
>>> rank is allocated prior to another rank. We expect the page allocator
>>> to allocate pages from the same rank successively with this change.
>>> It will hopefully improve the proportion of consecutive memory
>>> accesses to the same rank.
>>
>> This certainly needs performance numbers to justify it ... and I am
>> sorry, "hopefully improves" is not a valid justification :)
>>
>> I can imagine that this works well initially, when there hasn't been a
>> lot of memory fragmentation going on. But quickly after your system is
>> under stress, I doubt this will be very useful. Prove me wrong. ;)
>>
>> ... I dislike this manual setting of "dram_rank_granule". Yet another
>> mm feature that can only be enabled by a magic command line parameter
>> where users have to guess the right values.
>>
>> (side note, there have been similar research approaches to improve
>> energy consumption by switching off ranks when not needed).
>
> I was thinking of the exact same thing. PALLOC [1] comes to mind, but
> perhaps there are more recent ones?

A more recent one is "Footprint-Based DIMM Hotplug"
(https://dl.acm.org/doi/abs/10.1109/TC.2019.2945562), which triggers
memory onlining/offlining from the kernel to disable banks where
possible (I don't think the approach is upstream material in that form).

Also, I stumbled over "Towards Practical Page Placement for a Green
Memory Manager" (https://ieeexplore.ieee.org/document/7397629),
proposing an adaptive buddy allocator that tries to keep complete banks
free in the buddy where possible. That approach sounded quite
interesting while skimming over the paper.

>
> I also dislike the manual knob, but is there a way for the OS to
> detect this by itself? My (perhaps outdated) understanding was that
> the DRAM address mapping scheme, for example, is not exposed to the
> OS.

I guess one universal approach is measuring access times ... not what
we might be looking for :)

>
> I think having more knowledge of DRAM controller details in the OS
> would be potentially beneficial for better page allocation policy, so
> maybe try to come up with something more generic, even if the fallback
> for providing this information is a kernel command line option.
>
> [1] http://cs-people.bu.edu/rmancuso/files/papers/palloc-rtas2014.pdf
>
> - Pekka

-- 
Thanks,
David / dhildenb
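
For illustration, the freelist trick described in the patch boils down to
something like the following userspace model. This is only a sketch, not
Cho's actual code: the 64 KiB granule, 4 KiB page size, and a single-bit
two-rank interleave are my assumptions.

#include <stdio.h>
#include <stdlib.h>

#define RANK_GRANULE   (1UL << 16)  /* assumed interleave granule: 64 KiB */
#define PREFERRED_RANK 0

struct page {
	unsigned long pfn;
	struct page *next, *prev;
};

struct free_list {
	struct page head;               /* circular list with a dummy head */
};

static int page_rank(unsigned long pfn)
{
	unsigned long paddr = pfn << 12;        /* assumed 4 KiB pages */
	return (paddr / RANK_GRANULE) & 1;      /* assumed two-rank interleave */
}

static void list_add(struct page *p, struct page *at)
{
	p->next = at->next;
	p->prev = at;
	at->next->prev = p;
	at->next = p;
}

/* Free a page: preferred rank goes to the head, the other to the tail. */
static void free_page_sorted(struct free_list *fl, struct page *p)
{
	if (page_rank(p->pfn) == PREFERRED_RANK)
		list_add(p, &fl->head);         /* head: handed out first */
	else
		list_add(p, fl->head.prev);     /* tail: handed out last */
}

int main(void)
{
	struct free_list fl = {
		.head = { .next = &fl.head, .prev = &fl.head }
	};
	struct page pages[8];

	for (int i = 0; i < 8; i++) {
		pages[i].pfn = i * (RANK_GRANULE >> 12); /* one page per granule */
		free_page_sorted(&fl, &pages[i]);
	}

	/* Head-first walk: all rank-0 pages come out before rank-1 pages. */
	for (struct page *p = fl.head.next; p != &fl.head; p = p->next)
		printf("pfn %lu rank %d\n", p->pfn, page_rank(p->pfn));
	return 0;
}

Since the allocator pops from the list head, the preferred rank is drained
first, which is the whole effect the patch is after; it also shows why the
benefit should erode once fragmentation mixes the ranks within higher-order
free blocks.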
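And for the "measuring access times" remark, a rough x86-only sketch of
what such probing could look like: time back-to-back uncached accesses to
address pairs at increasing strides and look for a consistent latency step
at the interleave boundary. All sizes and iteration counts here are made
up, and it ignores the virtual-to-physical mapping, which a real probe
would have to pin down (e.g. via hugepages or /proc/self/pagemap).

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

#define BUF_SIZE (64UL << 20)   /* 64 MiB probe buffer (assumption) */
#define ROUNDS   10000

static uint64_t probe_pair(const char *a, const char *b)
{
	unsigned int aux;
	uint64_t start, end;

	_mm_clflush(a);                         /* force both lines to DRAM */
	_mm_clflush(b);
	_mm_mfence();

	start = __rdtscp(&aux);
	(void)*(volatile const char *)a;        /* first access: one rank */
	(void)*(volatile const char *)b;        /* second: may pay the
						   rank-switch delay */
	_mm_mfence();
	end = __rdtscp(&aux);
	return end - start;
}

int main(void)
{
	char *buf = aligned_alloc(4096, BUF_SIZE);
	if (!buf)
		return 1;

	/* touch the buffer so the pages are actually backed by DRAM */
	for (size_t i = 0; i < BUF_SIZE; i += 4096)
		buf[i] = 1;

	/* sweep power-of-two strides and print the average pair latency */
	for (size_t stride = 4096; stride <= BUF_SIZE / 2; stride <<= 1) {
		uint64_t total = 0;

		for (int r = 0; r < ROUNDS; r++)
			total += probe_pair(buf, buf + stride);
		printf("stride %8zu KiB: avg %llu cycles\n", stride >> 10,
		       (unsigned long long)(total / ROUNDS));
	}
	free(buf);
	return 0;
}

Noisy, slow, and architecture-specific, which is exactly why it's probably
not what we are looking for as an in-kernel detection mechanism.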