On Wed, Jun 3, 2020 at 1:48 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Tue, Jun 02, 2020 at 11:34:17AM +0900, Joonsoo Kim wrote:
> > On Tue, Jun 2, 2020 at 12:56 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > On Mon, Jun 01, 2020 at 03:14:24PM +0900, Joonsoo Kim wrote:
> > > > But I still think that the modified refault activation equation
> > > > isn't safe. The next problem I found is related to the scan ratio
> > > > limit patch ("limit the range of LRU type balancing") in this
> > > > series. See the example below.
> > > >
> > > > anon: Hot (X M)
> > > > file: Hot (200 M) / dummy (200 M)
> > > > P: 1200 M (3 parts, each one 400 M: P1, P2, P3)
> > > > Access pattern: A -> F(H) -> P1 -> A -> F(H) -> P2 -> ... ->
> > > >
> > > > Without this patch, A and F(H) are kept in memory, which looks
> > > > correct.
> > > >
> > > > With this patch and the fix below, the refault equation for Pn
> > > > would be:
> > > >
> > > >   refault dist of Pn = 1200 (from file non-resident)
> > > >                        + 1200 * anon scan ratio (from anon non-resident)
> > > >   anon + active file = X + 200
> > > >   1200 + 1200 * anon scan ratio (0.5 ~ 2) < X + 200
> > >
> > > That doesn't look quite right to me. The anon part of the refault
> > > distance is driven by X, so the left-hand side of this formula
> > > contains X as well:
> > >
> > >   1000 file (1200M reuse distance, 200M in-core size)
> > >     + F(H) reactivations + X * scan ratio < X + 1000
> >
> > As I said before, there is no X on the left-hand side of this
> > formula. To access all Pn and re-access P1, we need a 1200M file list
> > scan and reclaim; no more scanning is needed. With your patch "limit
> > the range of LRU type balancing", the scan ratio between the
> > file/anon lists is limited to 0.5 ~ 2.0, so the maximum anon scan
> > would be 1200M * 2.0, that is, 2400M, which is not bounded by X. That
> > means the file list cannot be stable with some X.
>
> Oh, there's no X on the left because you're talking about the number
> of pages scanned until the first refaults, which is fixed - so why are
> we still interpreting the refault distance against a variable anon
> size X?

Looks like I was confused again. Your formula is correct and mine is
wrong. My mistake was thinking that your patch "limit the range of LRU
type balancing", which limits the scan *ratio* to 2:1, also limits the
actual scan *count* ratio between anon/file to 2:1. But now I realize
that 2:1 is just the scan ratio, and the actual scan *count* ratio
could be far larger for certain list sizes. It would be X * scan ratio
in the example above, so my explanation was wrong and you are right.
Sorry for the trouble.

> Well, that's misleading. We compare against anon because part of the
> cache is already encoded in the refault distance. What we're really
> checking is the access distance against the total amount of available
> RAM.
>
> Consider this. We want to activate pages where
>
>   access_distance <= RAM
>
> and our measure of access distance is:
>
>   access_distance = refault_distance + inactive_file
>
> So the comparison becomes:
>
>   refault_distance + inactive_file <= RAM
>
> which we simplify to:
>
>   refault_distance <= active_file + anon
>
> There is a certain threshold for X simply because there is a certain
> threshold for RAM beyond which we need to start activating. X cannot
> be arbitrary; it must be X + cache filling up memory - after all, we
> have page reclaim evicting pages.
>
> Again, this isn't new. In the current code, we activate when:
>
>   refault_distance <= active_file
>
> which is
>
>   access_distance <= RAM - anon
>
> You can see that whether things are stable or not always depends on
> the existing workingset size. It's just a proxy for how much total RAM
> we have potentially available to the refaulting page.
> > If the example I found last is correct (your confirmation is
> > required), it is also related to the correctness issue, since the
> > cold pages cause eviction of the hot pages repeatedly.
>
> I think your example is correct, but it benefits from the VM
> arbitrarily making an assumption that has a 50/50 shot of being true.
>
> You and I know which pages are hot and which are cold because you
> designed the example.
>
> All the VM sees is this:
>
> - We have an established workingset that has previously shown an
>   access distance <= RAM and was therefore activated.
>
> - We now have another set that also appears to have an access distance
>   <= RAM. The only way to know for sure, however, is to sample the
>   established workingset and compare the relative access frequencies.
>
> Currently, we just assume the incoming pages are colder. Clearly
> that's beneficial when it's true. Clearly it can be totally wrong.
>
> We must allow a fair comparison between these two sets.
>
> For cache, that's already the case - that's why I brought up the
> cache-only example: if refault distances are 50M and you have 60M of
> active cache, we activate all refaults and force an even competition
> between the established workingset and the new pages.
>
> Whether we can protect the active file list when anon needs to shrink
> first and can't (the activate/iocost split) is a different question.
> But I'm no longer so sure after looking into it further.
>
> First, we would need two different refault distances: either we
> consider anon age and need to compare to active_file + anon, or we
> don't and compare to active_file only. We cannot mix them willy-nilly,
> because the metrics wouldn't be comparable. We don't have the space to
> store two different eviction timestamps, nor could we afford to cut
> the precision in half.
>
> Second, there is the additional page flag required to implement it.
>
> Third, it's somewhat moot, because we still have the same behavior
> when active_file would need to shrink and can't. There can't be a
> stable state as long as refault distances <= active_file.
>
> > In this case, both of them (without the patch, with the patch) have
> > some correctness issue, so we need to judge which one is better in
> > terms of overall impact. I don't have a strong opinion about it, so
> > it's up to you to decide which way to go.
>
> If my patch were simply changing the default assumption about which
> pages are hot and which are cold, I would agree with you - the pros
> would be equal to the cons, and one way wouldn't be more correct than
> the other.
>
> But that isn't what my patch does. What it does is get rid of the
> assumption, and actually sample and compare the access frequencies
> when there isn't enough data to make an informed decision.
>
> That's a net improvement.

Okay. Now I agree that this patch is a net improvement.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>

Thanks.