On Tue, Apr 14, 2020 at 04:19:01PM +0800, Alex Shi wrote: > > > 在 2020/4/14 上午2:07, Johannes Weiner 写道: > > But isolation actually needs to lock out charging, or it would operate > > on the wrong list: > > > > isolation: commit_charge: > > if (TestClearPageLRU(page)) > > page->mem_cgroup = new > > // page is still physically on > > // the root_mem_cgroup's LRU. We're > > // updating the wrong list: > > memcg = page->mem_cgroup > > spin_lock(memcg->lru_lock) > > del_page_from_lru_list(page, memcg) > > spin_unlock(memcg->lru_lock) > > > > lrucare really is a mess. Even before this patch series, it makes > > things tricky and subtle and error prone. > > > > The only reason we're doing it is for when there is swapping without > > swap tracking, in which case swap reahadead needs to put pages on the > > LRU but cannot charge them until we have a faulting vma later. > > > > But it's not clear how practical such a configuration is. Both memory > > and swap are shared resources, and isolation isn't really effective > > when you restrict access to memory but then let workloads swap freely. > > > > Plus, the overhead of tracking is tiny - 512k per G of swap (0.04%). > > > > Maybe we should just delete MEMCG_SWAP and unconditionally track swap > > entry ownership when the memory controller is enabled. I don't see a > > good reason not to, and it would simplify the entire swapin path, the > > LRU locking, and the page->mem_cgroup stabilization rules. > > Hi Johannes, > > I think what you mean here is to keep swap_cgroup id even it was swaped, > then we read back the page from swap disk, we don't need to charge it. > So all other memcg charge are just happens on non lru list, thus we have > no isolation required in above awkward scenario. We don't need to change how swap recording works, we just need to always do it when CONFIG_MEMCG && CONFIG_SWAP. We can uncharge the page once it's swapped out. The only difference is that with a swap record, we know who owned the page and can charge readahead pages right awya, before setting PageLRU; whereas without a record, we read pages onto the LRU, and then wait until we hit a page fault with an mm to charge. That's why we have this lrucare mess.