On Thu 07-08-14 12:25:07, Johannes Weiner wrote: > On Thu, Aug 07, 2014 at 09:38:26AM +0200, Michal Hocko wrote: > > On Wed 06-08-14 14:02:35, Andrew Morton wrote: > > > On Wed, 6 Aug 2014 14:00:55 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > From: Johannes Weiner <hannes@xxxxxxxxxxx> > > > > Subject: mm: memcontrol: rewrite uncharge API > > > > > > > > > > Nope, sorry, that was missing > > > mm-memcontrol-rewrite-uncharge-api-fix-clear-page-mapping-in-migration.patch. > > > > > > This time: > > > > > > From: Johannes Weiner <hannes@xxxxxxxxxxx> > > > Subject: mm: memcontrol: rewrite uncharge API > > > > > > The memcg uncharging code that is involved towards the end of a page's > > > lifetime - truncation, reclaim, swapout, migration - is impressively > > > complicated and fragile. > > > > > > Because anonymous and file pages were always charged before they had their > > > page->mapping established, uncharges had to happen when the page type > > > could still be known from the context; as in unmap for anonymous, page > > > cache removal for file and shmem pages, and swap cache truncation for swap > > > pages. However, these operations happen well before the page is actually > > > freed, and so a lot of synchronization is necessary: > > > > > > - Charging, uncharging, page migration, and charge migration all need > > > to take a per-page bit spinlock as they could race with uncharging. > > > > > > - Swap cache truncation happens during both swap-in and swap-out, and > > > possibly repeatedly before the page is actually freed. This means > > > that the memcg swapout code is called from many contexts that make > > > no sense and it has to figure out the direction from page state to > > > make sure memory and memory+swap are always correctly charged. > > > > > > - On page migration, the old page might be unmapped but then reused, > > > so memcg code has to prevent untimely uncharging in that case. > > > Because this code - which should be a simple charge transfer - is so > > > special-cased, it is not reusable for replace_page_cache(). > > > > > > But now that charged pages always have a page->mapping, introduce > > > mem_cgroup_uncharge(), which is called after the final put_page(), when we > > > know for sure that nobody is looking at the page anymore. > > > > > > For page migration, introduce mem_cgroup_migrate(), which is called after > > > the migration is successful and the new page is fully rmapped. Because > > > the old page is no longer uncharged after migration, prevent double > > > charges by decoupling the page's memcg association (PCG_USED and > > > pc->mem_cgroup) from the page holding an actual charge. The new bits > > > PCG_MEM and PCG_MEMSW represent the respective charges and are transferred > > > to the new page during migration. > > > > > > mem_cgroup_migrate() is suitable for replace_page_cache() as well, which > > > gets rid of mem_cgroup_replace_page_cache(). > > > > > > Swap accounting is massively simplified: because the page is no longer > > > uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can > > > transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry > > > before the final put_page() in page reclaim. > > > > > > Finally, page_cgroup changes are now protected by whatever protection the > > > page itself offers: anonymous pages are charged under the page table lock, > > > whereas page cache insertions, swapin, and migration hold the page lock. > > > Uncharging happens under full exclusion with no outstanding references. > > > Charging and uncharging also ensure that the page is off-LRU, which > > > serializes against charge migration. Remove the very costly page_cgroup > > > lock and set pc->flags non-atomically. > > > > I see some point in squashing all the fixups into the single patch but I > > am afraid we have lost some interesting details from fix ups this time. > > I think that at least > > mm-memcontrol-rewrite-uncharge-api-fix-page-cache-migration.patch and > > mm-memcontrol-rewrite-uncharge-api-fix-page-cache-migration-2.patch > > would be good to go on their own _or_ their changelogs added here. The > > whole page cache replace path is obscure and we should rather have that > > documented so we do not have to google for details or go through painful > > code inspection next time. > > I agree, we would lose something there. There is a paragraph in the > changelog that says: > > mem_cgroup_migrate() is suitable for replace_page_cache() as well, > which gets rid of mem_cgroup_replace_page_cache(). > > Could you please update it to say: > > mem_cgroup_migrate() is suitable for replace_page_cache() as well, > which gets rid of mem_cgroup_replace_page_cache(). However, care > needs to be taken because both the source and the target page can > already be charged and on the LRU when fuse is splicing: grab the page > lock on the charge moving side to prevent changing pc->mem_cgroup of a > page under migration. Also, the lruvecs of both pages change as we > uncharge the old and charge the new during migration, and putback may > race with us, so grab the lru lock and isolate the pages iff on LRU to > prevent races and ensure the pages are on the right lruvec afterward. Thanks! This is much better. > > > [vdavydov@xxxxxxxxxxxxx: fix flags definition] > > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > > > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > > > Cc: Tejun Heo <tj@xxxxxxxxxx> > > > Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx> > > > Tested-by: Jet Chen <jet.chen@xxxxxxxxx> > > > Signed-off-by: Michal Hocko <mhocko@xxxxxxx> > > > > If this comes from the above change then is should be probably removed. > > You can replace it by my Acked-by. I have acked all the follow up fixes > > but forgot to ack the initial patch. > > Thanks! > > > > Tested-by: Felipe Balbi <balbi@xxxxxx> > > > > this tested-by came from the same preempt_{en,dis}able patch AFAICS. > > Yeah it might be a bit overreaching to apply this to the full change. > > On a different note, Michal, I just scrolled through the 2000 lines > that follow to see if you had any more comments, but there was only > your signature at the bottom. Please think about the quote context > after you inserted your inline comments and then trim accordingly. Sure I usually trim emails a lot. Forgot this time, sorry about that! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>