Re: zcache+zram working together?

Minchan Kim <minchan@xxxxxxxxxx> · Wed, 20 Feb 2013 14:53:24 +0900

On Wed, Feb 20, 2013 at 11:47:32AM +0900, Kyungmin Park wrote:
> On Wed, Feb 20, 2013 at 9:06 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > On Sat, Feb 16, 2013 at 04:15:41PM +0800, Simon Jeons wrote:
> >> On 12/11/2012 02:42 PM, Minchan Kim wrote:
> >> >On Fri, Dec 07, 2012 at 01:31:35PM -0800, Dan Magenheimer wrote:
> >> >>Last summer, during the great(?) zcache-vs-zcache2 debate,
> >> >>I wondered if there might be some way to obtain the strengths
> >> >>of both.  While following Luigi's recent efforts toward
> >> >>using zram for ChromeOS "swap", I thought of an interesting
> >> >>interposition of zram and zcache that, at first blush, makes
> >> >>almost no sense at all, but after more thought, may serve as a
> >> >>foundation for moving towards a more optimal solution for use
> >> >>of "adaptive compression" in the kernel, at least for
> >> >>embedded systems.
> >> >>
> >> >>To quickly review:
> >> >>
> >> >>Zram (when used for swap) compresses only anonymous pages and
> >> >>only when they are swapped, but uses the high-density zsmalloc
> >> >>allocator and eliminates the need for a true swap device, thus
> >> >>making zram a good fit for embedded systems.  But, because zram
> >> >>appears to the kernel as a swap device, zram data must traverse
> >> >>the block I/O subsystem and is somewhat difficult to monitor and
> >> >>control without significant changes to the swap and/or block
> >> >>I/O subsystem, which are designed to handle fixed block-sized
> >> >>data.
> >> >>
> >> >>Zcache (zcache2) compresses BOTH clean page cache pages that
> >> >>would otherwise be evicted, and anonymous pages that would
> >> >>otherwise be sent to a swap device.  Both paths use in-kernel
> >> >>hooks (cleancache and frontswap respectively) which avoid
> >> >>most or all of the block I/O subsystem and the swap subsystem.
> >> >>Because of this and since it is designed using transcendent
> >> >>memory ("tmem") principles, zcache has a great deal more
> >> >>flexibility in control and monitoring.  Zcache uses the simpler,
> >> >>more predictable "zbud" allocator which achieves lower density
> >> >>but provides greater flexibility under high pressure.
> >> >>But zcache requires a swap device as a "backup", so it seems
> >> >>unsuitable for embedded systems.
> >> >>
> >> >>(Minchan, I know at one point you were working on some
> >> >>documentation to contrast zram and zcache so you may
> >> >>have something more to add here...)
> >> >>
> >> >>What if one were to enable both?  This is possible today with
> >> >>no kernel change at all by configuring both zram and zcache2
> >> >>into the kernel and then configuring zram at boottime.
> >> >>
> >> >>When memory pressure is dominated by file pages, zcache (via
> >> >>the cleancache hooks) provides compression to optimize memory
> >> >>utilization.  As more pressure is exerted by anonymous pages,
> >> >>"swapping" occurs but the frontswap hooks route the data to
> >> >>zcache which, as necessary, reclaims physical pages used by
> >> >>compressed file pages to use for compressed anonymous pages.
> >> >>At this point, any compressions unsuitable for zbud are rejected
> >> >>by zcache and passed through to the "backup" swap device...
> >> >>which is zram!  Under high pressure from anonymous pages,
> >> >>zcache can also be configured to "unuse" pages to zram (though
> >> >>this functionality is still not merged).
> >> >>
> >> >>I've plugged zcache and zram together and watched them
> >> >>work/cooperate, via their respective debugfs statistics.
> >> >>While I don't have benchmarking results and may not have
> >> >>time anytime soon to do much work on this, it seems like
> >> >>there is some potential here, so I thought I'd publish the
> >> >>idea so that others can give it a go and/or look at
> >> >>other ways (including kernel changes) to combine the two.
> >> >>
> >> >>Feedback welcome and (early) happy holidays!
> >> >Interesting, Dan!
> >> >I would like to get a chance to investigate it if I have time
> >> >in the future.
> >> >
> >> >Another synergy with BOTH is to remove CMA completely because
> >> >it complicates mm core code with hooks and still has
> >> >problems with pinned pages and evicting the working set to get
> >>
> >> Do you mean get_user_pages? Could you explain the downsides of
> >> CMA in detail?
> >
> > Good question.
> >
> > 1. Ignores the working set.
> >    CMA can sweep working-set pages out of the CMA area to get contiguous
> >    memory.
> Theoretically agreed, but there's no data to prove this one.

The CMA area is the last fallback type for allocation, and pages in that area
are evicted when we need contiguous memory. That means a newly forked task's
pages are likely to be in that area, and from the LRU's point of view a new
task's pages fall into the working-set category. No?
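
To make the argument above concrete, here is a toy Python model. It is not
kernel code, and every name in it (make_frames, alloc_frame, pages_in_cma) is
hypothetical. It only assumes what the paragraph states: ordinary frames are
used first and CMA frames only as the last fallback. Under that assumption, a
long-running task's pages stay outside the CMA area, while a task started
later lands in it, so a contiguous CMA request would have to evict exactly
those newer pages.

```python
# Toy model only: NOT how the kernel page allocator is implemented.
NR_NORMAL = 4
NR_CMA = 4


def make_frames(nr_normal=NR_NORMAL, nr_cma=NR_CMA):
    """Build a list of frames; the last nr_cma frames form the CMA area."""
    return [{"in_cma": i >= nr_normal, "owner": None}
            for i in range(nr_normal + nr_cma)]


def alloc_frame(frames, owner):
    """Give `owner` a free frame, preferring normal frames over CMA frames."""
    for want_cma in (False, True):  # CMA is only tried as the last fallback
        for idx, frame in enumerate(frames):
            if frame["owner"] is None and frame["in_cma"] == want_cma:
                frame["owner"] = owner
                return idx
    return None  # no free frame anywhere


def pages_in_cma(frames, owner):
    """Count `owner`'s frames that a contiguous CMA request would evict."""
    return sum(1 for f in frames if f["in_cma"] and f["owner"] == owner)


frames = make_frames()
for _ in range(NR_NORMAL):
    alloc_frame(frames, "old_task")  # long-running task fills normal memory
for _ in range(2):
    alloc_frame(frames, "new_task")  # newly forked task arrives later

print(pages_in_cma(frames, "old_task"))  # 0
print(pages_in_cma(frames, "new_task"))  # 2
```

Real reclaim also depends on LRU age, migration types and watermarks, which
this sketch ignores; it only illustrates the fallback-order argument.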

> >
> > 2. No guarantee of a contiguous memory area
> >    As I mentioned, get_user_pages could pin a page, so migration
> >    ends up failing.
> Right, it's a work item now; we have to guarantee these pages can't
> be allocated from the CMA area.

Good to hear. CMA's goal is to guarantee it; if it can't, there is no
point in using it. FYI, the memory-hotplug people have the same problem
and have tried to solve it, and I hoped their solution would solve the
CMA problem too, but I'm not sure it does.
I'm looking forward to seeing your elegant work.

> 
> >
> > 3. Latency
> >    CMA reclaims all pages in the CMA area when we need it. That means we
> >    sometimes have to write out dirty pages, which can add a big latency
> >    overhead. Even unmapping all of those pages from the PTEs of every
> >    process isn't trivial.
> It's a trade-off between requirements and performance. If the feature is
> more important and needs more memory, it can accept the cost.

It depends on the use case, and as you already know, many people want to
use CMA with low latency if possible. If CMA can't meet their latency
requirements, they might use reserved memory rather than CMA, or use CMA
together with some harmful workarounds (e.g., sync + drop_caches).
If the kernel can provide a better solution, they can avoid such things.

> >
> > 4. Adds many hooks to MM code. Personally, I really hate it.
> 
> But there are cases that need CMA, e.g., DRM playback.
> 
> We have to guarantee physically contiguous memory for the TrustZone
> solution on ARM.
> Without the reserved-memory concept, there's no way to get physically
> contiguous memory except CMA.

I don't get it.
I meant I don't like the CONFIG_CMA hooks under mm/.

> 
> Thank you,
> Kyungmin Park
> 
> >
> > --
> > Kind regards,
> > Minchan Kim
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: email@xxxxxxxxx

-- 
Kind regards,
Minchan Kim