On Tue, May 09, 2023 at 11:20:02AM -0700, Minchan Kim wrote: > Hi Folks, > > On Tue, May 09, 2023 at 01:44:01PM -0400, Johannes Weiner wrote: > > On Tue, May 09, 2023 at 12:00:30PM +0900, Sergey Senozhatsky wrote: > > > On (23/05/08 09:00), Nhat Pham wrote: > > > > > The deeper bug here is that zs_map_object() tries to add the page to > > > > > the LRU list while the shrinker has it isolated for reclaim. This is > > > > > way too sutble and error prone. Even if it worked now, it'll cause > > > > > corruption issues down the line. > > > > > > > > > > For example, Nhat is adding a secondary entry point to reclaim. > > > > > Reclaim expects that a page that's on the LRU is also on the fullness > > > > > list, so this would lead to a double remove_zspage() and BUG_ON(). > > > > > > > > > > This patch doesn't just fix the crash, it eliminates the deeper LRU > > > > > isolation issue and makes the code more robust and simple. > > > > > > > > I agree. IMO, less unnecessary concurrent interaction is always a > > > > win for developers' and maintainers' cognitive load. > > > > > > Thanks for all the explanations. > > > > > > > As a side benefit - this also gets rid of the inelegant check > > > > (mm == ZS_MM_WO). The fact that we had to include a > > > > a multi-paragraph explanation for a 3-line piece of code > > > > should have been a red flag. > > > > > > Minchan had some strong opinion on that, so we need to hear from him > > > before we decide how do we fix it. > > > > I'd be happy if he could validate the fix. But this fixes a crash, so > > the clock is ticking. > > > > I will also say, his was a design preference. One we agreed to only > > very reluctantly: https://lore.kernel.org/lkml/Y3f6habiVuV9LMcu@xxxxxxxxxx/ > > > > Now we have a crash that is a direct result of it, and which cost us > > (and apparently is still costing us) time and energy to resolve. > > > > Unless somebody surfaces a real technical problem with the fix, I'd > > say let's do it our way this time. > > > > Sorry for being too late to review. The reason I insisted on it was > I overlookeded the bug and thought it was trivial change but better > semantic since zsmalloc provides separate API between allocation and > access unlike other allocators. Now, Nhat and Johannes provided it's > more error prone, I am totally fine with this fix and will live it > until the LRU writeback will move out of allocator. > > Sorry for wasting your time to hunt this bug down and thank you for fix! > > Acked-by: Minchan Kim <minchan@xxxxxxxxxx> Thanks Minchan! Domenico is working on the LRU refactor right now, and should have patches for review soon. This will indeed get rid of all the zsmalloc warts and make our lives much easier going forward!