On Tue, May 9, 2023 at 12:24 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > On Tue, May 09, 2023 at 11:20:02AM -0700, Minchan Kim wrote: > > Hi Folks, > > > > On Tue, May 09, 2023 at 01:44:01PM -0400, Johannes Weiner wrote: > > > On Tue, May 09, 2023 at 12:00:30PM +0900, Sergey Senozhatsky wrote: > > > > On (23/05/08 09:00), Nhat Pham wrote: > > > > > > The deeper bug here is that zs_map_object() tries to add the page to > > > > > > the LRU list while the shrinker has it isolated for reclaim. This is > > > > > > way too sutble and error prone. Even if it worked now, it'll cause > > > > > > corruption issues down the line. > > > > > > > > > > > > For example, Nhat is adding a secondary entry point to reclaim. > > > > > > Reclaim expects that a page that's on the LRU is also on the fullness > > > > > > list, so this would lead to a double remove_zspage() and BUG_ON(). > > > > > > > > > > > > This patch doesn't just fix the crash, it eliminates the deeper LRU > > > > > > isolation issue and makes the code more robust and simple. > > > > > > > > > > I agree. IMO, less unnecessary concurrent interaction is always a > > > > > win for developers' and maintainers' cognitive load. > > > > > > > > Thanks for all the explanations. > > > > > > > > > As a side benefit - this also gets rid of the inelegant check > > > > > (mm == ZS_MM_WO). The fact that we had to include a > > > > > a multi-paragraph explanation for a 3-line piece of code > > > > > should have been a red flag. > > > > > > > > Minchan had some strong opinion on that, so we need to hear from him > > > > before we decide how do we fix it. > > > > > > I'd be happy if he could validate the fix. But this fixes a crash, so > > > the clock is ticking. > > > > > > I will also say, his was a design preference. One we agreed to only > > > very reluctantly: https://lore.kernel.org/lkml/Y3f6habiVuV9LMcu@xxxxxxxxxx/ > > > > > > Now we have a crash that is a direct result of it, and which cost us > > > (and apparently is still costing us) time and energy to resolve. > > > > > > Unless somebody surfaces a real technical problem with the fix, I'd > > > say let's do it our way this time. > > > > > > > Sorry for being too late to review. The reason I insisted on it was > > I overlookeded the bug and thought it was trivial change but better > > semantic since zsmalloc provides separate API between allocation and > > access unlike other allocators. Now, Nhat and Johannes provided it's > > more error prone, I am totally fine with this fix and will live it > > until the LRU writeback will move out of allocator. > > > > Sorry for wasting your time to hunt this bug down and thank you for fix! > > > > Acked-by: Minchan Kim <minchan@xxxxxxxxxx> > > Thanks Minchan! > > Domenico is working on the LRU refactor right now, and should have > patches for review soon. This will indeed get rid of all the zsmalloc > warts and make our lives much easier going forward! That's the dream! I look forward to Domenico's patches. And thanks for the ack, Minchan!