On 2023/06/26 20:35, Michal Hocko wrote:
> On Mon 26-06-23 20:26:02, Tetsuo Handa wrote:
>> On 2023/06/26 19:48, Peter Zijlstra wrote:
>>> On Mon, Jun 26, 2023 at 06:25:56PM +0900, Tetsuo Handa wrote:
>>>> On 2023/06/26 17:12, Sebastian Andrzej Siewior wrote:
>>>>> On 2023-06-24 15:54:12 [+0900], Tetsuo Handa wrote:
>>>>>> Why not do the same on the end side?
>>>>>>
>>>>>>  static inline void do_write_seqcount_end(seqcount_t *s)
>>>>>>  {
>>>>>> -	seqcount_release(&s->dep_map, _RET_IP_);
>>>>>>  	do_raw_write_seqcount_end(s);
>>>>>> +	seqcount_release(&s->dep_map, _RET_IP_);
>>>>>>  }
>>>>>
>>>>> I don't have a compelling argument for doing it. It is probably better
>>>>> to release the lock from lockdep's point of view and then really release
>>>>> it (so it can't be acquired before it is released).
>>>>
>>>> We must do it because this is a source of possible printk() deadlock.
>>>> Otherwise, I will nack on PATCH 2/2.
>>>
>>> Don't be like that... just hate on prink like the rest of us. In fact,
>>> i've been patching out the actual printk code for years because its
>>> unusable garbage.
>>>
>>> Will this actually still be a problem once all the fancy printk stuff
>>> lands? That shouldn't do synchronous prints except to 'atomic' consoles
>>> by default IIRC.
>>
>> Commit 1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq
>> seqlock") was applied to 4.14-stable trees, and CONFIG_PREEMPT_RT is available
>> since 5.3. Thus, we want a fix which can be applied to 5.4-stable and later.
>> This means that we can't count on all the fancy printk stuff being available.
>
> Is there any reason to backport RT specific fixup to stable trees? I
> mean seriously, is there any actual memory hotplug user using
> PREEMPT_RT? I would be more than curious to hear the usecase.

Even if we don't backport the RT-specific fixup to stable trees, [PATCH 2/2]
requires that [PATCH 1/2] guarantees that synchronous printk() never happens
(for whatever reason) between write_seqlock_irqsave(&zonelist_update_seq, flags)
and write_sequnlock_irqrestore(&zonelist_update_seq, flags).
If [PATCH 1/2] cannot guarantee that, [PATCH 2/2] will automatically be rejected.

If [PATCH 2/2] cannot be applied, we have several alternatives.

Alternative 1: Revert both commit 3d36424b3b58 ("mm/page_alloc: fix race condition
between build_all_zonelists and page allocation") and commit 1007843a9190
("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock").
I don't think this will happen, for nobody would be happy with it.

Alternative 2: Revert commit 1007843a9190 ("mm/page_alloc: fix potential deadlock
on zonelist_update_seq seqlock") and apply "mm/page_alloc: don't check
zonelist_update_seq from atomic allocations" at
https://lkml.kernel.org/r/dfdb9da6-ca8f-7a81-bfdd-d74b4c401f11@xxxxxxxxxxxxxxxxxxx .
I think this is reasonable, for it reduces the locking dependency. But Michal Hocko
did not like it.

Alternative 3: Somehow preserve the
printk_deferred_enter() => write_seqlock(&zonelist_update_seq) and
write_sequnlock(&zonelist_update_seq) => printk_deferred_exit() ordering pattern.
Something like below?

----------------------------------------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..ded3ac3856e7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5805,6 +5805,7 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+#ifndef CONFIG_PREEMPT_RT
 	unsigned long flags;
 
 	/*
@@ -5813,6 +5814,9 @@ static void __build_all_zonelists(void *data)
 	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
 	 */
 	local_irq_save(flags);
+#else
+	migrate_disable();
+#endif
 	/*
 	 * Explicitly disable this CPU's synchronous printk() before taking
 	 * seqlock to prevent any printk() from trying to hold port->lock, for
@@ -5859,7 +5863,11 @@ static void __build_all_zonelists(void *data)
 
 	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
+#ifndef CONFIG_PREEMPT_RT
 	local_irq_restore(flags);
+#else
+	migrate_enable();
+#endif
 }
 
 static noinline void __init
----------------------------------------
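
Just to spell out what that diff amounts to: below is a minimal sketch (my own
restatement, not a patch; the function name zonelist_update_ordering_sketch() is
made up, the real code is __build_all_zonelists() in mm/page_alloc.c). It shows
that, with either protection, printk_deferred_enter()/printk_deferred_exit()
still bracket the write_seqlock()/write_sequnlock() section, so no synchronous
printk() can happen while zonelist_update_seq is held for writing.

----------------------------------------
/*
 * Sketch only, not meant to compile standalone: it restates the ordering
 * that Alternative 3 preserves, with the RT and non-RT protection spelled
 * out.  The function name is hypothetical.
 */
static void zonelist_update_ordering_sketch(void)
{
#ifndef CONFIG_PREEMPT_RT
	unsigned long flags;

	/* non-RT: keep IRQ handlers (e.g. GFP_ATOMIC allocations) off this CPU */
	local_irq_save(flags);
#else
	/* RT: IRQs stay enabled; only pin this task to the current CPU */
	migrate_disable();
#endif
	/* defer this CPU's synchronous printk() before taking the seqlock ... */
	printk_deferred_enter();
	write_seqlock(&zonelist_update_seq);

	/* ... rebuild the zonelists here ... */

	write_sequnlock(&zonelist_update_seq);
	/* ... and re-enable it only after the seqlock has been released */
	printk_deferred_exit();
#ifndef CONFIG_PREEMPT_RT
	local_irq_restore(flags);
#else
	migrate_enable();
#endif
}
----------------------------------------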