On Tue, 21 Sep 2021, Mel Gorman wrote: > On Tue, Sep 21, 2021 at 10:13:17AM +1000, NeilBrown wrote: > > On Mon, 20 Sep 2021, Mel Gorman wrote: > > > -long wait_iff_congested(int sync, long timeout) > > > -{ > > > - long ret; > > > - unsigned long start = jiffies; > > > - DEFINE_WAIT(wait); > > > - wait_queue_head_t *wqh = &congestion_wqh[sync]; > > > - > > > - /* > > > - * If there is no congestion, yield if necessary instead > > > - * of sleeping on the congestion queue > > > - */ > > > - if (atomic_read(&nr_wb_congested[sync]) == 0) { > > > - cond_resched(); > > > - > > > - /* In case we scheduled, work out time remaining */ > > > - ret = timeout - (jiffies - start); > > > - if (ret < 0) > > > - ret = 0; > > > - > > > - goto out; > > > - } > > > - > > > - /* Sleep until uncongested or a write happens */ > > > - prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); > > > > Uninterruptible wait. > > > > .... > > > +static void > > > +reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, > > > + long timeout) > > > +{ > > > + wait_queue_head_t *wqh = &pgdat->reclaim_wait; > > > + unsigned long start = jiffies; > > > + long ret; > > > + DEFINE_WAIT(wait); > > > + > > > + atomic_inc(&pgdat->nr_reclaim_throttled); > > > + WRITE_ONCE(pgdat->nr_reclaim_start, > > > + node_page_state(pgdat, NR_THROTTLED_WRITTEN)); > > > + > > > + prepare_to_wait(wqh, &wait, TASK_INTERRUPTIBLE); > > > > Interruptible wait. > > > > Why the change? I think these waits really need to be TASK_UNINTERRUPTIBLE. > > > > Because from mm/ context, I saw no reason why the task *should* be > uninterruptible. It's waiting on other tasks to complete IO and it is not > protecting device state, filesystem state or anything else. If it gets > a signal, it's safe to wake up, particularly if that signal is KILL and > the context is a direct reclaimer. I disagree. An Interruptible sleep only makes sense if the "was interrupted" status can propagate up to user-space (or to some in-kernel handler that will clear the signal). In particular, if reclaim_throttle() is called in a loop (which it is), and if that loop doesn't check for signal_pending (which it doesn't), then the next time around the loop after receiving a signal, it won't sleep at all. That would be bad. In general, if you don't return an error, then you probably shouldn't sleep Interruptible. I notice that tasks sleep on kswapd_wait as TASK_INTERRUPTIBLE, but they don't have any signal handling. I suspect this isn't actually a defect because I suspect that is it not even possible to SIGKILL kswapd. But the code seems misleading. I guess I should write a patch. Unless reclaim knows to abort completely on a signal (__GFP_KILLABLE ???) this must be an UNINTERRUPTIBLE wait. Thanks, NeilBrown > > The original TASK_UNINTERRUPTIBLE is almost certainly a copy&paste from > congestion_wait which may be called because a filesystem operation must > complete before it can return to userspace so a signal waking it up is > pointless. > > -- > Mel Gorman > SUSE Labs > >