On Fri, Oct 25, 2024 at 11:35:03PM +0100, Lorenzo Stoakes wrote: > On Fri, Oct 25, 2024 at 11:56:52PM +0200, Vlastimil Babka wrote: > > On 10/25/24 19:12, Lorenzo Stoakes wrote: > > > On Wed, Oct 23, 2024 at 05:24:40PM +0100, Lorenzo Stoakes wrote: > > >> Implement a new lightweight guard page feature, that is regions of userland > > >> virtual memory that, when accessed, cause a fatal signal to arise. > > > > > > <snip> > > > > > > Hi Andrew - Could you apply the below fix-patch? I realise we must handle > > > fatal signals and conditional rescheduling in the vector_madvise() special > > > case. > > > > > > Thanks! > > > > > > ----8<---- > > > From 546d7e1831c71599fc733d589e0d75f52e84826d Mon Sep 17 00:00:00 2001 > > > From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > > > Date: Fri, 25 Oct 2024 18:05:48 +0100 > > > Subject: [PATCH] mm: yield on fatal signal/cond_sched() in vector_madvise() > > > > > > While we have to treat -ERESTARTNOINTR specially here as we are looping > > > through a vector of operations and can't simply restart the entire > > > operation, we mustn't hold up fatal signals or RT kernels. > > > > For plain madvise() syscall returning -ERESTARTNOINTR does the right thing > > and checks fatal_signal_pending() before returning, right? > > I believe so. But now you've caused me some doubt so let me double check > and make absolutely sure :) > > > > > Uh actually can we be just returning -ERESTARTNOINTR or do we need to use > > restart_syscall()? > > Yeah I was wondering about that, but restart_syscall() seems to set > TIF_SIGPENDING, and I wondered if that was correct... but then I saw other > places that seemed to use it direct so it seemed so. > > Let's eliminiate doubt, will check this next week and make sure. > Yeah looks like we do actually, as the function is handled by arch_do_signal_or_restart(): do_syscall_64() -> sycall_exit_to_user_mode_work() -> __sycall_exit_to_user_mode_work() -> exit_to_user_mode_prepare() -> exit_to_user_mode_loop() -> arch_do_signal_or_restart() -> (possibly) get_signal() And arch_do_signal_or_restart() is only invoked by exit_to_user_mode_loop() if _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL is set: if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) arch_do_signal_or_restart(regs); This is set by restart_syscall(): static inline int restart_syscall(void) { set_tsk_thread_flag(current, TIF_SIGPENDING); return -ERESTARTNOINTR; } It's a nop if no signal is actually pending too, and it handily also deals with signals... > > > > > --- > > > mm/madvise.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > index 48eba25e25fe..127aa5d86656 100644 > > > --- a/mm/madvise.c > > > +++ b/mm/madvise.c > > > @@ -1713,8 +1713,14 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter, > > > * we have already rescinded locks, it should be no problem to > > > * simply try again. > > > */ > > > - if (ret == -ERESTARTNOINTR) > > > + if (ret == -ERESTARTNOINTR) { > > > + if (fatal_signal_pending(current)) { > > > + ret = -EINTR; > > > + break; > > > + } > > > + cond_resched(); > > > > Should be unnecessary as we're calling an operation that takes a rwsem so > > there are reschedule points already. And with lazy preempt hopefully > > cond_resched()s will become history, so let's not add more only to delete later. > > Ack will remove on respin. > > > > > > continue; > > > + } > > > if (ret < 0) > > > break; > > > iov_iter_advance(iter, iter_iov_len(iter)); > > > -- > > > 2.47.0 > > > > For simplicitly with your other comments too I think I'll respin this next > week. So will respin to use restart_syscall() correctly (+ fix up your other points too). Have tested and confirmed that failing to use restart_syscall() causes the -ERESTARTINTR to be returned and no restart, but using it works.