On Fri, 23 Feb 2024, Zhitao Li wrote: > Thanks for Jeff's reply. > > I did see ERESTARTSYS in userland. As described in the above > "Reproduction" chapter, "dd" fails with "dd: error writing > '/mnt/test1/1G': Unknown error 512". > > After strace "dd", it turns out that syscall WRITE fails with: > write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"..., > 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) > > In fact, other syscalls related to file systems can also fail with > ERESTARTSYS in our cases, for example: mount, open, read, write and so > on. > > Maybe the reason is that on forced unmount, rpc_killall_tasks() in > net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no > signal gets involved. So ERESTARTSYS is not handled before entering > userspace. > > Best regards, > Zhitao Li at SmartX. > > On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > [You don't often get email from zhitao.li@xxxxxxxxxx. Learn why this > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > Hi, everyone, > > > > > > > > - Facts: > > > > I have a remote NFS export and I mount the same export on two > > > > different directories in my OS with the same options. There is an > > > > inflight IO under one mounted directory. And then I unmount another > > > > mounted directory with force. The inflight IO ends up with "Unknown > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > All of the above is well known. That's because forced umount affects > > > the entire filesystem. Why are you using it here in the first place? It > > > is not intended for casual use. > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > being returned to system calls? If so, which ones? > > -- > > Jeff Layton <jlayton@xxxxxxxxxx> > I think this bug was introduced by Commit ae67bd3821bb ("SUNRPC: Fix up task signalling") in Linux v5.2. Prior to that commit, rpc_killall_tasks set the error to -EIO. After that commit it calls rpc_signal_task which always uses -ERESTARTSYS. This might be an appropriate fix. Can you please test and confirm? diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 2d61987b3545..ed3a116efd5d 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -222,7 +222,7 @@ void rpc_put_task(struct rpc_task *); void rpc_put_task_async(struct rpc_task *); bool rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status); void rpc_task_try_cancel(struct rpc_task *task, int error); -void rpc_signal_task(struct rpc_task *); +void rpc_signal_task(struct rpc_task *, int); void rpc_exit_task(struct rpc_task *); void rpc_exit(struct rpc_task *, int); void rpc_release_calldata(const struct rpc_call_ops *, void *); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index cda0935a68c9..cdbdfae13030 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -895,7 +895,7 @@ void rpc_killall_tasks(struct rpc_clnt *clnt) trace_rpc_clnt_killall(clnt); spin_lock(&clnt->cl_lock); list_for_each_entry(rovr, &clnt->cl_tasks, tk_task) - rpc_signal_task(rovr); + rpc_signal_task(rovr, -EIO); spin_unlock(&clnt->cl_lock); } EXPORT_SYMBOL_GPL(rpc_killall_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 6debf4fd42d4..e88621881036 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -852,14 +852,14 @@ void rpc_exit_task(struct rpc_task *task) } } -void rpc_signal_task(struct rpc_task *task) +void rpc_signal_task(struct rpc_task *task, int sig) { struct rpc_wait_queue *queue; if (!RPC_IS_ACTIVATED(task)) return; - if (!rpc_task_set_rpc_status(task, -ERESTARTSYS)) + if (!rpc_task_set_rpc_status(task, sig)) return; trace_rpc_task_signalled(task, task->tk_action); set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate); @@ -992,7 +992,7 @@ static void __rpc_execute(struct rpc_task *task) * clean up after sleeping on some queue, we don't * break the loop here, but go around once more. */ - rpc_signal_task(task); + rpc_signal_task(task, -ERESTARTSYS); } trace_rpc_task_sync_wake(task, task->tk_action); }