Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 28 Feb 2024 13:21:07 -0800

On Sat, 17 Feb 2024 at 19:13, Tong Tiangen <tongtiangen@xxxxxxxxxx> wrote:
>
> After this patch:
>    copy_page_from_iter_atomic()
>      -> iterate_and_advance2()
>        -> iterate_bvec()
>          -> remain = step()
>
> With CONFIG_ARCH_HAS_COPY_MC, the step() is copy_mc_to_kernel() which
> return "bytes not copied".
>
> When a memory error occurs during step(), the value of "left" equal to
> the value of "part" (no one byte is copied successfully). In this case,
> iterate_bvec() returns 0, and copy_page_from_iter_atomic() also returns
> 0. The callback shmem_write_end()[2] also returns 0. Finally,
> generic_perform_write() goes to "goto again"[3], and the loop restarts.
> 4][5] cannot enter and exit the loop, then deadloop occurs.

Hmm. If the copy doesn't succeed and make any progress at all, then
the code in generic_perform_write() after the "goto again"

                //[4]
                if (unlikely(fault_in_iov_iter_readable(i, bytes) ==
                              bytes)) {
                        status = -EFAULT;
                        break;
                }

should break out of the loop.

So either your analysis looks a bit flawed, or I'm missing something.
Likely I'm missing something really obvious.

Why does the copy_mc_to_kernel() fail, but the
fault_in_iov_iter_readable() succeeds?

              Linus