Re: commit 0790303ec869 leads to cpu stall without CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello!

On Sun 08-12-24 16:25:19, Bert Karwatzki wrote:
> Since linux-next-20241206 booting my debian unstable system hangs before starting gdm.
> After some time these messages appear in /var/log/kern.log:

Thanks for report!

<snip stacktraces>

> I bisected this between linux-6.13-rc1 and linux-20241206 and found this as
> offending commit:
> 0790303ec869 ("fsnotify: generate pre-content permission event on page fault")
> 
> I also noticed that only a part of the commit causes the issue, and reverting
> that part solves it in linux-next-20241206:
> 
> commit 6207000b72058b45bb03f0975fbbbcd9dae06238
> Author: Bert Karwatzki <spasswolf@xxxxxx>
> Date:   Sun Dec 8 01:51:59 2024 +0100
> 
>     mm: filemap: partially revert commit 790303ec869
> 
>     Reverting this part of commit 790303ec869 is enough
>     to fix the issue.
> 
>     Signed-off-by: Bert Karwatzki <spasswolf@xxxxxx>
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23e001f5cd0f..9bf2fc833f3c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3419,37 +3419,6 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  	 * or because readahead was otherwise unable to retrieve it.
>  	 */
>  	if (unlikely(!folio_test_uptodate(folio))) {
> -		/*
> -		 * If this is a precontent file we have can now emit an event to
> -		 * try and populate the folio.
> -		 */
> -		if (!(vmf->flags & FAULT_FLAG_TRIED) &&
> -		    unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
> -			loff_t pos = folio_pos(folio);
> -			size_t count = folio_size(folio);
> -
> -			/* We're NOWAIT, we have to retry. */
> -			if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) {
> -				folio_unlock(folio);
> -				goto out_retry;
> -			}
> -
> -			if (mapping_locked)
> -				filemap_invalidate_unlock_shared(mapping);
> -			mapping_locked = false;
> -
> -			folio_unlock(folio);
> -			fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> -			if (!fpin)
> -				goto out_retry;
> -
> -			error = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos,
> -							count);
> -			if (error)
> -				ret = VM_FAULT_SIGBUS;
> -			goto out_retry;
> -		}
> -
>  		/*
>  		 * If the invalidate lock is not held, the folio was in cache
>  		 * and uptodate and now it is not. Strange but possible since we
> 
> 
> Then I took a closer look at the function called in the problematic code
> and noticed that fsnotify_file_area_perm(), is a NOOP when
> CONFIG_FANOTIFY_ACCESS_PERMISSIONS is not set (which was the case in my
> .config). This also explains why this was not found before, as
> distributional .config file have this option enabled.  Setting the option
> to y solves the issue, too

Well, I agree with you on all the points but the real question is, how come
the test FMODE_FSNOTIFY_HSM(file->f_mode) was true on our kernel when you
clearly don't run HSM software, even more so with
CONFIG_FANOTIFY_ACCESS_PERMISSIONS disabled. That's the real cause of this
problem. Something fishy is going on here... checking...

Ah, because I've botched out file_set_fsnotify_mode() in case
CONFIG_FANOTIFY_ACCESS_PERMISSIONS is disabled. This should fix the
problem:

index 1a9ef8f6784d..778a88fcfddc 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -215,6 +215,7 @@ static inline int fsnotify_open_perm(struct file *file)
 #else
 static inline void file_set_fsnotify_mode(struct file *file)
 {
+       file->f_mode |= FMODE_NONOTIFY_PERM;
 }

I'm going to test this with CONFIG_FANOTIFY_ACCESS_PERMISSIONS disabled and
push out a fixed version. Thanks again for the report and analysis!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux