Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2023-09-21 at 11:39 -0400, Jeff Layton wrote:
> On Thu, 2023-09-21 at 11:19 -0400, Mimi Zohar wrote:
> > On Thu, 2023-09-21 at 11:10 -0400, Jeff Layton wrote:
> > > On Thu, 2023-09-21 at 10:52 -0400, Mimi Zohar wrote:
> > > > On Thu, 2023-09-21 at 13:48 +0200, Christian Brauner wrote:
> > > > > On Thu, Sep 21, 2023 at 07:24:23AM -0400, Mimi Zohar wrote:
> > > > > > On Thu, 2023-09-21 at 06:32 -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2023-09-20 at 17:52 -0700, Casey Schaufler wrote:
> > > > > > > > On 9/20/2023 5:10 PM, Stefan Berger wrote:
> > > > > > > > > 
> > > > > > > > > On 9/20/23 18:09, Stefan Berger wrote:
> > > > > > > > > > 
> > > > > > > > > > On 9/20/23 17:16, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2023-09-20 at 16:37 -0400, Stefan Berger wrote:
> > > > > > > > > > > > On 9/20/23 13:01, Stefan Berger wrote:
> > > > > > > > > > > > > On 9/17/23 20:04, syzbot wrote:
> > > > > > > > > > > > > > syzbot has bisected this issue to:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > commit db1d1e8b9867aae5c3e61ad7859abfcc4a6fd6c7
> > > > > > > > > > > > > > Author: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > > > > > > > Date:   Mon Apr 17 16:55:51 2023 +0000
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >       IMA: use vfs_getattr_nosec to get the i_version
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > bisection log:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/bisect.txt?x=106f7e54680000
> > > > > > > > > > > > > > start commit:   a747acc0b752 Merge tag
> > > > > > > > > > > > > > 'linux-kselftest-next-6.6-rc2'
> > > > > > > > > > > > > > of g..
> > > > > > > > > > > > > > git tree:       upstream
> > > > > > > > > > > > > > final oops:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/report.txt?x=126f7e54680000
> > > > > > > > > > > > > > console output:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=146f7e54680000
> > > > > > > > > > > > > > kernel config:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/.config?x=df91a3034fe3f122
> > > > > > > > > > > > > > dashboard link:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=a67fc5321ffb4b311c98
> > > > > > > > > > > > > > syz repro:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=1671b694680000
> > > > > > > > > > > > > > C reproducer:
> > > > > > > > > > > > > > https://syzkaller.appspot.com/x/repro.c?x=14ec94d8680000
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Reported-by: syzbot+a67fc5321ffb4b311c98@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > > > > > > > Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the
> > > > > > > > > > > > > > i_version")
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > For information about bisection process see:
> > > > > > > > > > > > > > https://goo.gl/tpsmEJ#bisection
> > > > > > > > > > > > > The final oops shows this here:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > > > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > > > PGD 0 P4D 0
> > > > > > > > > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > > > > > > > > CPU: 0 PID: 3192 Comm: syz-executor.0 Not tainted
> > > > > > > > > > > > > 6.4.0-rc2-syzkaller #0
> > > > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > > > > > BIOS Google 08/04/2023
> > > > > > > > > > > > > RIP: 0010:__lock_acquire+0x35/0x490 kernel/locking/lockdep.c:4946
> > > > > > > > > > > > > Code: 83 ec 18 65 4c 8b 35 aa 60 f4 7e 83 3d b7 11 e4 02 00 0f 84 05
> > > > > > > > > > > > > 02 00 00 4c 89 cb 89 cd 41 89 d5 49 89 ff 83 fe 01 77 0c 89 f0
> > > > > > > > > > > > > <49> 8b
> > > > > > > > > > > > > 44 c7 08 48 85 c0 75 1b 4c 89 ff 31 d2 45 89 c4 e8 74 f6 ff
> > > > > > > > > > > > > RSP: 0018:ffffc90002edb840 EFLAGS: 00010097
> > > > > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > > > > > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000050
> > > > > > > > > > > > > RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> > > > > > > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > > > > > > > > > R13: 0000000000000000 R14: ffff888102ea5340 R15: 0000000000000050
> > > > > > > > > > > > > FS:  0000000000000000(0000) GS:ffff88813bc00000(0000)
> > > > > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > > > CR2: 0000000000000058 CR3: 0000000003aa8000 CR4: 00000000003506f0
> > > > > > > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > > > Call Trace:
> > > > > > > > > > > > >   <TASK>
> > > > > > > > > > > > >   lock_acquire+0xd8/0x1f0 kernel/locking/lockdep.c:5691
> > > > > > > > > > > > >   seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline]
> > > > > > > > > > > > >   get_fs_root_rcu fs/d_path.c:243 [inline]
> > > > > > > > > > > > >   d_path+0xd1/0x1f0 fs/d_path.c:285
> > > > > > > > > > > > >   audit_log_d_path+0x65/0x130 kernel/audit.c:2139
> > > > > > > > > > > > >   dump_common_audit_data security/lsm_audit.c:224 [inline]
> > > > > > > > > > > > >   common_lsm_audit+0x3b3/0x840 security/lsm_audit.c:458
> > > > > > > > > > > > >   smack_log+0xad/0x130 security/smack/smack_access.c:383
> > > > > > > > > > > > >   smk_tskacc+0xb1/0xd0 security/smack/smack_access.c:253
> > > > > > > > > > > > >   smack_inode_getattr+0x8a/0xb0 security/smack/smack_lsm.c:1187
> > > > > > > > > > > > >   security_inode_getattr+0x32/0x50 security/security.c:2114
> > > > > > > > > > > > >   vfs_getattr+0x1b/0x40 fs/stat.c:167
> > > > > > > > > > > > >   ovl_getattr+0xa6/0x3e0 fs/overlayfs/inode.c:173
> > > > > > > > > > > > >   ima_check_last_writer security/integrity/ima/ima_main.c:171
> > > > > > > > > > > > > [inline]
> > > > > > > > > > > > >   ima_file_free+0xbd/0x130 security/integrity/ima/ima_main.c:203
> > > > > > > > > > > > >   __fput+0xc7/0x220 fs/file_table.c:315
> > > > > > > > > > > > >   task_work_run+0x7d/0xa0 kernel/task_work.c:179
> > > > > > > > > > > > >   exit_task_work include/linux/task_work.h:38 [inline]
> > > > > > > > > > > > >   do_exit+0x2c7/0xa80 kernel/exit.c:871 <-----------------------
> > > > > > > > > > > > >   do_group_exit+0x85/0xa0 kernel/exit.c:1021
> > > > > > > > > > > > >   get_signal+0x73c/0x7f0 kernel/signal.c:2874
> > > > > > > > > > > > >   arch_do_signal_or_restart+0x89/0x290 arch/x86/kernel/signal.c:306
> > > > > > > > > > > > >   exit_to_user_mode_loop+0x61/0xb0 kernel/entry/common.c:168
> > > > > > > > > > > > >   exit_to_user_mode_prepare+0x64/0xb0 kernel/entry/common.c:204
> > > > > > > > > > > > >   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> > > > > > > > > > > > >   syscall_exit_to_user_mode+0x2b/0x1d0 kernel/entry/common.c:297
> > > > > > > > > > > > >   do_syscall_64+0x4d/0x90 arch/x86/entry/common.c:86
> > > > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > do_exit has called exit_fs(tsk) [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/kernel/exit.c#L867 ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > exit_fs(tsk) has set tsk->fs = NULL [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/fs_struct.c#L103
> > > > > > > > > > > > > ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think this then bites in d_path() where it calls:
> > > > > > > > > > > > > 
> > > > > > > > > > > > >      get_fs_root_rcu(current->fs, &root);   [
> > > > > > > > > > > > > https://elixir.bootlin.com/linux/v6.4-rc2/source/fs/d_path.c#L285 ]
> > > > > > > > > > > > > 
> > > > > > > > > > > > > current->fs is likely NULL here.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > If this was correct it would have nothing to do with the actual
> > > > > > > > > > > > > patch,
> > > > > > > > > > > > > though, but rather with the fact that smack logs on process
> > > > > > > > > > > > > termination. I am not sure what the solution would be other than
> > > > > > > > > > > > > testing for current->fs == NULL in d_path before using it and
> > > > > > > > > > > > > returning an error that is not normally returned or trying to
> > > > > > > > > > > > > intercept this case in smack.
> > > > > > > > > > > > I have now been able to recreate the syzbot issue with the test
> > > > > > > > > > > > program
> > > > > > > > > > > > and the issue is exactly the one described here, current->fs == NULL.
> > > > > > > > > > > > 
> > > > > > > > > > > Earlier in this thread, Amir had a diagnosis that IMA is
> > > > > > > > > > > inappropriately
> > > > > > > > > > > trying to use f_path directly instead of using the helpers that are
> > > > > > > > > > > friendly for stacking filesystems.
> > > > > > > > > > > 
> > > > > > > > > > > https://lore.kernel.org/linux-fsdevel/CAOQ4uxgjnYyeQL-LbS5kQ7+C0d6sjzKqMDWAtZW8cAkPaed6=Q@xxxxxxxxxxxxxx/
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I'm not an IMA hacker so I'm not planning to roll a fix here. Perhaps
> > > > > > > > > > > someone on the IMA team could try this approach?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I have applied this patch here from Amir now and it does NOT resolve
> > > > > > > > > > the issue:
> > > > > > > > > > 
> > > > > > > > > > https://lore.kernel.org/linux-integrity/296dae962a2a488bde682d3def074db91686e1c3.camel@xxxxxxxxxxxxx/T/#m4ebdb780bf6952e7f210c55e87950d0cfa1d5eb0
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This seems to resolve the issue:
> > > > > > > > > 
> > > > > > > > > diff --git a/security/smack/smack_access.c
> > > > > > > > > b/security/smack/smack_access.c
> > > > > > > > > index 585e5e35710b..57afcea1e39b 100644
> > > > > > > > > --- a/security/smack/smack_access.c
> > > > > > > > > +++ b/security/smack/smack_access.c
> > > > > > > > > @@ -347,6 +347,9 @@ void smack_log(char *subject_label, char
> > > > > > > > > *object_label, int request,
> > > > > > > > >         struct smack_audit_data *sad;
> > > > > > > > >         struct common_audit_data *a = &ad->a;
> > > > > > > > > 
> > > > > > > > > +       if (current->flags & PF_EXITING)
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > 
> > > > > > > > Based on what I see here I can understand that this prevents the panic,
> > > > > > > > but it isn't so clear what changed that introduced the problem.
> > > > > > > > 
> > > > > > > > >         /* check if we have to log the current event */
> > > > > > > > >         if (result < 0 && (log_policy & SMACK_AUDIT_DENIED) == 0)
> > > > > > > > >                 return;
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > 
> > > > > > > Apparently, it's this patch:
> > > > > > > 
> > > > > > >     db1d1e8b9867 IMA: use vfs_getattr_nosec to get the i_version
> > > > > > 
> > > > > > Yes, the syzbot was updated with that info.
> > > > > > 
> > > > > > > At one time, IMA would reach directly into the inode to get the
> > > > > > > i_version and ctime. That was fine for certain filesystems, but with
> > > > > > > more recent changes it needs to go through ->getattr instead. Evidently,
> > > > > > > it's selecting the wrong inode to query when dealing with overlayfs and
> > > > > > > that's causing panics at times.
> > > > > > > 
> > > > > > > As to why the above patch helps, I'm not sure, but given that it doesn't
> > > > > > > seem to change which inode is being queried via getattr, it seems like
> > > > > > > this is probably papering over the real bug. That said, IMA and
> > > > > > > overlayfs are not really in my wheelhouse, so I could be very wrong
> > > > > > > here.
> > > > > > 
> > > > > > The call to vfs_getattr_nosec() somehow triggers a call to
> > > > > > security_inode_getattr().  Without the call neither ovl_getattr() nor
> > > > > > smack_inode_getattr() would be called.
> > > > > 
> > > > > ima_file_free()
> > > > > -> ima_check_last_writer()
> > > > >    -> vfs_getattr_nosec()
> > > > >       -> i_op->getattr() == ovl_getattr()
> > > > >          -> vfs_getattr()
> > > > >             -> security_inode_getattr()
> > > > > 	    -> real_i_op->getattr()
> > > > > 
> > > > > is the callchain that triggers this.
> > > > 
> > > > Thank you for the explanation as to why ovl_getattr() and subsequently
> > > > smack_inode_getattr() is being called.
> > > > 
> > > > > 
> > > > > ima_file_free() is called in a very sensitive location: __fput() that
> > > > > can be called from task work when the process is already PF_EXITING.
> > > > > 
> > > > > The ideal solution would be for ima to stop calling back into the
> > > > > filesystems in this location at all but that's probably not going to
> > > > > happen because I now realize you also set extended attributes from
> > > > > __fput():
> > > > > 
> > > > > 
> > > > > ima_check_last_writer()
> > > > > -> ima_update_xatt()
> > > > >    -> ima_fix_xattr()
> > > > >       -> __vfs_setxattr_noperm()
> > > > > 
> > > > > The __vfs_setxattr_noperm() codepath can itself trigger
> > > > > security_inode_post_setxattr() and security_inode_setsecurity(). So
> > > > > those hooks are hopefully safe to be called with PF_EXITING tasks as
> > > > > well...
> > > > > 
> > > > > Imho, this is all very wild but I'm not judging.
> > > > 
> > > > Measuring and verifying immutable files is straight forward. 
> > > > Measuring, verifiying, and updating mutable file hashes is a lot more
> > > > complicated.  Re-calculating the file hash everytime the file changes
> > > > would impact performance.  The file hash is currently updated as the
> > > > last writer closes the file (__fput).  One of the reasons for the wq
> > > > was for IMA to safely calculate the file hash and and take the i_mutex
> > > > to write the xattr.
> > > > 
> > > > IMA support for mutable files makes IMA a lot more complicated.  Any
> > > > improvement suggestions would be appreciated.
> > > > 
> > > > > 
> > > > > Two solutions imho:
> > > > > (1) teach stacking filesystems like overlayfs and ecryptfs to use
> > > > >     vfs_getattr_nosec() in their ->getattr() implementation when they
> > > > >     are themselves called via vfs_getattr_nosec(). This will fix this by
> > > > >     not triggering another LSM hook.
> > > > > (2) make all ->getattr() LSM hooks PF_EXITING safe ideally don't do
> > > > >     anything
> > > > 
> > > > The original problem was detecting i_version change on overlayfs.
> > > > 
> > > > Amir's proposed patch might resolve it without commit db1d1e8b9867
> > > > ("IMA: use vfs_getattr_nosec to get the i_version").  However, as Amir
> > > > said, it does not address the new problem introduced by it.   Assuming
> > > > Amir's proposed patch resolves the original problem, an alternative
> > > > solution would be to revert commit db1d1e8b9867.
> > > > 
> > > 
> > > If you're going to revert that commit, then I'm wondering what you
> > > intend to do instead. Reaching directly into the inode to get this
> > > information is really no bueno.
> > 
> > IMA detecting file change based on i_version has been there since IMA
> > was upstreamed.   Please explain why this is not a good idea.
> > 
> 
> Not all i_version values are managed in the same way. Network
> filesystems need to pass through the value from the server, whereas with
> a local filesystems the kernel needs to manage the increment.
> 
> IMA is mostly interested in local filesystems at the moment. The main
> kernel-managed versions in the kernel are in btrfs, ext4 and xfs and
> tmpfs. Until recently, only btrfs had one that functioned properly. Both
> ext4 and xfs would also increment their values on atime updates. In
> practical terms, this means that IMA ends up doing unnecessary
> remeasurements after read events in some cases.
> 
> ext4 recently had its i_version value fixed to not do this, but the XFS
> developers are unable to fix theirs to avoid incrementing on atime
> updates. For that, I'm working on the multigrain ctime patches which
> should allow XFS to fake up a STATX_CHANGE_COOKIE in its getattr
> routine.
> 
> IMA has no practical way to tell what the filesystem can do if it's
> groveling around inside struct inode, which is why I recommended using
> vfs_getattr_nosec to grab this info. If that's problematic then by all
> means, back out that patch, but you'll need to come up with some way to
> deal with the different nuances of the different i_version counters in
> across different filesystems.

Got it.  This is basically a performance issue, because i_version is
being updated too frequently on some filesystems.  It's not an issue of
missing measurements or not re-evaluting the file's integrity when
needed.

Let's see if Amir's patch actually fixes the original problem before
making any decisions.  (Wishing for a reproducer of the original
problem.)

Mimi




[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux