On 22. 7. 2024 21:46, Paul Moore wrote: > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@xxxxxxxx> wrote: >> On 10. 7. 2024 12:40, Mickaël Salaün wrote: >>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >>>> The LSM framework has an existing inode_free_security() hook which >>>> is used by LSMs that manage state associated with an inode, but >>>> due to the use of RCU to protect the inode, special care must be >>>> taken to ensure that the LSMs do not fully release the inode state >>>> until it is safe from a RCU perspective. >>>> >>>> This patch implements a new inode_free_security_rcu() implementation >>>> hook which is called when it is safe to free the LSM's internal inode >>>> state. Unfortunately, this new hook does not have access to the inode >>>> itself as it may already be released, so the existing >>>> inode_free_security() hook is retained for those LSMs which require >>>> access to the inode. >>>> >>>> Signed-off-by: Paul Moore <paul@xxxxxxxxxxxxxx> >>> >>> I like this new hook. It is definitely safer than the current approach. >>> >>> To make it more consistent, I think we should also rename >>> security_inode_free() to security_inode_put() to highlight the fact that >>> LSM implementations should not free potential pointers in this blob >>> because they could still be dereferenced in a path walk. >>> >>>> --- >>>> include/linux/lsm_hook_defs.h | 1 + >>>> security/integrity/ima/ima.h | 2 +- >>>> security/integrity/ima/ima_iint.c | 20 ++++++++------------ >>>> security/integrity/ima/ima_main.c | 2 +- >>>> security/landlock/fs.c | 9 ++++++--- >>>> security/security.c | 26 +++++++++++++------------- >>>> 6 files changed, 30 insertions(+), 30 deletions(-) > > ... > >> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: >> >> 1) How does this patch close [1]? >> As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." >> Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, >> i.e. referencing the inode in a VFS path walk while destroying it... >> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. > > The VFS folks can likely provide a better, or perhaps a more correct > answer, but my understanding is that during the path walk the inode is > protected by a RCU lock which allows for multiple threads to access > the inode simultaneously; this could result in some cases where one > thread is destroying the inode while another is accessing it. > Changing this would require changes to the VFS code, and I'm not sure > why you would want to change it anyway, the performance win of using > RCU here is likely significant. > >> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? > > I'm not an RCU expert, but I don't believe there are any guarantees > that the inode_free_by_rcu() and the inode's own free routines are > going to be called within the same RCU grace period (not really > applicable as inode_free_by_rcu() isn't called *during* a grace > period, but *after* the grace period of the associated > security_inode_free() call). However, this patch does not rely on > synchronization between the inode and inode LSM free routine in > inode_free_by_rcu(); the inode_free_by_rcu() function and the new > inode_free_security_rcu() LSM callback does not have a pointer to the > inode, only the inode's LSM blob. I agree that it is a bit odd, but > freeing the inode and inode's LSM blob independently of each other > should not cause a problem so long as the inode is no longer in use > (hence the RCU callbacks). Paul, many thanks for your answer. I will try to clarify the issue, because fsnotify was a bad example. Here is the related code taken from v10. void security_inode_free(struct inode *inode) { call_void_hook(inode_free_security, inode); /* * The inode may still be referenced in a path walk and * a call to security_inode_permission() can be made * after inode_free_security() is called. Ideally, the VFS * wouldn't do this, but fixing that is a much harder * job. For now, simply free the i_security via RCU, and * leave the current inode->i_security pointer intact. * The inode will be freed after the RCU grace period too. */ if (inode->i_security) call_rcu((struct rcu_head *)inode->i_security, inode_free_by_rcu); } void __destroy_inode(struct inode *inode) { BUG_ON(inode_has_buffers(inode)); inode_detach_wb(inode); security_inode_free(inode); fsnotify_inode_delete(inode); locks_free_lock_context(inode); if (!inode->i_nlink) { WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0); atomic_long_dec(&inode->i_sb->s_remove_count); } #ifdef CONFIG_FS_POSIX_ACL if (inode->i_acl && !is_uncached_acl(inode->i_acl)) posix_acl_release(inode->i_acl); if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) posix_acl_release(inode->i_default_acl); #endif this_cpu_dec(nr_inodes); } static void destroy_inode(struct inode *inode) { const struct super_operations *ops = inode->i_sb->s_op; BUG_ON(!list_empty(&inode->i_lru)); __destroy_inode(inode); if (ops->destroy_inode) { ops->destroy_inode(inode); if (!ops->free_inode) return; } inode->free_inode = ops->free_inode; call_rcu(&inode->i_rcu, i_callback); } Yes, inode_free_by_rcu() is being called after the grace period of the associated security_inode_free(). i_callback() is also called after the grace period, but is it always the same grace period as in the case of inode_free_by_rcu()? If not in general, maybe it could be a problem. Explanation below. If there is a function call leading to the end of the grace period between call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state or another mechanism?), there will be a small time window, when the inode security context is released, but the inode itself not, because call_rcu(i_callback) was not called yet. So in that case each access to inode security blob leads to UAF. For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous access to the inode during path lookup may be performed. Note: I use destroy_inode() as *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general and RCU, too. In the previous message I only mentioned fsnotify, but it was only as an example. I think that destroy_inode() is a better example of the idea I wanted to express. I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be unpleasant to get another UAF in a production environment. And regarding the UAF in [1], it seems very strange to me. The object managed by Landlock was *not* dereferenced. There was access to the inode security blob itself. static void hook_inode_free_security(struct inode *const inode) { /* * All inodes must already have been untied from their object by * release_inode() or hook_sb_delete(). */ WARN_ON_ONCE(landlock_inode(inode)->object); } But security blob is released at the end of the grace period related to security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered inode_free_security hooks. The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus, I think, there was another call of __destroy_inode(). Or general protection fault was not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how* happened. If Landlock had dereferenced the object it manages, this RFC could be the right one (if it were a dereference from a fast path lookup, of course). [1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@xxxxxxxxxx/ > >> If not, can the security context be released earlier than the inode itself? > > Possibly, but it should only happen after the inode is no longer in > use (the call_rcu () callback should ensure that we are past all of > the currently executing RCU critical sections). > >> If yes, can be executed >> inode_permission concurrently, leading to UAF of inode security context in security_inode_permission? > > I do not believe so, see the discussion above, but I welcome any corrections. > >> Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.) > > If fsnotify is affecting this negatively then I suspect that is a > reason for much larger concern as I believe that would indicate a > problem with fsnotify and the inode locking scheme. >