On Tue, Jul 23, 2024 at 5:27 AM Matus Jokay <matus.jokay@xxxxxxxx> wrote:
> On 22. 7. 2024 21:46, Paul Moore wrote:
> > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@xxxxxxxx> wrote:
> >> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> >>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> >>>> The LSM framework has an existing inode_free_security() hook which
> >>>> is used by LSMs that manage state associated with an inode, but
> >>>> due to the use of RCU to protect the inode, special care must be
> >>>> taken to ensure that the LSMs do not fully release the inode state
> >>>> until it is safe from a RCU perspective.
> >>>>
> >>>> This patch implements a new inode_free_security_rcu() implementation
> >>>> hook which is called when it is safe to free the LSM's internal inode
> >>>> state. Unfortunately, this new hook does not have access to the inode
> >>>> itself as it may already be released, so the existing
> >>>> inode_free_security() hook is retained for those LSMs which require
> >>>> access to the inode.
> >>>>
> >>>> Signed-off-by: Paul Moore <paul@xxxxxxxxxxxxxx>
> >>>
> >>> I like this new hook. It is definitely safer than the current approach.
> >>>
> >>> To make it more consistent, I think we should also rename
> >>> security_inode_free() to security_inode_put() to highlight the fact that
> >>> LSM implementations should not free potential pointers in this blob
> >>> because they could still be dereferenced in a path walk.
> >>>
> >>>> ---
> >>>>  include/linux/lsm_hook_defs.h     |  1 +
> >>>>  security/integrity/ima/ima.h      |  2 +-
> >>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> >>>>  security/integrity/ima/ima_main.c |  2 +-
> >>>>  security/landlock/fs.c            |  9 ++++++---
> >>>>  security/security.c               | 26 +++++++++++++-------------
> >>>>  6 files changed, 30 insertions(+), 30 deletions(-)
> >
> > ...
> >
> >> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
> >>
> >> 1) How does this patch close [1]?
> >> As Mickaël pointed out in [2], "It looks like security_inode_free() is called two times on the same inode."
> >> Indeed, it does not seem from the backtrace that it is a case of a race between destroy_inode and inode_permission,
> >> i.e. referencing the inode in a VFS path walk while destroying it...
> >> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> >
> > The VFS folks can likely provide a better, or perhaps more correct,
> > answer, but my understanding is that during the path walk the inode is
> > protected by an RCU lock which allows multiple threads to access
> > the inode simultaneously; this could result in some cases where one
> > thread is destroying the inode while another is accessing it.
> > Changing this would require changes to the VFS code, and I'm not sure
> > why you would want to change it anyway; the performance win of using
> > RCU here is likely significant.
> >
> >> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
> >
> > I'm not an RCU expert, but I don't believe there are any guarantees
> > that inode_free_by_rcu() and the inode's own free routines are
> > going to be called within the same RCU grace period (not really
> > applicable as inode_free_by_rcu() isn't called *during* a grace
> > period, but *after* the grace period of the associated
> > security_inode_free() call). However, this patch does not rely on
> > synchronization between the inode and the inode's LSM free routine in
> > inode_free_by_rcu(); the inode_free_by_rcu() function and the new
> > inode_free_security_rcu() LSM callback do not have a pointer to the
> > inode, only the inode's LSM blob.
> > I agree that it is a bit odd, but
> > freeing the inode and the inode's LSM blob independently of each other
> > should not cause a problem so long as the inode is no longer in use
> > (hence the RCU callbacks).
>
> Paul, many thanks for your answer.
>
> I will try to clarify the issue, because fsnotify was a bad example.
> Here is the related code, taken from v10.
>
> void security_inode_free(struct inode *inode)
> {
> 	call_void_hook(inode_free_security, inode);
> 	/*
> 	 * The inode may still be referenced in a path walk and
> 	 * a call to security_inode_permission() can be made
> 	 * after inode_free_security() is called. Ideally, the VFS
> 	 * wouldn't do this, but fixing that is a much harder
> 	 * job. For now, simply free the i_security via RCU, and
> 	 * leave the current inode->i_security pointer intact.
> 	 * The inode will be freed after the RCU grace period too.
> 	 */
> 	if (inode->i_security)
> 		call_rcu((struct rcu_head *)inode->i_security,
> 			 inode_free_by_rcu);
> }
>
> void __destroy_inode(struct inode *inode)
> {
> 	BUG_ON(inode_has_buffers(inode));
> 	inode_detach_wb(inode);
> 	security_inode_free(inode);
> 	fsnotify_inode_delete(inode);
> 	locks_free_lock_context(inode);
> 	if (!inode->i_nlink) {
> 		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
> 		atomic_long_dec(&inode->i_sb->s_remove_count);
> 	}
>
> #ifdef CONFIG_FS_POSIX_ACL
> 	if (inode->i_acl && !is_uncached_acl(inode->i_acl))
> 		posix_acl_release(inode->i_acl);
> 	if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
> 		posix_acl_release(inode->i_default_acl);
> #endif
> 	this_cpu_dec(nr_inodes);
> }
>
> static void destroy_inode(struct inode *inode)
> {
> 	const struct super_operations *ops = inode->i_sb->s_op;
>
> 	BUG_ON(!list_empty(&inode->i_lru));
> 	__destroy_inode(inode);
> 	if (ops->destroy_inode) {
> 		ops->destroy_inode(inode);
> 		if (!ops->free_inode)
> 			return;
> 	}
> 	inode->free_inode = ops->free_inode;
> 	call_rcu(&inode->i_rcu, i_callback);
> }
>
> Yes, inode_free_by_rcu() is being called
> after the grace period of the associated
> security_inode_free(). i_callback() is also called after a grace period, but is it
> always the same grace period as in the case of inode_free_by_rcu()? If not in general,
> maybe it could be a problem. Explanation below.
>
> If there is a function call leading to the end of the grace period between
> call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
> or another mechanism?), there will be a small time window when the inode security
> context is released but the inode itself is not, because call_rcu(i_callback) has not
> been called yet. So in that case any access to the inode security blob leads to a UAF.

While it should be possible for the inode's LSM blob to be freed prior
to the inode itself, the RCU callback mechanism provided by call_rcu()
should ensure that both the LSM's free routine and the inode's free
routine happen at a point in time after the current RCU critical
sections have lapsed and the inode is no longer being accessed. The
LSM's inode_free_rcu callback can run independently of the inode's
callback as it doesn't access the inode, and if it does happen to run
before the inode's RCU callback that should also be okay, as we are
past the original RCU critical sections and the inode should no longer
be in use. If the inode is still in use by the time the LSM's RCU
callback is triggered, then there is a flaw in the inode
RCU/locking/synchronization code.

It is also worth mentioning that while this patch shuffles around some
code at the LSM layer, the basic idea of the LSM using an RCU callback
to free state associated with an inode is far from new. While that
doesn't mean there isn't a problem, we have a few years of experience
across a large number of systems that would indicate this isn't a
problem.

> For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
> *before* call_rcu(i_callback).
> If destroy_inode() may sleep, can the end of the
> grace period be reached? destroy_inode() runs *before* call_rcu(i_callback), therefore
> simultaneous access to the inode during a path lookup may be performed. Note: I use
> destroy_inode() as *an example* of the idea. I'm not an expert at all in fsnotify,
> POSIX ACLs, VFS in general, or RCU.
>
> In the previous message I only mentioned fsnotify, but it was only as an example.
> I think that destroy_inode() is a better example of the idea I wanted to express.
>
> I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
> unpleasant to get another UAF in a production environment.

I'm open to there being another fix needed, or a change to this fix,
but I don't believe the problems you are describing are possible. Of
course it's very possible that I'm wrong, so if you are aware of an
issue I would appreciate a concrete example explaining the code path
and the timeline between tasks A and B that would trigger the flaw ...
and yes, patches are always welcome ;)

-- 
paul-moore.com