Re: [RFC PATCH] lsm: add the inode_free_security_rcu() LSM implementation hook

Matus Jokay <matus.jokay@xxxxxxxx> · Tue, 23 Jul 2024 11:27:26 +0200

On 22. 7. 2024 21:46, Paul Moore wrote:
> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@xxxxxxxx> wrote:
>> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>>>> The LSM framework has an existing inode_free_security() hook which
>>>> is used by LSMs that manage state associated with an inode, but
>>>> due to the use of RCU to protect the inode, special care must be
>>>> taken to ensure that the LSMs do not fully release the inode state
>>>> until it is safe from a RCU perspective.
>>>>
>>>> This patch implements a new inode_free_security_rcu() implementation
>>>> hook which is called when it is safe to free the LSM's internal inode
>>>> state.  Unfortunately, this new hook does not have access to the inode
>>>> itself as it may already be released, so the existing
>>>> inode_free_security() hook is retained for those LSMs which require
>>>> access to the inode.
>>>>
>>>> Signed-off-by: Paul Moore <paul@xxxxxxxxxxxxxx>
>>>
>>> I like this new hook.  It is definitely safer than the current approach.
>>>
>>> To make it more consistent, I think we should also rename
>>> security_inode_free() to security_inode_put() to highlight the fact that
>>> LSM implementations should not free potential pointers in this blob
>>> because they could still be dereferenced in a path walk.
>>>
>>>> ---
>>>>  include/linux/lsm_hook_defs.h     |  1 +
>>>>  security/integrity/ima/ima.h      |  2 +-
>>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>>>>  security/integrity/ima/ima_main.c |  2 +-
>>>>  security/landlock/fs.c            |  9 ++++++---
>>>>  security/security.c               | 26 +++++++++++++-------------
>>>>  6 files changed, 30 insertions(+), 30 deletions(-)
> 
> ...
> 
>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
>>
>> 1) How does this patch close [1]?
>>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
>>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
>>    i.e. referencing the inode in a VFS path walk while destroying it...
>>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> 
> The VFS folks can likely provide a better, or perhaps a more correct
> answer, but my understanding is that during the path walk the inode is
> protected by a RCU lock which allows for multiple threads to access
> the inode simultaneously; this could result in some cases where one
> thread is destroying the inode while another is accessing it.
> Changing this would require changes to the VFS code, and I'm not sure
> why you would want to change it anyway, the performance win of using
> RCU here is likely significant.
> 
>> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
> 
> I'm not an RCU expert, but I don't believe there are any guarantees
> that the inode_free_by_rcu() and the inode's own free routines are
> going to be called within the same RCU grace period (not really
> applicable as inode_free_by_rcu() isn't called *during* a grace
> period, but *after* the grace period of the associated
> security_inode_free() call).  However, this patch does not rely on
> synchronization between the inode and inode LSM free routine in
> inode_free_by_rcu(); the inode_free_by_rcu() function and the new
> inode_free_security_rcu() LSM callback does not have a pointer to the
> inode, only the inode's LSM blob.  I agree that it is a bit odd, but
> freeing the inode and inode's LSM blob independently of each other
> should not cause a problem so long as the inode is no longer in use
> (hence the RCU callbacks).

Paul, many thanks for your answer.

I will try to clarify the issue, because fsnotify was a bad example.
Here is the related code taken from v10.

void security_inode_free(struct inode *inode)
{
	call_void_hook(inode_free_security, inode);
	/*
	 * The inode may still be referenced in a path walk and
	 * a call to security_inode_permission() can be made
	 * after inode_free_security() is called. Ideally, the VFS
	 * wouldn't do this, but fixing that is a much harder
	 * job. For now, simply free the i_security via RCU, and
	 * leave the current inode->i_security pointer intact.
	 * The inode will be freed after the RCU grace period too.
	 */
	if (inode->i_security)
		call_rcu((struct rcu_head *)inode->i_security,
			 inode_free_by_rcu);
}

void __destroy_inode(struct inode *inode)
{
	BUG_ON(inode_has_buffers(inode));
	inode_detach_wb(inode);
	security_inode_free(inode);
	fsnotify_inode_delete(inode);
	locks_free_lock_context(inode);
	if (!inode->i_nlink) {
		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
		atomic_long_dec(&inode->i_sb->s_remove_count);
	}

#ifdef CONFIG_FS_POSIX_ACL
	if (inode->i_acl && !is_uncached_acl(inode->i_acl))
		posix_acl_release(inode->i_acl);
	if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
		posix_acl_release(inode->i_default_acl);
#endif
	this_cpu_dec(nr_inodes);
}

static void destroy_inode(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	BUG_ON(!list_empty(&inode->i_lru));
	__destroy_inode(inode);
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;
	}
	inode->free_inode = ops->free_inode;
	call_rcu(&inode->i_rcu, i_callback);
}

Yes, inode_free_by_rcu() is being called after the grace period of the associated
security_inode_free(). i_callback() is also called after the grace period, but is it
always the same grace period as in the case of inode_free_by_rcu()? If not in general,
maybe it could be a problem. Explanation below.

If there is a function call leading to the end of the grace period between
call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
or another mechanism?), there will be a small time window, when the inode security
context is released, but the inode itself not, because call_rcu(i_callback) was not called
yet. So in that case each access to inode security blob leads to UAF.

For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
*before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
access to the inode during path lookup may be performed. Note: I use destroy_inode() as
*an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
and RCU, too.

In the previous message I only mentioned fsnotify, but it was only as an example.
I think that destroy_inode() is a better example of the idea I wanted to express.

I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
unpleasant to get another UAF in a production environment.

And regarding the UAF in [1], it seems very strange to me. The object managed by
Landlock was *not* dereferenced. There was access to the inode security blob itself.

static void hook_inode_free_security(struct inode *const inode)
{
	/*
	 * All inodes must already have been untied from their object by
	 * release_inode() or hook_sb_delete().
	 */
	WARN_ON_ONCE(landlock_inode(inode)->object);
}

But security blob is released at the end of the grace period related to
security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered
inode_free_security hooks.

The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus,
I think, there was another call of __destroy_inode(). Or general protection fault was
not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how*
happened.

If Landlock had dereferenced the object it manages, this RFC could be the right one (if
it were a dereference from a fast path lookup, of course).

[1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@xxxxxxxxxx/

> 
>>    If not, can the security context be released earlier than the inode itself?
> 
> Possibly, but it should only happen after the inode is no longer in
> use (the call_rcu () callback should ensure that we are past all of
> the currently executing RCU critical sections).
> 
>> If yes, can be executed
>>    inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?
> 
> I do not believe so, see the discussion above, but I welcome any corrections.
> 
>>    Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)
> 
> If fsnotify is affecting this negatively then I suspect that is a
> reason for much larger concern as I believe that would indicate a
> problem with fsnotify and the inode locking scheme.
>