Re: [PATCH v10 18/27] integrity/ima: Define ns_status for storing namespaced iint data

Stefan Berger <stefanb@xxxxxxxxxxxxx> · Wed, 23 Feb 2022 21:49:35 -0500

On 2/23/22 21:21, Stefan Berger wrote:

On 2/23/22 11:12, Mimi Zohar wrote:
On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
From: Mehmet Kayaalp <mkayaalp@xxxxxxxxxxxxxxxxxx>

Add an rbtree to the IMA namespace structure that stores a namespaced
version of iint->flags in ns_status struct. Similar to the
integrity_iint_cache, both the iint and ns_status are looked up 
using the
inode pointer value. The lookup, allocate, and insertion code is also
similar.

In subsequent patches we will have to find all ns_status entries an 
iint
is being used in and reset flags there. To do this, connect a list of
ns_status to the integrity_iint_cache and provide a reader-writer
lock in the integrity_iint_cache to lock access to the list.

To simplify the code in the non-namespaces case embed an ns_status in
the integrity_iint_cache and have it linked into the iint's 
ns_status list
when calling ima_get_ns_status().

When getting an ns_status first try to find it in the RB tree. Here 
we can
run into the situation that an ns_status found in the RB tree has a
different iint associated with it for the same inode. In this case 
we need
to delete the ns_status structure and get a new one.

There are two cases for freeing:
- when the iint is freed (inode deletion): Walk the list of ns_status
   entries and disconnect each ns_status from the list; take the
   writer lock to protect access to the list; also, take the item 
off the
   per-namespace rbtree

- when the ima_namepace is freed: While walking the rbtree, remove the
   ns_status from the list while also holding the iint's writer lock;
   to be able to grab the lock we have to have a pointer to the iint on
   the ns_status structure.

To avoid an ns_status to be freed by the two cases concurrently, 
prevent
these two cases to run concurrently. Therefore, groups of threads
deleting either inodes or ima_namespaces are allowed to run 
concurrently
but no two threads may run and one delete an inode and the other an
ima_namespace.
The locking involved here is really complex.  I'm sure you thought
about it a lot, but isn't there a better alternative?

I am afraid this is a difficult question and a short and concise 
answer is not possible...

The complexity of the locking is driven by concurrency and the data 
structures that are involved. The data structures (existing global 
iint rbtree, ns_status structure, and per namespace rbtree for 
ns_status) and how they are organized and connected (via linked lists) 
are a consequence of the fact that we need to be able to handle files 
shared between IMA namespaces (and the host) so that re-auditing, 
re-measuring and re-appraisal of files after file modifications or 
modifications of the security.ima xattr (by any namespaces) can be 
done efficiently. Furthermore, it helps to efficiently remove all the 
status information that an IMA namespace has kept for files it 
audited/measured/appraised. The goal was to make this as scalable as 
possible by having each namespace get out of the way of other 
namespaces by preventing them from locking each other out too much. 
The single biggest problem are files shared between IMA namespaces.

The best argument for the design I can come up with is the 'Big O 
notation' describing the time complexity of operations.

The existing global iint rbtree maintains IMA status information for 
each inode. Lookup and insertion of data into the gloab iint rbtree  
is O(log(n)), thus optimal.

To accommodate re-auditing/re-measurement/re-appraisal, which is 
driven by resetting status flags, I connected a list of ns_status 
structures, in which each namespace maintains its status information 
for each inode, to the iint maintained in that global rbtree. The 
resetting of status flags is fast because traversing the list after a 
lookup in the tree is O(n). Lookup + resetting the flags therefore is 
O(log(n) + n). If the list didn't exist we would have to search all 
IMA namespaces for the inode to be able to reset the flags, resulting 
in O(n * log(n)) time complexity, which is of course much worse. So, 
the list of ns_status linked to an iint has a good reason: better time 
complexity to traverse the list and reset status flags. Beside  that 
it also supports fast handling of deletion of files where the iint has 
to be delete from the global rbtree and the ns_status list it holds 
must also be deleted (each ns_status also needs to be delete from a 
per IMA-namespace rbtree then)

There's also a per-IMA namespace rbtree for each inode that serves two 
purposes:

a) Fast lookup of ns_status (O(log(n)) for an IMA namespace; at least 
to insert an ns_status into this tree we need not write-lock the iint 
tree but the initial iint creation required the write-locking of the 
iint tree

b) Maintaining a collection of inodes that the namespace has 
audited/measured/appraised for efficient deletion upon IMA namespace 
teardown: We can traverse this tree in O(n) time and determine which 
iints have no more namespace users and delete them from the iint tree.

Now the dilemma with this is that an ns_status structure is connected 
to a list hanging off the iint and on top of this it is part of an 
rbtree. And this is where the 'group locking' is coming from. What we 
need to prevent is that an ns_status is deleted from its iint list 
(when a file is deleted) while it is also deleted from the per-IMA 
namespace rbtree (when the namespace is deleted). Both must not be 
done concurrently. What is possible is that a group of threads may 
tear down namespaces and the other group may act on file deletion, but 
threads from both groups must not run concurrently.

Now we can at least look at two alternatives for the per-IMA namespace 
rbtree.

1) One alternative is to use a list instead of an rbtree. We would 
loose the fast lookup via the per IMA namespace tree and get O(n) 
lookup times but quick insertion into the list [O(1)]. We still would 
have the collection of inodes. And we would still have the dilemma 
that an ns_status would be connected to two lists, thus requiring the 
group locking. I don't think using a list instead of an rbtree is a 
solution.

2) We could try to get rid of the per-IMA namespace rbtree altogether 
and just use the global iint rbtree that exists today with a list of 
ns_status connected to its iints. If we do this we would loose the 
knowledge of which inodes a namespace has an ns_status structure for. 
The only way we would find this is by traversing the global iint tree 
(O(n)) and follow each iint list (O(m)) to see whether we find an 
ns_status holding information about the iint. The time complexity for 
this would be O(n*m) but much less than O(n^2). A downside would also 
be that we would have to keep a lock on the global iint rbtree while 
traversing it, thus locking out those that want to add inodes to the 
tree. On the upside it would allow us to get rid of the group locking. 
Lookup of an ns_status in the global iint tree would be O(n) + O(m) 
and insertion would be O(n) + O(1).

Certainly, the alternative is 2) with its own trade-offs. My guess is 
some sort of yielding could probably also be helpful there then to 
avoid blocking higher priority operations than deleting of a namespace.

I forgot to mention: It makes a difference if one has to walk the global 
iint tree to find the few ns_status for the namespace among possibly 
thousands of entries in that tree than having a per-IMA namespace rbtree 
that has these few ns_status right there. So walking the iint tree is 
more like O(N) versus O(n) walking the per-IMA namespace rbtree.