> On Nov 21, 2024, at 9:47 AM, Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>
> On 11/21/2024 12:28 AM, Song Liu wrote:
>> Hi Dr. Greg,
>>
>> Thanks for your input!
>>
>>> On Nov 20, 2024, at 8:54 AM, Dr. Greg <greg@xxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Nov 19, 2024 at 10:14:29AM -0800, Casey Schaufler wrote:
>> [...]
>>
>>>>> 2.) Implement key/value mapping for inode specific storage.
>>>>>
>>>>> The key would be a sub-system specific numeric value that returns a
>>>>> pointer the sub-system uses to manage its inode specific memory for a
>>>>> particular inode.
>>>>>
>>>>> A participating sub-system in turn uses its identifier to register an
>>>>> inode specific pointer for its sub-system.
>>>>>
>>>>> This strategy loses O(1) lookup complexity but reduces total memory
>>>>> consumption and only imposes memory costs for inodes when a sub-system
>>>>> desires to use inode specific storage.
>>>> SELinux and Smack use an inode blob for every inode. The performance
>>>> regression boggles the mind. Not to mention the additional
>>>> complexity of managing the memory.
>>> I guess we would have to measure the performance impacts to understand
>>> their level of mind-boggliness.
>>>
>>> My first thought is that we hear a huge amount of fanfare about BPF
>>> being a game changer for tracing and network monitoring. Given
>>> current networking speeds, if its ability to manage the storage needed
>>> for its purposes were truly abysmal, the industry wouldn't be finding
>>> the technology useful.
>>>
>>> Beyond that.
>>>
>>> As I noted above, the LSM could be an independent subscriber. The
>>> pointer to register would come from the kmem_cache allocator as it
>>> does now, so that cost is idempotent with the current implementation.
>>> The pointer registration would also be a single instance cost.
>>>
>>> So the primary cost differential over the common arena model will be
>>> the complexity costs associated with lookups in a red/black tree, if
>>> we used the old IMA integrity cache as an example implementation.
>>>
>>> As I noted above, these per inode local storage structures are complex
>>> in and of themselves, including lists and locks. If touching an inode
>>> involves locking and walking lists and the like, it would seem that
>>> those performance impacts would quickly swamp an r/b lookup cost.
>> bpf local storage is designed to be an arena-like solution that works
>> for multiple bpf maps (and we don't know how many maps we need
>> ahead of time). Therefore, we may end up doing what you suggested
>> earlier: every LSM should use bpf inode storage. ;) I am only 90%
>> kidding.
>
> Sorry, but that's not funny.

I didn't think this was funny. Many use cases can seriously benefit
from a _reliable_ allocator for inode-attached data.

> It's the kind of suggestion that some
> yoho takes seriously, whacks together a patch for, and gets accepted
> via the xfd887 device tree. Then everyone screams at the SELinux folks
> because of the performance impact. As I have already pointed out,
> there are serious consequences for an LSM that has a blob on every
> inode.

i_security serves this type of user pretty well. I see no reason to
change this.

At the same time, I see no reason to block optimizations for other use
cases just because these users may get blamed in 2087 for a mistake by
the xfd887 device maintainers.

Song
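
For readers skimming the thread, below is a minimal userspace sketch of
the key/value mapping idea from point 2 above: a subsystem registers a
numeric key and attaches its own pointer to an inode under that key.
This is not from the thread and is not an existing kernel or BPF API;
the names mock_inode, ikv_set, and ikv_get are hypothetical stand-ins.
A real implementation would hang the mapping off struct inode and would
need an rbtree (per the thread's reference to the old IMA integrity
cache) plus proper locking rather than an unsynchronized linked list.

/*
 * Userspace model (not kernel code) of per-inode key/value storage.
 * Each subsystem is identified by a numeric key and stores one pointer
 * per inode under that key.
 */
#include <stdio.h>
#include <stdlib.h>

struct ikv_node {
	unsigned int key;	/* subsystem identifier */
	void *data;		/* subsystem's per-inode pointer */
	struct ikv_node *next;
};

/* Stand-in for the per-inode anchor; in the kernel this would hang off
 * struct inode instead of a fixed-size blob for every LSM. */
struct mock_inode {
	struct ikv_node *kv_head;
};

/* Attach (or replace) the pointer a subsystem stores for this inode.
 * Memory is only consumed for inodes a subsystem actually touches. */
static int ikv_set(struct mock_inode *inode, unsigned int key, void *data)
{
	struct ikv_node *n;

	for (n = inode->kv_head; n; n = n->next) {
		if (n->key == key) {
			n->data = data;
			return 0;
		}
	}
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->key = key;
	n->data = data;
	n->next = inode->kv_head;
	inode->kv_head = n;
	return 0;
}

/* Look up the pointer a subsystem stored for this inode.  O(number of
 * subscribers) here; an rbtree would make it O(log n), which is the
 * lookup cost the thread is debating against the O(1) common blob. */
static void *ikv_get(struct mock_inode *inode, unsigned int key)
{
	struct ikv_node *n;

	for (n = inode->kv_head; n; n = n->next)
		if (n->key == key)
			return n->data;
	return NULL;
}

int main(void)
{
	struct mock_inode inode = { .kv_head = NULL };
	int lsm_blob = 42;	/* pretend per-inode LSM state */

	ikv_set(&inode, 1 /* hypothetical LSM key */, &lsm_blob);
	printf("key 1 -> %d\n", *(int *)ikv_get(&inode, 1));
	printf("key 2 -> %p\n", ikv_get(&inode, 2));	/* not registered */
	return 0;
}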