On Sun, Oct 1, 2017 at 3:06 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > > Unless I misread something it was being pointed out there are some vfs > operations today on which ima writes an ima xattr as a side effect. And > those operations hold the i_sem. So perhaps I am misunderstanding > things or writing the ima xattr needs to happen at some point. Which > implies something like queued work. So the issue is indeed the inode semaphore, as it is used by IMA. But all these IMA patches to work around the issue are just horribly ugly. One adds a VFS-layer filesystem method that most filesystems end up not really needing (it's the same as the regular read), and other filesystems end up then having hacks with ("oh, I don't need to take this lock because it was already taken by the caller"). The second patch attempt avoided the need for a new filesystem method, but added a flag in an annoying place (for the same basic logic). The advantage is that now most filesystems don't actually need to care any more (and the filesystems that used to care now check that flag). There was discussion about moving the flag to a mode convenient spot, which would have made it a lot less intrusive. But the basic issue is that almost always when you see lock inversions, the problem can just be fixed by doing the locking differently instead. And that's what I was/am pushing for. There really are two totally different issues: - the integrity _measurement_. This one wants to be serialized, so that you don't have multiple concurrent measurements, and the serialization fundamentally has to be around all the IO, so this lock pretty much has to be outside the i_sem. - the integrity invalidation on certain operations. This one fundamentally had to be inside the i_sem, since some of the operations that cause this end up already holding the i_sem at a VFS layer. so you had these two different requirements (inside _and_ outside), and the IMA approach was basically to avoid the problem by making i_sem *the* lock, and then making the IO routines aware of it already being held. That does solve the inside/outside issue. But the simpler way to fix it is to simply use two locks that nest inside each other, with i_sem nesting in the middle. That just avoids the problem entirely, and doesn't require anybody to ever care about i_sem semantic changes, because i_sem semantics simply didn't change at all. So that's the approach I'm pushing. I admittedly haven't actually looked at the IMA details, but from a high-level standpoint you can basically describe it (as above) without having to care too much about exactly what IMA even wants. The two-lock approach does require that the operations that invalidate the integrity measurements always only invalidate it, and don't try to re-compute it. But I suspect that would be entirely insane anyway (imagine a world where "setxattr" would have to read the whole file contents in order to revalidate the integrity measurement - even if there is nobody who even *cares*). Linus