Re: [PATCH 0/7] Initial support for user namespace owned mounts

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 23 Jul 2015 11:51:35 +1000

On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote:
> > On 2015-07-22 10:09, J. Bruce Fields wrote:
> > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > >>>So, for example, a screwed up on-disk directory structure shouldn't
> > >>>result in creating a cycle in the dcache and then deadlocking.
> > >>
> > >>Therein lies the problem: how do you detect such structural defects
> > >>without doing a full structure validation?
> > >
> > >You can prevent cycles in a graph if you can prevent adding an edge
> > >which would be part of a cycle.
> > >
> > Except if the user can write to the filesystem's backing storage (be
> > it a device or a file), and has sufficient knowledge of the on-disk
> > structures, they can create all the cycles they want in the
> > metadata. So unless the kernel builds the graph internally by
> > parsing the metadata _and_ has some way to detect that the on-disk
> > metadata has hit a cycle (which may not just involve 2 items),
> 
> Understood.  Again, see the d_ancestor call in d_splice_alias, this is
> exactly what it checks for.
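
For readers following along, the idea of refusing any edge that would
close a cycle can be sketched roughly as below. This is only an
illustration, not the kernel's d_ancestor() or d_splice_alias() code:
struct node, is_self_or_ancestor() and attach() are hypothetical names,
and the sketch assumes the in-memory tree was acyclic before the call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory tree node; not the kernel's struct dentry. */
struct node {
    struct node *parent;    /* NULL for the root */
};

/* Return true if 'a' is 'n' itself or one of n's ancestors. */
static bool is_self_or_ancestor(const struct node *a, const struct node *n)
{
    for (; n != NULL; n = n->parent)
        if (n == a)
            return true;
    return false;
}

/*
 * Attach 'child' under 'new_parent' only if that keeps the tree acyclic.
 * Returns 0 on success, -1 if the new edge would close a loop.
 */
static int attach(struct node *child, struct node *new_parent)
{
    if (is_self_or_ancestor(child, new_parent))
        return -1;
    child->parent = new_parent;
    return 0;
}
```

The walk terminates only because every earlier attach() kept the tree
acyclic, so the check covers this one in-memory structure and nothing
else.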

But that only addresses one type of loop in one specific metadata
structure. There are plenty of other ways to construct metadata
loops that go essentially undetected and result in either deadlock
or livelock within the filesystem code itself. For example, just make
btree sibling pointers loop over a range of entries that have the same
index key (e.g. free space extents of the same size). If allocation
then falls into this loop, the kernel will just spin searching the
same blocks for something it will never find.  Such resource
consumption attacks are trivial to construct but extremely difficult
to detect because they exploit normal behaviour of the structure and
algorithms by mangling trusted pointers.
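
A toy model of that sibling-pointer loop, purely as an illustration.
Everything here is hypothetical (struct leaf, find_in_chain(), the
flat array standing in for on-disk blocks) and does not correspond to
any real filesystem's btree code. It assumes the walker knows the
total number of leaves, which lets it bound the walk; without a bound
of that kind the same search would never return on the looped chain.

```c
#include <stddef.h>

#define NO_SIBLING ((size_t)-1)

/* Hypothetical leaf block: one key plus an untrusted right-sibling index. */
struct leaf {
    int    key;
    size_t right;   /* index of the next leaf, or NO_SIBLING */
};

/*
 * Walk the sibling chain from 'start' looking for 'want'.
 * Returns the index of the matching leaf, -1 if the chain ends without
 * a match, or -2 if the chain is corrupt (out-of-range pointer or loop).
 */
static long find_in_chain(const struct leaf *leaves, size_t nleaves,
                          size_t start, int want)
{
    size_t cur = start;
    size_t steps;

    for (steps = 0; cur != NO_SIBLING; steps++) {
        /* An intact chain visits at most nleaves distinct leaves. */
        if (cur >= nleaves || steps >= nleaves)
            return -2;
        if (leaves[cur].key == want)
            return (long)cur;
        cur = leaves[cur].right;
    }
    return -1;
}
```

Every leaf in the looped chain looks valid on its own, so the only
thing that exposes the corruption is the step count. A real
implementation may have no cheap equivalent of nleaves to check
against.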

Of course, this sort of attack will eventually deadlock the
filesystem because everything else backs up on locks held by the
livelocked search. Once the filesystem is deadlocked, it can then
cause sync() calls to get stuck on the filesystem. And because sync()
is a global operation, a deadlocked filesystem in one container could
cause sync() to hang in a completely unrelated container....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx