Re: [PATCH 0/7] Initial support for user namespace owned mounts

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 24 Jul 2015 09:48:54 +1000

On Thu, Jul 23, 2015 at 09:19:28AM -0400, J. Bruce Fields wrote:
> On Thu, Jul 23, 2015 at 11:51:35AM +1000, Dave Chinner wrote:
> > On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote:
> > > > On 2015-07-22 10:09, J. Bruce Fields wrote:
> > > > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> > > > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > > > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > > > >>>So, for example, a screwed up on-disk directory structure shouldn't
> > > > >>>result in creating a cycle in the dcache and then deadlocking.
> > > > >>
> > > > >>Therein lies the problem: how do you detect such structural defects
> > > > >>without doing a full structure validation?
> > > > >
> > > > >You can prevent cycles in a graph if you can prevent adding an edge
> > > > >which would be part of a cycle.
> > > > >
> > > > Except if the user can write to the filesystem's backing storage (be
> > > > it a device or a file), and has sufficient knowledge of the on-disk
> > > > structures, they can create all the cycles they want in the
> > > > metadata. So unless the kernel builds the graph internally by
> > > > parsing the metadata _and_ has some way to detect that the on-disk
> > > > metadata has hit a cycle (which may not just involve 2 items),
> > > 
> > > Understood.  Again, see the d_ancestor call in d_splice_alias, this is
> > > exactly what it checks for.
> > 
> > But that only addresses one type of loop in one specific metadata
> > structure.
> 
> Yep, agreed!
> 
> > There's plenty of other ways you could construct metadata
> > loops that are essentially undetected and result in either deadlock
> > or livelock within the filesystem code itself. e.g. just make btree
> > sibling pointers loop over a range of entries that have the same
> > index key (e.g. free space extents of the same size). If allocation
> > then falls into this loop, the kernel will just spin searching the
> > same blocks for something it will never find.  Such resource
> > consumption attacks are trivial to construct but extremely difficult
> > to detect because they exploit normal behaviour of the structure and
> > algorithms by mangling trusted pointers.
> 
> Interesting example, thanks!  I doubt this particular example would be
> *that* hard to detect?

Yes, it can be detected, but it's not as easy as it sounds because
of abstractions between tree walking and record parsing.
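
As a rough illustration only (a userspace toy, not code from any
filesystem), the sketch below walks a chain of right-sibling pointers
over blocks that share one index key and reports a loop using Brent's
cycle detection. The names struct blk, NULL_BLOCK and walk_siblings()
are hypothetical, and the blocks live in a flat in-memory array. It
sidesteps the hard part mentioned above: here the walker can see both
the sibling pointer and the record key, which is exactly what layered
tree-walking and record-parsing code does not get for free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "no right sibling" marker for this toy model. */
#define NULL_BLOCK UINT32_MAX

/* Hypothetical, heavily simplified leaf block: one key, one sibling link. */
struct blk {
	uint32_t right;	/* index of the right sibling, or NULL_BLOCK */
	uint32_t key;	/* index key of the records, e.g. extent size */
};

enum walk_result {
	WALK_DONE,	/* chain ended, or we left the run of equal keys */
	WALK_LOOP,	/* sibling pointers lead back to a visited block */
	WALK_BAD_PTR,	/* sibling pointer is outside the block array */
};

/*
 * Follow right-sibling pointers from 'start' while the blocks carry
 * 'key'. Brent's algorithm: remember a checkpoint block and move it
 * forward at exponentially growing intervals; meeting the checkpoint
 * again proves the chain is cyclic. Uses O(1) extra memory.
 */
enum walk_result walk_siblings(const struct blk *blocks, size_t nblocks,
			       uint32_t start, uint32_t key)
{
	uint32_t cur = start;
	uint32_t checkpoint = start;
	size_t power = 1;	/* steps allowed before moving the checkpoint */
	size_t steps = 0;	/* steps taken since the checkpoint moved */

	assert(blocks != NULL);

	for (;;) {
		uint32_t next;

		if (cur >= nblocks)
			return WALK_BAD_PTR;
		if (blocks[cur].key != key)
			return WALK_DONE;

		next = blocks[cur].right;
		if (next == NULL_BLOCK)
			return WALK_DONE;

		steps++;
		if (next == checkpoint)
			return WALK_LOOP;
		if (steps == power) {
			checkpoint = next;
			power *= 2;
			steps = 0;
		}
		cur = next;
	}
}
```

Without the checkpoint comparison, a search for a record that is not
present would keep circling the same blocks, which is the spin
described in the quoted example.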

>  But understood that there may be lots of others.

Yeah, that's just one of many, many ways I can think of to modify
on-disk structures to screw up the kernel.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx