Re: [RFC] readdir mess

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Wed, 13 Aug 2008 17:19:40 +0100

On Wed, Aug 13, 2008 at 01:36:35AM -0700, Brad Boyer wrote:
> On Wed, Aug 13, 2008 at 01:04:33AM +0100, Al Viro wrote:
> > hfs:	at least missing checks for hfs_bnode_read() failure.  And I'm not
> > 	at all sure that hfs_mac2asc() use is safe there.  BTW, open_dir_list
> > 	handling appears to be odd - how the hell does decrementing ->f_pos
> > 	help anything?  And hfs_dir_release() removes from list without any
> > 	locks, so that's almost certainly racy as well.
> > hfsplus: ditto
> 
> I don't work on this code anymore, but I wrote the original version which
> makes me a bit curious about some of the critcism here. The function
> hfs_bnode_read is declared void, so it doesn't return any errors. The
> only thing inside it that I could even think of failing is kmap, but
> I was under the impression that kmap would sleep until it could
> complete. The actual disk read happens earlier and is saved in the
> page cache as long as the bnode object exists.

Argh... s/failure/arguments/; sorry about the braino.  Take a look at
the call in the main loop.  entrylength comes from 16bit on-disk value
(set in hfs_brec_goto()).  It's not checked anywhere for being too large,
AFAICS.  And we proceed to do memcpy() to entry.  On stack, BTW.

> Is there any reason that hfs_mac2asc would not be safe? I can't think
> of any good way to avoid that call even if it is unsafe, since the
> readdir will expect UTF8 strings rather than the mac (or UTF16 for
> hfsplus) encoding found on disk.

As for mac2asc...  Are multibyte encodings possible there?  If they are,
you'd need to validate the first byte of CName as well - result of conversion
will fit the strbuf, but that doesn't mean we do not overrun the source
buffer...

> The open_dir_list stuff is a little odd I admit, and I think you are
> right about the locking issue in release. However, I feel like I should
> explain the way hfs and hfsplus use f_pos on directories. The on-disk
> format requires that the directory entries be sorted in order by
> filename and does not allow any holes in the list. Because of this,
> adding and removing entries will move all the items that are later
> in the order to make room or eliminate the hole. The manipulation
> of f_pos is intended to make it act more like a standard unix
> filesystem where removing an item doesn't usually make a pending
> readdir skip an unrelated item. The value of f_pos for a directory
> is only incremented by one for each entry to make seeking work
> in a more sane fashion.

What happens if you repeatedly create and remove an entry with name below
that of the place where readdir has stopped?  AFAICS, on each iteration
f_pos will decrement...  I see that scanning of the list in hfs_cat_delete()
and nowhere else; we don't have the matching increment of f_pos...

> Because of this, an increment moves to the
> next item and decrement moves to the previous one.
> 
> As a side note about the complexity of making hfs and hfsplus fit
> into a unix model, there is just one file that contains the equivalent
> of every directory and the entire inode table. Because of this, the
> directory code is very synthetic. I tried to make it look as much as
> possible like a normal unix filesystem, including making i_nlink on
> the directory inodes reflect the number of child directories. It
> makes for some code that is admittedly a little hard to follow.

It's actually fairly readable, but AFAICS doesn't validate the on-disk
data enough...  Sure, don't go around mounting corrupt filesystem images
and all such, but getting buffer overruns on kernel stack is a bit over
the top, IMO...

[que the grsec pack popping out of latrines screaming "coverup" and demanding
CVEs to be allocated]
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html