Re: Orangefs, v4.5 and the merge window...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 11, 2016 at 03:18:57PM -0500, Mike Marshall wrote:
> Greetings...
> 
> The Orangefs for-next tree is:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git
> for-next
> 
> I did a test merge (just locally, not pushed out) of Orangefs:for-next
> and v4.5-rc7 so I could test out how I think I need to patch for
> the follow_link -> get_link change, the diff is below.
> 
> On Monday next, assuming that v4.5 is finalized this weekend,
> I plan to do a actual merge with v4.5, apply the get_link patch
> and push that to Orangefs:for-next.
> 
> Hi Al <g>... might we get an ACK this time around?

You do realize that it will mean fun few weeks post-merge fixing the rest of
problems, right?  FWIW, I think that right now it *is* at the state where it
such fixing is feasible, so modulo that...

As far as I can see, waiting-related logics should be solid by now, ditto
for lifetime rules; sanitizing the input...  listxattr still does need fixing
(feed it a negative in ->downcall.resp.listxattr.lengths[0] and watch Bad
Things(tm) happen; no idea why would anyone go for
fs/orangefs/downcall.h:82:      __s32 lengths[ORANGEFS_MAX_XATTR_LISTLEN];
for representing string lengths in the first place, but that's what you've
got there and no sanity checks are done on it beyond
                if (total + new_op->downcall.resp.listxattr.lengths[i] > size)
                        goto done;
which is not enough - not with total and size being ssize_t and ...lengths[] -
signed 32bit).

The logics around maintaining the list of orangefs superblocks (add/remove/
traverse) needs fixing; right now ioctl(..., ORANGEFS_DEV_REMOUNT_ALL) will
walk through it with only request_mutex held.  Both insertion and removal
are protected only by orangefs_superblocks_lock, and removal is insane -
        struct list_head *tmp_safe = NULL;                              \
        struct orangefs_sb_info_s *orangefs_sb = NULL;                  \
                                                                        \
        spin_lock(&orangefs_superblocks_lock);                          \
        list_for_each_safe(tmp, tmp_safe, &orangefs_superblocks) {      \
                orangefs_sb = list_entry(tmp,                           \
                                      struct orangefs_sb_info_s,        \
                                      list);                            \
                if (orangefs_sb && (orangefs_sb->sb == sb)) {           \
                        gossip_debug(GOSSIP_SUPER_DEBUG,                \
                            "Removing SB %p from orangefs superblocks\n",      \
                        orangefs_sb);                                   \
                        list_del(&orangefs_sb->list);                   \
                        break;                                          \
                }                                                       \
        }                                                               \
        spin_unlock(&orangefs_superblocks_lock);                        \
list_entry is never NULL, for starters, and since there is a pointer back
from superblock to that orangefs_sb_info, there's no reason to walk the entire
list to find one.  BTW, both add_orangefs_sb() and remove_orangefs_sb() should
be taken to their sole users.

Sanity aside, there's really no lock in common for list modifiers and list
walker I'd mentioned above.  FWIW, I would make orangefs_remount()
take struct orangefs_sb_info instead of struct super_block and flipped the
order of operations in orangefs_kill_sb() - kill_anon_super() *first*, then
remove from the list, then tell the userland that it's going away (i.e.
call orangefs_unmount_sb()).  request_mutex in the last one would, at least,
prevent freeing the sucker before orangefs_remount() is done with it.

Walking the list and calling orangefs_remount() on everything would still need
care - you'd need to hold orangefs_superblocks_lock, drop it for actual calls
of orangefs_remount() and have list removal preserve the forward pointer.

That's probably the worst remaining locking issue I see in there.  Doable,
if not pleasant...

IIRC, there also had been some unpleasantness with getattr messing ->i_mode
halfway through... <checks>  Yes - copy_attributes_to_inode() will be called,
and do
        inode->i_mode = orangefs_inode_perms(attrs);
...
                inode->i_mode |= S_IFLNK;
...
                        strncpy(orangefs_inode->link_target,
                                symname,
                                ORANGEFS_NAME_MAX);
If nothing else, *another* stat(2) racing with this one could pick the
intermediate value of ->i_mode and proceed to report it to userland.
Another problem is overwriting the symlink body; that can get very
unpleasant, since it might be traversed by another syscall right at that
moment.  Any change of a symlink body means "we'd missed it going stale"; 
there is no way to change a symlink contents without removing it and
creating a new one.  Should anything other than orangefs_iget() even bother
copying it?  The same goes for inode type changes, of course (regular
vs. directory vs. symlink, etc.).

Speaking of orangefs_iget(), orangefs_set_inode() is pointlessly paranoid.
Not a bug per se, but
        struct orangefs_inode_s *orangefs_inode = NULL;

        /* Make sure that we have sane parameters */
        if (!data || !inode)  
                return 0;
        orangefs_inode = ORANGEFS_I(inode);
        if (!orangefs_inode)
                return 0;
is all wrong - 'data' is the last argument passed to iget5_locked (i.e. 'ref'
of orangefs_iget()) and that's always an address of either a local variable
or of a field in a large structure, and not even the first one; 'inode'
is never NULL - it's the address of struct inode the caller is about to
insert into the hash chain; ORANGEFS_I() is container_of(), so it's not
going to be NULL either.

I'll need to look through the archived threads to see if there's anything
else left; IIRC, debugfs-related code had seriously nasty issues in case of
allocation failures, but those were fairly isolated.  I'll read through the
archive tomorrow and see if there's anything else mentioned and not dealt
with; I don't remember anything really bad, but it had been well over
a hundred mails starting about half a year ago; I sure as hell do not
remember every tangential subthread in all of that, so I'll need to recheck.

I _think_ that all remaining issues can be quickly dealt with, and the code
has zero impact on the rest of the kernel.  I wouldn't risk putting it into
-final without fixups, but as for the merge schedule... either merge it
before -rc1 and fix it up by -rc3 or so, or fix it during the window and
merge at around -rc2 - I'm fine with either variant.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux