Sorry, I keep putting off a reply until I can append a patch, but still won't get to do so today. Comments below. On Wed, 30 Apr 2008, Andrew Morton wrote: > On Mon, 21 Apr 2008 02:50:42 -0400 > Erez Zadok <ezk@xxxxxxxxxxxxx> wrote: > > > > > 1. remove the 3rd arg to fsstack_copy_attr_all. There are no users for it: > > ecryptfs never used the 3rd arg; unionfs stopped using it a long time > > ago. Halcrow ok'ed this patch some time ago. > > > > 2. add necessary locking for 32-bit smp systems in fsstack_copy_inode_size > > (courtesy Hugh Dickins). > > > > 3. minor commenting style changes, and addition of copyrights which were > > missing. > > > > Acked-by: Mike Halcrow <mhalcrow@xxxxxxxxxx> > > Signed-off-by: Erez Zadok <ezk@xxxxxxxxxxxxx> > > > > ... > > > > void fsstack_copy_inode_size(struct inode *dst, const struct inode *src) > > { > > - i_size_write(dst, i_size_read((struct inode *)src)); > > +#if BITS_PER_LONG == 32 && defined(CONFIG_SMP) > > + spin_lock(&dst->i_lock); > > +#endif > > The defined(CONFIG_SMP) is wrong. The spinlock is here to protect > dst->i_blocks, but it can be corrupted via preemption on uniprocessor as > well. So a plain old > > #if BITS_PER_LONG == 32 > > would fix that. Some confusion here - which would have been avoided if I'd been so good as to put much-needed comment in the code of the original patch - sorry. There's two separate issues, i_size and i_blocks. i_size is what I was addressing by adding the #ifdef'ed spin_lock, i_blocks (in the CONFIG_LSF case more precisely than the 32-bit case) is an arguable problem that nobody noticed until you did so now. Here's my original patch comment to Erez: + LTP's iogen01 doio tests hang nicely on 32-bit SMP when /tmp is a unionfs + mount of a tmpfs. See the comment on i_size_write in linux/fs.h: it needs + to be locked, otherwise i_size_read can spin forever waiting for a lost + seqcount update. + + Most filesystems are already holding i_mutex for this, but unionfs calls + fsstack_copy_inode_size from many places, not necessarily holding i_mutex. + Use the low-level i_lock within fsstack_copy_inode_size when 32-bit SMP. So far as that goes, the CONFIG_SMP is correct, because i_size_write does preempt_disable/preempt_enable in the 32-bit PREEMPT but not SMP case. The ifdef chosen reflects that in i_size_write, but I've no objection to removing the CONFIG_SMP part of it if it ends up more palatable that way - I imagine the cost of a double preempt_disable where a single would do is just too minuscule to worry about. But the new use of i_lock, together with the way I only unlock after dealing with i_blocks (I thought, if we have to have preemption off, let's do i_blocks too while it's still off), led you to think that I was trying to keep the two halves of a CONFIG_LSF i_blocks together - whereas I'd never even glanced at the typedef of i_blocks. Personally I don't think it's worth worrying about i_blocks coherency here. stat's generic_fillattr doesn't worry about it. unionfs (ah, but here we're in general stackable filesystem territory) won't very often be handling files where the top half of the u64 is set. Quotas have a strong reason for keeping i_blocks coherent, but unionfs (and any stacking filesystem?) won't get into quotas itself. And tmpfs amongst non-quota others does not use i_lock to manipulate i_blocks. In later mail, it looks like you're wondering whether it's worth bothering about too: and perhaps you wouldn't have bothered about it in the first place, if the patch comment provided had made it clear what this change was for, i_size not i_blocks. > > > + i_size_write(dst, i_size_read(src)); > > dst->i_blocks = src->i_blocks; > > +#if BITS_PER_LONG == 32 && defined(CONFIG_SMP) > > + spin_unlock(&dst->i_lock); > > +#endif > > } > > However, what about src->i_blocks? It is protected by src->i_lock. The > code as you have it here could read transient values. In some cases it's protected by src->i_lock. > > Furthermore, i_lock is defined as an innermost lock, for protection of > inode internals. But here we're proposing "taking" inode->i_size_seqcount > inside i_lock. Not necessarily a problem, but it broke the old rule. > > We're also doing a read_seqlock of a _different_ inode inside this inode's > i_lock. Again, this is not necessarily a problem (but it might be!) but it > adds complexity and needs thought. Good points, I didn't think of those at all (I guess I don't properly think of the seqcount as a lock). There's no good reason to do that i_size_read within the i_lock, you're quite right to take it outside. > > Can we avoid having to think? Not such a good idea... > > void fsstack_copy_inode_size(struct inode *dst, const struct inode *src) > { > blkcnt_t i_blocks; > loff_t i_size; > > i_size = i_size_read(src); > spin_lock_32bit(&src->i_lock); > i_blocks = src->i_blocks; > spin_unlock_32bit(&src->i_lock); > > i_size_write(dst, i_size); > spin_lock_32bit(&dst->i_lock) > dst->i_blocks = i_blocks; > spin_unlock_32bit(&dst->i_lock) > } No. I see Erez has followed this up, and reduced it to a nice looking i_blocks_write(dst, i_blocks_read(src)); i_size_write(dst, i_size_read(src)); but I'm afraid that doesn't address the hang my patch was to fix. My inclination is to go for something like: i_size = i_size_read(src); spin_lock_32bit(&dst->i_lock) i_size_write(dst, i_size); dst->i_blocks = src->i_blocks; spin_unlock_32bit(&dst->i_lock) with spin_(un)lock_32bit declared as inlines in the very same file, unless you really want to embark on some i_blocks coherency cleanup. With comments in the code as well as in the patch description. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html