Andrew, This is a patch I've been using successfully in Unionfs for more than a year and multiple kernel versions. The origin of patch was from Hugh Dickins, who found problems with 32-bit/SMP systems. You and hch had some comments. This is the final version which I've been using. See thread of discussion starting here: <http://marc.info/?l=linux-kernel&m=121469807803867&w=2> The comments in this patch provide more details. I think this patch is ready for upstream. At the moment, the only mainline user of this code is ecryptfs. Signed-off-by: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx> Signed-off-by: Erez Zadok <ezk@xxxxxxxxxxxxx> diff --git a/fs/stack.c b/fs/stack.c index 67716f6..3d7f5c3 100644 --- a/fs/stack.c +++ b/fs/stack.c @@ -9,8 +9,54 @@ */ void fsstack_copy_inode_size(struct inode *dst, const struct inode *src) { - i_size_write(dst, i_size_read((struct inode *)src)); - dst->i_blocks = src->i_blocks; + loff_t i_size; + blkcnt_t i_blocks; + + /* + * i_size_read() includes its own seqlocking and protection from + * preemption (see include/linux/fs.h): we need nothing extra for + * that here, and prefer to avoid nesting locks than attempt to + * keep i_size and i_blocks in synch together. + */ + i_size = i_size_read(src); + + /* + * But if CONFIG_LSF (on 32-bit), we ought to make an effort to keep + * the two halves of i_blocks in synch despite SMP or PREEMPT - though + * stat's generic_fillattr() doesn't bother, and we won't be applying + * quotas (where i_blocks does become important) at the upper level. + * + * We don't actually know what locking is used at the lower level; but + * if it's a filesystem that supports quotas, it will be using i_lock + * as in inode_add_bytes(). tmpfs uses other locking, and its 32-bit + * is (just) able to exceed 2TB i_size with the aid of holes; but its + * i_blocks cannot carry into the upper long without almost 2TB swap - + * let's ignore that case. + */ + if (sizeof(i_blocks) > sizeof(long)) + spin_lock((spinlock_t *) &src->i_lock); + i_blocks = src->i_blocks; + if (sizeof(i_blocks) > sizeof(long)) + spin_unlock((spinlock_t *) &src->i_lock); + + /* + * If CONFIG_SMP on 32-bit, it's vital for fsstack_copy_inode_size() + * to hold some lock around i_size_write(), otherwise i_size_read() + * may spin forever (see include/linux/fs.h). We don't necessarily + * hold i_mutex when this is called, so take i_lock for that case. + * + * And if CONFIG_LSF (on 32-bit), continue our effort to keep the + * two halves of i_blocks in synch despite SMP or PREEMPT: use i_lock + * for that case too, and do both at once by combining the tests. + * + * There is none of this locking overhead in the 64-bit case. + */ + if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long)) + spin_lock(&dst->i_lock); + i_size_write(dst, i_size); + dst->i_blocks = i_blocks; + if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long)) + spin_unlock(&dst->i_lock); } EXPORT_SYMBOL_GPL(fsstack_copy_inode_size); -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html