On Fri, 3 Apr 2009, Wendy Cheng wrote: > Kadlecsik Jozsef wrote: > > On Thu, 2 Apr 2009, Wendy Cheng wrote: > > > > > Kadlecsik Jozsef wrote: > > > > > > > > > > > - commit 82d176ba485f2ef049fd303b9e41868667cebbdb > > > > > > gfs_drop_inode as .drop_inode replacing .put_inode. > > > > > > .put_inode was called without holding a lock, but .drop_inode > > > > > > is called under inode_lock held. Might it be a problem > > > > > > > > > Based on code reading ... > > > 1. iput() gets inode_lock (a spin lock) > > > 2. iput() calls iput_final() > > > 3. iput_final() calls filesystem drop_inode(), followed by > > > generic_drop_inode() > > > 4. generic_drop_inode() unlock inode_lock after doing all sorts of fun > > > things > > > with the inode > > > > > > So look to me that generic_drop_inode() statement within gfs_drop_inode() > > > should be removed. Otherwise you would get double unlock and double list > > > free. > > > > I think those function calls are right: iput_final calls either the > > filesystem drop_inode function (in this case gfs_drop_inode) or > > generic_drop_inode. There's no double call of generic_drop_inode. However > > gfs_sync_page_i (and in turn filemap_fdatawrite and filemap_fdatawait) is > > now called under inode_lock held and that was not so in previous versions. > > But I'm just speculating. > > It *is* called twice unless my eyes deceive me > > static inline void iput_final(struct inode *inode) > { > const struct super_operations *op = inode->i_sb->s_op; > void (*drop)(struct inode *) = generic_drop_inode; > > if (op && op->drop_inode) > drop = op->drop_inode; /* gfs call generic_drop_inode() */ > drop(inode); /* second call into generic_drop_inode() again. */ > } No, the line 'drop = op->drop_inode;' is just an assignment (there's no function argument): if there's a filesystem-specific drop_inode function, that is assigned to 'drop', overwriting thus the originally initialized 'generic_drop_inode' value of the 'drop' variable as function pointer. And the last line is only the function call, with the proper argument. > > I won't get a chance to start a test before Monday, sorry. > > I'll be traveling next week as well. However, a few cautious words here: > > Even this "fix" eventually solves your hang, running GFS on newer > kernels with production system simply is *not* a good idea. That might be so, but this is a catch: at least we must test GFS2 before migrating from GFS1 to GFS2. That requires a recent kernel, with working GFS1 and GFS2 support. A migration of an in-production system cannot be started lightheartedly, and transforming GFS1 to GFS2 wont' be easy: that needs downtime, fresh backups created after bringing down the systems, backup verification, converting/creating the GFS2 volumes, restoring the data. And crossing fingers that it must not be undone if something goes wrong. It's not the same as just replacing an older kernel and an older package. Best regards, Jozsef -- E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary -- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster