On Wed, May 31, 2006 at 08:24:18PM -0700, Valerie Henson wrote:
> Actually, the continuation inode is in B.  When we create a link in
> directory A to file C, a continuation inode for directory A is created
> in domain B, and a block containing the link to file C is allocated
> inside domain B as well.  So there is no continuation inode in domain
> A.
>
> That being said, this idea is at the hand-waving stage and probably
> has many other (hopefully non-fatal) flaws.  Thanks for taking a look!

OK, so we really have two kinds of continuation inodes, and it might be
sensible to name them differently.  We have "here's some extra data for
that inode over there" and "here's a hardlink from another domain".  I
dub the first one a 'continuation inode' and the second a 'shadow inode'.

Continuation inodes and shadow inodes both suffer from the problem
that they might be unwittingly orphaned, unless they have some kind of
back-link to their referrer.  That seems more soluble though.  The domain
B minifsck can check to see if the backlinked inode or directory is
still there.  If the domain A minifsck prunes something which has a link
to domain B, it should be able to just remove the continuation/shadow
inode there, without fscking domain B.

Another advantage to this is that inodes never refer to blocks outside
their zone, so we can forget about all this '64-bit block number' crap.
We don't even need 64-bit inode numbers -- we can use special direntries
for shadow inodes, and inodes which refer to continuation inodes need
a new encoding scheme anyway.  Normal inodes would remain 32-bit and
refer to the local domain, and shadow/continuation inode numbers would
be 32-bits of domain, plus 32-bits of inode within that domain.

So I like this ;-)

> > Surely XFS must have a more elegant solution than this?
>
> val@goober:/usr/src/linux-2.6.16.19$ wc -l `find fs/xfs/ -type f`
> [snip]
> 109083 total

Well, yes.
I think that inside the Linux XFS implementation there's a
small and neat filesystem struggling to get out.  Once SGI finally dies,
perhaps we can rip out all the CXFS stubs and IRIX combatability.  Then we
might be able to see it.

For fun, if you're a masochist, try to follow the code flow for
something easy like fsync().

const struct file_operations xfs_file_operations = {
	.fsync		= xfs_file_fsync,
}

xfs_file_fsync(struct file *filp, struct dentry *dentry, int datasync)
{
	struct inode	*inode = dentry->d_inode;
	vnode_t		*vp = vn_from_inode(inode);
	int		error;
	int		flags = FSYNC_WAIT;

	if (datasync)
		flags |= FSYNC_DATA;
	VOP_FSYNC(vp, flags, NULL, (xfs_off_t)0, (xfs_off_t)-1, error);
	return -error;
}

#define _VOP_(op, vp)	(*((vnodeops_t *)(vp)->v_fops)->op)

#define VOP_FSYNC(vp,f,cr,b,e,rv)				\
	rv = _VOP_(vop_fsync, vp)((vp)->v_fbhv,f,cr,b,e)

vnodeops_t xfs_vnodeops = {
	.vop_fsync	= xfs_fsync,
}

Finally, xfs_fsync actually does the work.  The best bit about all this
abstraction is that there's only one xfs_vnodeops defined!  So this could
all be done with an xfs_file_fsync() that munged its parameters and called
xfs_fsync() directly.  That wouldn't even affect IRIX combatability,
but it would make life difficult for CXFS, apparently.

http://oss.sgi.com/projects/xfs/mail_archive/200308/msg00214.html
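
Coming back to the shadow/continuation inode numbering described earlier,
the "32 bits of domain plus 32 bits of inode" idea could be sketched
roughly as below.  This is only an illustration: every name in it
(local_ino_t, remote_ino_t, remote_ino_make() and friends) is made up,
no such on-disk format exists anywhere, and putting the domain in the
high half is an arbitrary choice.

```c
#include <stdint.h>

/*
 * Hypothetical sketch only; none of these names exist in any kernel tree.
 *
 * A normal inode number stays 32-bit and always means "an inode in the
 * domain this reference lives in".
 */
typedef uint32_t local_ino_t;

/*
 * A reference to a shadow or continuation inode names another domain, so
 * it carries 32 bits of domain plus 32 bits of inode within that domain.
 * Assumption: the domain goes in the high half.
 */
typedef uint64_t remote_ino_t;

/* Pack a (domain, inode) pair into one 64-bit reference. */
static inline remote_ino_t remote_ino_make(uint32_t domain, local_ino_t ino)
{
	return ((uint64_t)domain << 32) | (uint64_t)ino;
}

/* Which domain does this reference point into? */
static inline uint32_t remote_ino_domain(remote_ino_t ref)
{
	return (uint32_t)(ref >> 32);
}

/* Which inode inside that domain? */
static inline local_ino_t remote_ino_local(remote_ino_t ref)
{
	return (local_ino_t)(ref & 0xffffffffu);
}
```

With something like this, remote_ino_make(7, 42) would name inode 42 in
domain 7, and only the special direntries and the continuation pointers
would ever need to store the wide form; everything else keeps using
plain 32-bit numbers.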