On Wed, Oct 17, 2012 at 12:40 PM, Casey Bodley <casey@xxxxxxxxxxxx> wrote:
> To expand on what Matt said, we're also trying to address this issue of lookups by inode number for use with NFS.
>
> The design we've been exploring is to create a single system inode, designated the 'inode container' directory, which stores the primary links to all inodes in the filesystem. These links are named by their inode number to satisfy lookups and obviate the need for an anchor table. This design allows the inode container to make use of existing directory fragmentation and load balancing to distribute the inodes over the MDS cluster.
>
> When a new file is created, it then adds two links: a primary link into the inode container, and a remote link into the filesystem namespace. In the case where the parent directory fragment's authority is different than the corresponding inode container fragment's, it is created in the parent directory then exported to the inode container via an asynchronous slave request.
>
> We welcome additional discussion, both on this design specifically and on the general topic of scalable ino lookups.

So if the primary link isn't always in the "inode container", you must be preserving the anchor table for this setup. Am I understanding that correctly? Or is there some other mechanism for linking them that's less expensive?

-Greg
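
For concreteness, here is a toy in-memory model of the two-link scheme described in the quoted text. It is only a sketch and not Ceph MDS code: `ToyFS`, `Inode`, and the method names are made up for illustration, naming each primary link with the decimal inode number is an assumption, and the case where the inode is first created in the parent directory and later exported to the container is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    ino: int
    data: str = ""


@dataclass
class ToyFS:
    # "Inode container": holds the primary link for every inode,
    # keyed by a name derived from the inode number (assumed: decimal).
    container: dict = field(default_factory=dict)
    # Filesystem namespace: path -> inode number, standing in for remote links.
    namespace: dict = field(default_factory=dict)
    next_ino: int = 1

    def create(self, path: str, data: str = "") -> int:
        """Create a file by adding a primary link and a remote link."""
        ino = self.next_ino
        self.next_ino += 1
        self.container[str(ino)] = Inode(ino, data)  # primary link
        self.namespace[path] = ino                   # remote link
        return ino

    def lookup_by_ino(self, ino: int) -> Inode:
        """NFS-style lookup: a single name lookup in the container."""
        return self.container[str(ino)]

    def lookup_by_path(self, path: str) -> Inode:
        """Path lookup: follow the remote link to the primary link."""
        return self.lookup_by_ino(self.namespace[path])


fs = ToyFS()
ino = fs.create("/home/a.txt", "hello")
print(fs.lookup_by_ino(ino).data)
```

In this model every lookup by inode number resolves through the container alone, which is why no anchor table is needed as long as the primary link really does live there.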