[RFC][PATCH] ensure i_ino uniqueness in filesystems without permanent inode numbers (via pointer conversion)

Here's a completely different approach to ensuring inode uniqueness.
This one was inspired by a suggestion by Al Viro. I'll refer to my
earlier email for a description of the problem...

We already have what could be considered a unique number for each inode
-- the inode pointer address. The problem is converting that into an
i_ino value.

With this patch, when new_inode is called, we pretend that all of the
kernel memory is one huge array of inode pointers, and determine what
the position of the pointer would be in the array. We then take that
value, and mask off anything higher than 32 bits. Obviously this is a
much cheaper operation than keeping track of what's been allocated.
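To make the conversion concrete, here is a minimal userspace sketch of
the idea. It is not the patch code: sketch_addr_to_ino,
sketch_check_neighbours and SKETCH_INODE_SIZE are names made up for
illustration, 720 bytes is simply the x86_64 struct inode size quoted
from slabinfo in this mail, and it uses plain integer division where
the patch uses pointer subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct inode) on x86_64. */
#define SKETCH_INODE_SIZE 720u

/*
 * Treat the address space as one big array of SKETCH_INODE_SIZE-byte
 * slots, take the slot index of addr, and keep only the low 32 bits.
 */
static uint32_t sketch_addr_to_ino(uint64_t addr)
{
	uint64_t index = addr / SKETCH_INODE_SIZE;

	return (uint32_t)(index & 0xffffffffu);
}

/* Two slots that sit next to each other must get different numbers. */
static void sketch_check_neighbours(uint64_t addr)
{
	assert(sketch_addr_to_ino(addr) !=
	       sketch_addr_to_ino(addr + SKETCH_INODE_SIZE));
}
```

Addresses that are at least one slot apart land on different indexes,
so the only way to collide is for the index to wrap past 32 bits.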

Since we're masking off the high bits, we have a chance for collisions
when those bits become significant. On my x86_64 FC6 machine, an inode
struct is 720 bytes according to slabinfo. The next lower power of two
is 512 (2^9), so we automatically get at least 9 bits for "free". So
this scheme can cope with any situation where two inode addresses are
not more than 2^41 bytes (2 terabytes) apart.
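The arithmetic behind that bound can be checked with a few lines of C.
This is only a sketch under the assumptions stated in this mail (a
32-bit index, and a slot size of either the 512-byte lower bound or the
720 bytes slabinfo reports); sketch_span_bytes is a hypothetical helper,
not something in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of address bytes covered before a 32-bit slot index wraps
 * around, for a given slot size in bytes.
 */
static uint64_t sketch_span_bytes(uint64_t slot_size)
{
	assert(slot_size > 0 && slot_size < (1ULL << 32));

	return slot_size << 32;	/* 2^32 slots of slot_size bytes each */
}
```

sketch_span_bytes(512) is exactly 2^41 bytes, and sketch_span_bytes(720)
is 720 * 2^32 bytes, i.e. 2.8125 * 2^40, so the power-of-two estimate is
the conservative one.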

This calculation was done quickly, so I might be off by one in the
exponent, but I still think we'd probably be OK for the next several
years with this scheme. inode structs are smaller on 32-bit boxes, but
pointers there are only 32 bits wide, so no significant bits get masked
off and this won't be an issue.

There are a couple of problems, but I think this patch should address
them too:

1) Because the slab allocator tends to reuse slab objects quickly,
i_ino values get reused quickly. The patch copes with this by removing
the initialization of i_generation from alloc_inode, and having
new_inode increment that value instead. This should make sure that when
an inode slab object is reused, it at least has a different i_generation
than before (barring major page allocation/release churn in the slab).
There may be callers of new_inode that assume that the i_generation they
get is 0. They'll need to be fixed with this scheme, but that should be
fairly easy.

2) this scheme would effectively leak inode addresses into userspace.
I'm not sure if that would be exploitable, but it's probably best not to
do it. The patch adds a static unsigned int that is initialized to a
random value at boot time. We'll xor the inode offset with this value.
That should allow for a unique i_ino value, but since the xor mask would
be secret, it shouldn't be possible to turn it back into an address.
There may be a more secure way to do this. I'm definitely open to
suggestions here.
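Both points can be illustrated together with a small userspace model.
This is a sketch, not the kernel code: struct sketch_inode,
sketch_new_inode and sketch_xor_mask are hypothetical names, the slot
index comes from an ordinary array rather than from the inode address,
and the mask is a fixed constant so the example is repeatable (the patch
fills its mask with get_random_bytes() at boot).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a slab object that gets recycled. */
struct sketch_inode {
	uint32_t ino;
	uint32_t generation;
};

/* Fixed value for the example only; the real mask would be random. */
static const uint32_t sketch_xor_mask = 0x5bd1e995u;

/*
 * Mimics the new_inode() step: derive ino from the slot index, hide it
 * behind the mask, and bump whatever generation the previous user of
 * the slot left behind.
 */
static void sketch_new_inode(struct sketch_inode *slots, uint32_t idx)
{
	struct sketch_inode *inode;

	assert(slots);
	inode = &slots[idx];

	inode->ino = idx ^ sketch_xor_mask;
	inode->generation++;
}
```

Reusing a slot yields the same ino but a new generation, and since XOR
with a constant is a one-to-one mapping, two different slots still get
two different ino values. Anyone who learns the mask can undo it, which
is why it has to stay secret.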

Again, patch is still a little rough but this one shouldn't need much
work if it looks good. Comments, thoughts, suggestions appreciated.

Thanks,
Jeff

--- linux-2.6.18.noarch/fs/inode.c.ino2uint
+++ linux-2.6.18.noarch/fs/inode.c
@@ -22,6 +22,7 @@
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
 #include <linux/mount.h>
+#include <linux/random.h>
 
 /*
  * This is needed for the following functions:
@@ -98,6 +99,15 @@ static DEFINE_MUTEX(iprune_mutex);
 struct inodes_stat_t inodes_stat;
 
 static kmem_cache_t * inode_cachep __read_mostly;
+static unsigned int inode_xor_mask;
+
+/* convert an inode address into an unsigned int and xor it with a random value
+ * determined at boot time */
+static inline unsigned int inode_to_uint (struct inode *inode)
+{
+	return ((((unsigned long) (inode - (struct inode *) 0))
+		 ^ inode_xor_mask) & 0xffffffff);
+}
 
 static struct inode *alloc_inode(struct super_block *sb)
 {
@@ -125,7 +135,6 @@ static struct inode *alloc_inode(struct 
 		inode->i_size = 0;
 		inode->i_blocks = 0;
 		inode->i_bytes = 0;
-		inode->i_generation = 0;
 #ifdef CONFIG_QUOTA
 		memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
 #endif
@@ -546,7 +555,6 @@ repeat:
  */
 struct inode *new_inode(struct super_block *sb)
 {
-	static unsigned long last_ino;
 	struct inode * inode;
 
 	spin_lock_prefetch(&inode_lock);
@@ -557,7 +565,8 @@ struct inode *new_inode(struct super_blo
 		inodes_stat.nr_inodes++;
 		list_add(&inode->i_list, &inode_in_use);
 		list_add(&inode->i_sb_list, &sb->s_inodes);
-		inode->i_ino = ++last_ino;
+		inode->i_ino = inode_to_uint(inode);
+		inode->i_generation++;
 		inode->i_state = 0;
 		spin_unlock(&inode_lock);
 	}
@@ -1393,6 +1402,9 @@ void __init inode_init(unsigned long mem
 
 	for (loop = 0; loop < (1 << i_hash_shift); loop++)
 		INIT_HLIST_HEAD(&inode_hashtable[loop]);
+
+	/* initialize the xor mask for unique inode generation */
+	get_random_bytes(&inode_xor_mask, sizeof(inode_xor_mask));
 }
 
 void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)

