Andreas Dilger wrote: > Ted Ts'o wrote something like the following (didnt' get original email): >> @@ -122,6 +122,9 @@ struct ext4_inode_info { >> struct list_head i_prealloc_list; >> spinlock_t i_prealloc_lock; >> >> + /* ialloc */ >> + ext4_group_t i_last_alloc_group; > > Even better would be to store i_last_alloc_inode. In the past Eric > has demonstrated workloads that are allocating lots of inodes exhibit > O(n^2) behaviour because the entire group bitmap is searched from the > start each time, and that can cumulatively be very slow. Having the > directory start searching from the most recently allocated inode would > make this O(n), and would not significantly alter behaviour. A very hacky benchmark I had to demonstrate this is at It just creates a directory tree starting at 000/ under the dir it's run in, and times iterations of creates. The tree is created in order, like: 000/000/000/000/000/000 000/000/000/000/000/001 000/000/000/000/000/002 ... 000/000/000/000/000/fff 000/000/000/000/001/000 .... On ext3: # ./seq_mkdirs iter 0: 6.191491 sec iter 1: 8.455782 sec iter 2: 9.435375 sec iter 3: 10.198069 sec iter 4: 10.922969 sec iter 5: 10.800908 sec iter 6: 12.940676 sec iter 7: 15.513261 sec ... On upstream ext4: # ./seq_mkdirs iter 0: 5.628331 sec iter 1: 6.581043 sec iter 2: 6.723445 sec iter 3: 6.567891 sec iter 4: 5.862526 sec iter 5: 6.462064 sec iter 6: 7.208110 sec iter 7: 6.549735 sec ... I did play with saving the last-allocated position but if that's just in-memory then it's a little odd that the first allocation will be potentially much slower, but that's probably acceptable. It also wouldn't fill in gaps when inodes are deleted if you don't re-search from the parent. ISTR that the constant create/delete didn't cause a problem, will need to remind myself why ... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html