On Sat, Jul 29, 2017 at 07:37:18PM -0700, Matthew Wilcox wrote:
> The biggest danger I see here is that we're only going to test 32
> groups before falling back to linear probing (we'll shift the single
> '1' bit out of 'i' in 32 steps).  That might be a performance
> problem, but it should hit quite rarely.
>
> The danger in changing it is that we'll end up with new files
> created in a directory choosing a different block group from files
> created in that directory using an old kernel.  And that could be a
> worse performance impact.

The original assumption behind this scheme was that by the time we
fill a block group --- say, block group #42 --- it was highly likely
that the linear probing for block allocations would have filled block
groups #43, #44, #45, etc.  So by the time that block group was full,
the idea was to jump much farther away when allocating a new inode.

It's also fair to say that we spent a lot more time thinking about
the Orlov algorithm, which spreads out new directories, than we did
about how find_group_other() would work.  (Note: the Orlov algorithm
was originally implemented in the BSD FFS, and was then imported into
ext2 by Al Viro.)

> I think we'd need to see some benchmarks ... Ted, any suggestions
> for something which might show a difference between these two
> approaches to hashing?

Benchmarking different allocation algorithms is hard.  The problem is
that allocation patterns are very different depending on the
workload, and it's usually not just one workload but a combination of
workloads, resulting in different levels of free space fragmentation.
Not only do the workloads (or combinations of workloads) matter; the
amount of free space can also make a huge difference.

When I worked on the changes to ext4's allocation algorithms, I used
a combination of measuring the time to run various benchmarks,
looking at free space fragmentation using e2freefrag, and looking at
reports of file fragmentation using "e2fsck -E fragcheck".
I did this after inducing various amounts of fragmentation, using
various ad hoc means; I should have done more to automate this and to
make it more controlled.

There is certainly much more that can be done to improve the
allocation algorithms, for both ext2 and ext4.  With ext2, though,
you give up so much performance from the use of indirect blocks that
it's probably not worth trying to change the inode and block
allocation algorithms for ext2.  We do use the same "quadratic hash"
algorithm in ext4, though, so something that works better is worth
some thought.

One of the problems is that depending on the number of block groups
in the file system, pure quadratic probing can cause duplicate probes
to the same buckets.  Unlike in a hash table, the number of block
groups isn't necessarily going to be prime, or a pure power of two,
in which case we could use something like this:

	((i + i*i)/2) % n

So if we use a quadratic hash, we might want to artificially limit
the number of probes, but at the same time we might want to make sure
we try some buckets that are fairly far away if the first couple of
probes don't succeed.  But that's kind of what an exponential probing
scheme does, which is what we have.  So it's not entirely clear that
trying to use a quadratic hash would actually be a better approach.

					- Ted