On Tue, May 07, 2019 at 01:56:27PM -0500, Probir Roy wrote: > > block number (e.g., the location on disk). It's a cache; lookups are > > fast, and is an in-memory lookup. Well, it's a little more than a > > cache, it also stores some information for delayed allocation buffered > > writes. > > For sequential access, it will traverse almost the same path of the > tree. How deep the extent status tree be in general? If the tree is > much deeper, the sequential accesses would have many repeated nodes > traversal on the tree for the lookup. Have you observed significant > bottleneck on "ext4_es_lookup_extent"? Can it be removed by caching > the parent node? You do realize that the extent status tree is separate from the on-disk ext4 extent tree, right? The extent status tree is an in-memory cache, and it caches logical extents. Which is to say, the on-disk physical extents are limited (for historical reasons) to 32,767 blocks in an on-disk extent entry. If you have a contiguous range of 128,000 blocks, it will require 4 on-disk extents in the ext4 extent tree. The extent status tree, being an in-memory data structure, will collapse those 4 on-disk extents (assuming they are physically and logically contiguous) into a single in-memory entry in the extent status tree. So the depth of the extent status tree very much depends on how fragmented the data blocks are for the file in question. If the file is 100% contiguous, and pre-allocated in advance, it could be 12TB long, and it would only take a single entry in the extent status tree. If the file is very highly fragmented, then of course, the size of the extent status tree in memory and the on-disk extent tree can be quite large. It's true that if the tree is very deep, then you might have to do many traversals of the red-black tree. But that's because file is super fragmented. If we didn't have the extent status cache, then we'd have to read in portions of the on-disk extent tree. That tree has a larger fanout, so it's wider, as well as being less deep. But if you are talking about the number of memory accesses needed to traverse the extent tree, it's going to be roughly the same as the read-block tree. In either case, it's going to be O(log N) of the number of extents in the highly fragmented file. So let's back up. Why are you so concerned about potential bottle-necks on the ext4_es_lookup_extent()? What is your workload? How badly fragmented is your file? And have you considered taking measures (such as preallocating the file using fallocate, and possibly pre-initializing the file) to improve things? If you are doing something nasty such as a random write workload using buffered writes, without preallocating the file, then yeah, things can get pretty nasty. But the problem isn't going to be in the extent status tree; it's going to be in many positions as well. - Ted