Hi, On Sat 12-12-15 00:57:01, Andreas Grünbacher wrote: > 2015-12-09 18:57 GMT+01:00 Jan Kara <jack@xxxxxxx>: > > Hello, > > > > inspired by recent reports [1] of problems with mbcache I had a look into what > > we could to improve it. I found the current code rather overengineered > > (counting with single entry being in several indices, having homegrown > > implementation of rw semaphore, ...). > > > > After some thinking I've decided to just reimplement mbcache instead of > > improving the original code in small steps since the fundamental changes in > > locking and layout would be actually harder to review in small steps than in > > one big chunk and overall the new mbcache is actually pretty simple piece of > > code (~450 lines). > > > > The result of rewrite is smaller code (almost half the original size), smaller > > cache entries (7 longs instead of 13), and better performance (see below > > for details). > > I agree that mbcache has scalability problems worth fixing; you may > also be right about replacing instead of fixing it. > > I would prefer an actual replacement over adding mbcache2 though: the > two existing users will be converted immediately; there is no point in > keeping the old version around. For that, can the current mbcache be > converted to the API of the new one in a separate patch first (alloc + > insert vs. create, get + release/free vs. delete_block)? Well, conversion into the new API isn't that trivial because of the lifetime / locking differences. If people prefer it, I can add a patch that just renames everything from mb2 to mb after the old code is removed. Opinions? > The corner cases that mbcache has problems with are: > > (1) Many files with the same xattrs: Right now, an xattr block can be > shared among at most EXT[24]_XATTR_REFCOUNT_MAX = 2^10 inodes. If 2^20 Do you know why there's this limit BTW? The on-disk format can support upto 2^32 references... > inodes are cached, they will have at least 2^10 xattr blocks, all of > which will end up in the same hash chain. An xattr block should be > removed from the mbcache once it has reached its maximum refcount, but > if I haven't overlooked something, this doesn't happen right now. > Fixing that should be relatively easy. Yeah, that sounds like a good optimization. I'll try that. > (2) Very many files with unique xattrs. We might be able to come up > with a reasonable heuristic or tweaking knob for detecting this case; > if not, we could at least use a resizable hash table to keep the hash > chains reasonably short. So far we limit number of entries in the cache which keeps hash chains short as well. Using resizable hash table and letting the system balance number of cached entries just by shrinker is certainly possible however I'm not sure whether the complexity is really worth it. Regarding detection of unique xattrs: We could certainly detect trashing of mbcache relatively easily. The difficult part if how to detect when to enable it again because the workload can change. I'm thinking about some backoff mechanism like caching only each k-th entry asked to be inserted (starting with k = 1) and doubling k if we don't reach some low-watermark cache hit ratio in some number of cache lookups, reducing k to half if we reach high-watermark cache hit ratio. However again I'm not yet convinced complex schemes like this are worth it for mbcache (although it might be interesting research topic as such) and so I didn't try anything like this for the initial posting. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html