David Masover <ninja@xxxxxxxxxxxx> wrote:

> Does the cache call sync/fsync overly often?

Not at all.

> If not, we can gain something by using an underlying FS with lazy writes.

Yes, to some extent. There's still the problem of filesystem integrity to
deal with, and lazy writes hold up journal closure. This isn't necessarily a
problem, except when you want to delete and launder a block that has a write
hanging over it. It's not unsolvable, just tricky. Besides, what do you mean
by lazy?

Also consider: you probably want to start netfs data writes as soon as
possible, since not having cached the page yet restricts the netfs's
activities on that page; but you want to defer metadata writes as long as
possible, because they may become obsolete, it may be possible to batch them,
and it may be possible to merge them.

> I think the caching should be done asynchronously. As stuff comes in,
> it should be handed off both to the app requesting it and to a queue to
> write it to the cache. If the queue gets too full, start dropping stuff
> from it the same way you do from cache -- probably LRU or LFU or
> something similar.

That's not a bad idea; we need a rate limit on throwing stuff at the cache
when there's not much disk space available.

Actually, probably the biggest bottleneck is the disk block allocator. Given
that I'm using lists of free blocks, it's difficult to place a tentative
reservation on a block, and it very much favours allocating blocks for one
transaction at a time. However, free lists make block recycling a lot easier.

I could use a bitmap instead, but that requires every block allocated or
deleted to be listed in the journal. Not only that, it complicates deletion
and journal replay. Also, under worst-case conditions it's really nasty:
you could end up with a whole set of bitmaps, each with one free block,
which means you've got to read a whole lot of bitmaps to allocate the blocks
you require, and you have to modify several of them to seal an allocation.
Furthermore, you end up losing a chunk of space statically allocated to the
maintenance of these things, unless you want to allocate the bitmaps
dynamically also...

> Another question -- how much performance do we lose by caching, assuming
> that both the network/server and the local disk are infinitely fast?
> That is, how many cycles do we lose vs. local disk access? Basically,
> I'm looking for something that does what InterMezzo was supposed to --
> make cache access almost as fast as local access, so that I can replace
> all local stuff with a cache.

Well, with an infinitely fast disk and network, very little - you can afford
to be profligate in your turnover of disk space, and that affects the options
you might choose in designing your cache.

The real-world case is more interesting, as you have to compromise. CacheFS
as it stands attempts not to lose any data blocks and attempts not to return
uninitialised data, and these two constraints work counter to each other.

There's a second journal (the validity journal) to record blocks that have
been allocated but that don't yet have data stored in them. This permits
advance allocation, but requires a second update journal entry to clear the
validity journal entry after the data has been stored. It also requires the
validity journal to be replayed upon mounting.
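To make that ordering concrete, here's a minimal, self-contained sketch of
the constraint the validity journal enforces. All of the names and data
structures are invented for illustration and bear no relation to the real
CacheFS on-disk format; the point is simply that an allocation is journalled
as pending before any data is written, a second journal update retires the
entry afterwards, and replay at mount time launders anything still pending
back onto the free list:

/* Hypothetical userspace model of the validity journal ordering. */
#include <stdio.h>
#include <string.h>

#define NBLOCKS 8

enum vj_state { VJ_EMPTY, VJ_PENDING };		/* per-block validity state */

static enum vj_state vjournal[NBLOCKS];
static int block_free[NBLOCKS];
static char block_data[NBLOCKS][16];

static int alloc_block(void)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (block_free[i]) {
			block_free[i] = 0;
			/* journalled as pending before any data lands */
			vjournal[i] = VJ_PENDING;
			return i;
		}
	}
	return -1;
}

static void store_data(int blk, const char *data)
{
	strncpy(block_data[blk], data, sizeof(block_data[blk]) - 1);
	/* second journal update: the block now really holds data */
	vjournal[blk] = VJ_EMPTY;
}

/* mount-time replay: any entry still pending means the data write never
 * completed, so the block is recycled rather than exposed as cached data */
static void replay_validity_journal(void)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (vjournal[i] == VJ_PENDING) {
			vjournal[i] = VJ_EMPTY;
			block_free[i] = 1;
		}
	}
}

int main(void)
{
	for (int i = 0; i < NBLOCKS; i++)
		block_free[i] = 1;

	int a = alloc_block();
	store_data(a, "complete write");	/* both journal steps done */

	int b = alloc_block();			/* "crash" before its data write */

	replay_validity_journal();		/* b goes back to the free list */
	printf("block %d valid, block %d recycled\n", a, b);
	return 0;
}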
Reading one really big file (bigger than the memory available) over AFS took,
with a cold cache, very roughly 107% of the time it took with no cache; but
with a warm cache it took 14% of the time it took with no cache. However,
this is on my particular test box, and it varies a lot from box to box.

This doesn't really demonstrate the latency of indexing, however, which we
have to do before we even consider touching the network. I don't have numbers
on that, but in the worst case they're going to be quite bad.

I'm currently working on mark II CacheFS, using a wandering tree to maintain
the index. I'm not entirely sure whether I want to include the data pointers
in this tree. There are advantages to doing so, namely that I can use the
same tree maintenance routines for everything, but also disadvantages, namely
that it complicates deletion a lot.

Using a wandering tree will cut the latency of index lookups (because it's a
tree), simplify journalling (because it wanders) and mean I can just grab a
block, write to it and then connect it (again because it wanders). Block
allocation is still unpleasant, though...
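For illustration, here's a rough, self-contained sketch of that "grab a
block, write to it, then connect it" property of a wandering (copy-on-write)
tree. The names and structures are invented, not the mark II code: a modified
node is written into a freshly allocated block and only becomes reachable
once its parent - itself rewritten the same way, up to the root - is switched
across, so an interrupted update leaves the old tree intact:

/* Hypothetical userspace model of a wandering-tree update. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
	int blocknr;		/* where this version of the node lives */
	struct node *child[2];	/* a tiny 2-way tree for illustration */
	char payload[16];
};

static int next_free_block = 100;

/* write a new version of a node into a fresh block; the old block is never
 * touched, so a crash at this point leaves the old tree fully usable */
static struct node *cow_node(const struct node *old, const char *payload)
{
	struct node *new = malloc(sizeof(*new));

	*new = *old;
	new->blocknr = next_free_block++;	/* grab a block */
	strncpy(new->payload, payload, sizeof(new->payload) - 1);
	return new;				/* ...and write to it */
}

/* update child 'idx' under *rootp: copy the leaf, copy the root so that it
 * points at the new leaf, then swing the root pointer across - "connect it"
 * is the last step (the superseded versions would eventually be laundered
 * back onto the free list; here they just leak) */
static void wander_update(struct node **rootp, int idx, const char *payload)
{
	struct node *new_leaf = cow_node((*rootp)->child[idx], payload);
	struct node *new_root = cow_node(*rootp, (*rootp)->payload);

	new_root->child[idx] = new_leaf;
	*rootp = new_root;		/* single pointer switch commits it */
}

int main(void)
{
	struct node leaf0 = { 1, { NULL, NULL }, "old-a" };
	struct node leaf1 = { 2, { NULL, NULL }, "old-b" };
	struct node rootn = { 3, { &leaf0, &leaf1 }, "root" };
	struct node *root = &rootn;

	wander_update(&root, 0, "new-a");
	printf("root now in block %d, child 0 in block %d\n",
	       root->blocknr, root->child[0]->blocknr);
	return 0;
}

David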