Thanks for the thorough reply, David... Responses are inserted below, but I
think the executive summary is this: in the short term I think I'm stuck with
EL5 kernels due to voluminous Lustre dependencies (I'm separately looking into
how to get past this). That raises the question of how I can get the latest
possible fscache code into an EL5 kernel. I'm currently running a
2.6.18-53.1.14.el5 kernel, and the fscache code in it is quite old, I think.
More detail below...

Thanks!
John

On Tue, Nov 11, 2008 at 5:33 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
> John Groves <John@xxxxxxxxxx> wrote:
>
> > I have modified the lustre client filesystem to use fscache, and it is
> > in a rudimentary working state.
>
> Excellent!
>
> > Among my most pressing requirements is to purge the fscache for any
> > extent for which a DLM lock is revoked.
>
> Hmmm... Do you have an open cookie on an FS-Cache object at the time this
> lock is revoked? Or are you looking for a shortcut - the equivalent of a
> delete op - by which you can supply a key and say 'delete that if it's
> there'?
>
> Note that I cannot provide you with functionality to punch holes in files
> in the cache very easily, not until the filesystems available to
> CacheFiles get that capability.

Hole punching would be ideal, but I understand the limitation.

Yes, I have a cookie. Currently, if a DLM lock is revoked, I just blow away
the whole file in the fscache -- at least that's what I think I'm doing. I
call a function derived from nfs_fscache_disable_cookie(), which appears to
clean up the page cache and then call fscache_relinquish_cookie(). Actually,
I'm not sure I should have kept the page cache cleanup part from the nfs
"fscache.[ch]" code, since lustre does its own page cache cleanup (and it's
conceivable that a DLM lock extent is not a whole file, although my current
fscache approach is to blow away the whole file in fscache).

> > To the end of proving that functionality, I would like to give myself a
> > file ioctl that would determine what is in the fscache for a given file.
> > Since this is for testing, performance isn't a major concern. I'm
> > already doing this with the page cache, and I hope something similar
> > would be possible with the fscache.
>
> So, you want to be able to get, say, a bitmap of all the pages resident in
> the disk cache for a particular cookie - mass bmap() if you will?

A bitmap would be cool. An extent list would be OK too. Or just an ability
to ask fscache whether a given page (or extent) is in the disk cache...

> > Is there a supported way to query whether a given page_index is in the
> > fscache? If not, I'd appreciate suggestions as to how to go about this
> > (or insight into how other implementers have proven functionality
> > without this feature). I'm fairly ignorant as to the internals of
> > fscache...
>
> Currently, the only way to do this is to try reading it, and observe the
> error code. It's not a requirement I've come across to date.
>
> What exactly is it that you want this functionality for? Just debugging
> (proving) that what you ask to be cached actually winds up in the cache?

I'm more concerned about the converse: proving that what should have been
removed from the fscache has been duly removed.

> What you ask for shouldn't be too hard to provide - after all, I have to
> do the work anyway in order to determine whether I should return ENODATA
> or begin a read op in CacheFiles.
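Right - so if I wanted to probe today, the try-a-read-and-look-at-the-error
approach would be something like this from the netfs side? (A sketch only,
against the fscache API as I read it from the NFS client code, so it may not
match what's in my old tree; ll_i2fscookie() is a made-up helper that returns
the FS-Cache cookie for a Lustre inode.)

/*
 * Sketch: probe whether a page is present in the disk cache by attempting a
 * read and looking at the error code.  Not production code - in a real read
 * path the page would be locked and the completion routine would unlock it.
 */
static void ll_fscache_probe_end_io(struct page *page, void *context,
                                    int error)
{
        /* In the real read path this would unlock the page and mark it
         * up to date; for a probe there's nothing to do. */
}

static int ll_fscache_page_is_cached(struct inode *inode, struct page *page)
{
        int ret;

        ret = fscache_read_or_alloc_page(ll_i2fscookie(inode), page,
                                         ll_fscache_probe_end_io, NULL,
                                         GFP_KERNEL);
        switch (ret) {
        case 0:
                /* A read from the cache was dispatched (and is still in
                 * flight at this point): the data is there. */
                return 1;
        case -ENODATA:
                /* Not in the cache.  fscache will have marked the page for
                 * caching, so a pure probe should drop that again. */
                fscache_uncache_page(ll_i2fscookie(inode), page);
                return 0;
        default:
                /* -ENOBUFS etc.: cache unavailable, call it a miss. */
                return 0;
        }
}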
The problem here (I think) is that I don't want to load the page cache in
order to check whether a page is in the fscache. And there might be cases in
testing where I would want to check without regard to whether the page is
already in the page cache.

> If it's merely for debugging, then there's probably no particular need to
> optimise it to be fast.

Certainly for my current purposes there isn't a need for optimization. My
offhand impression is that lustre users might actually want an ioctl-based
utility that will tell them the cache status of a file (both page cache and
fscache), but even then it's not entirely clear to me that performance of
this code path is important.

> John Groves <John@xxxxxxxxxx> also wrote:
>
> > I'd like to add one more question... when I explicitly clean out the
> > page cache, so as to force reads to be satisfied from the fscache, I
> > frequently find that not all of my pages are available from the fscache.
>
> Hmmm... That doesn't sound good. What version of fscache and kernel are
> you using?

Hmmm indeed. You may have hit on one of my problems here. I'm currently on a
2.6.18-53.1.14.el5 kernel, which was chosen because Lustre likes it. We noted
early on that there was a big difference between the fscache code here and in
"current" kernels, and that grafting the latest fscache code into the 2.6.18
tree didn't look trivial... Is there a way to get a "modern" fscache patched
into a more or less EL5 kernel? Getting Lustre substantially beyond EL5 may
be a non-starter in the short term (though I'll check with the Lustre
community).

I did some more experiments, and the missing pages seem not to occur if I
take a lunch break after reading them into the page cache (and writing to
fscache), and then blow away the page cache after lunch. If I just wait a
minute or two, the pages may still not make it into the fscache (and running
"sync" does not help). For production use this may not be a showstopper, but
for performance testing (to justify the effort) it may cause a penalty. Note
that I'm mostly doing tests with very few pages at the moment. It may be less
of an issue when fscache/cachefiles' dirty list is much bigger, which will of
course be the case in meaningful performance tests. The best compromise for
the moment is likely to get the latest kernel/fscache that lustre will work
with...

> Have you checked the statistics that are put in /proc/fs/fscache/stats to
> see if they give you some clue?

Doh... my kernel doesn't even have a /proc/fs/fscache. I'm pretty far
downlevel, I guess. Do you know what version of fscache the /proc entry
appeared in?

> > I don't know why this is, but I suspect that calling my releasepage
> > method (from an ioctl, after loading the cache & fscache) sometimes
> > frees the page(s) before fscache gets around to storing them... though
> > that doesn't make sense if fscache bumps the page reference count until
> > it has made a copy or written it out.
>
> fscache doesn't keep a ref on the pages directly, though the cache might
> (the cache that writes directly to blockdev certainly does by pasting
> them into BIOs).
>
> What fscache does is to use a couple of page bits on the page to mark its
> interest in a netfs page.  One (PG_fscache) merely notes that fscache has
> an interest in that page and that fscache_uncache_page() should be called
> on it; the other (PG_fscache_write) indicates that a page is being written
> to the cache, and that the caller should wait on it till it gets cleared
> if they need the page.
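That makes sense. For reference, the way I'm getting pages into the cache in
the first place (i.e. the point at which PG_fscache_write would be getting
set) is essentially the NFS read-completion pattern, roughly the sketch
below. This is how I believe it should look rather than a paste from my
tree; ll_i2fscookie() is again a made-up cookie-lookup helper.

/*
 * Sketch of the store side, modeled on the NFS read-completion path: once a
 * page has been filled from the servers, offer it to the cache.  As I
 * understand it, the page must already have been through
 * fscache_read_or_alloc_page() or fscache_alloc_page() for this cookie.
 */
static void ll_readpage_to_fscache(struct inode *inode, struct page *page)
{
        if (fscache_write_page(ll_i2fscookie(inode), page, GFP_KERNEL) != 0)
                /* The cache refused (ENOMEM, ENOBUFS, ...); drop its
                 * interest in the page so the fscache page bits get
                 * cleared. */
                fscache_uncache_page(ll_i2fscookie(inode), page);
}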
> Can you show me your releasepage() method?

It's attached at the bottom of this message. Should it be looking at the
fscache bits?

> > (does fscache consider my page dirty for the purpose of writing to
> > cachefiles, or does it make a copy,
>
> fscache doesn't make a copy of your page, but the cache might. In this
> case, CacheFiles does because I can't work out how to use the AIO
> interface from the kernel.
>
> As I mentioned above, fscache marks its interest in the page at this point
> by marking it with PG_fscache_write. This means the page may be written to
> the cache at some point. Of course, the cache is always at liberty to
> refuse due to things like ENOSPC, EIO and ENOMEM. If this happens, it
> _should_ show up in /proc/fs/fscache/stats.
>
> The main purpose of fscache is to insulate as best it can the netfs from
> errors in the cache and to hide at least some of the delays involved.
>
> > and is it susceptible to having a page freed out from under it?
>
> In such a case, firstly __free_pages() should bark, and secondly, you're
> likely to get gibberish in the cache, not just missing pages.
>
> > ...in which case is there a way to perform an explicit flush [preferably
> > on the whole file/object rather than one page at a time]?
>
> That's something I can look at. The problem with performing an explicit
> flush is that it involves flushing stuff that's on the queues to be
> processed by other processes. Part of the problem is that stores are
> batched to save a certain amount of common time when it comes to actually
> doing the work. I really should move the batching further down, and, in
> CacheFiles's case, offer it to the underlying fs to do. The BTRFS person
> is in favour of that.
>
> David

### Here are the releasepage and removepage methods; not much going on
except internal tracking (llap = lustre lite async page I/O tracking stuff):

void ll_removepage(struct page *page)
{
        struct ll_async_page *llap = llap_cast_private(page);
        ENTRY;

        JGDEBUG(D_FSCACHE, "ll_removepage %p\n", page);
        LASSERT(!in_interrupt());

        /* sync pages or failed read pages can leave pages in the page
         * cache that don't have our data associated with them anymore */
        if (page_private(page) == 0) {
                JGDEBUG(D_FSCACHE, "ll_removepage private err!\n");
                EXIT;
                return;
        }

        LASSERT(!llap->llap_lockless_io_page);
        LASSERT(!llap->llap_nocache);

        LL_CDEBUG_PAGE(D_PAGE, page, "being evicted\n");

        __ll_put_llap(page);
        EXIT;
}

static int ll_releasepage(struct page *page, gfp_t gfp_mask)
{
        JGDEBUG(D_FSCACHE, "ll_releasepage %p\n", page);
        if (PagePrivate(page))
                ll_removepage(page);
        return 1;
}
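And, to make my "should it be looking at the fscache bits?" question
concrete: if the answer is yes, I imagine releasepage turning into roughly
the following. This is purely a sketch of my understanding --
fscache_check_page_write(), fscache_wait_on_page_write(),
fscache_uncache_page() and PageFsCache() are what I see in the newer fscache
headers and NFS code, and may well not exist in my 2.6.18-era tree, and
ll_i2fscookie() is the same made-up cookie-lookup helper as above:

static int ll_releasepage(struct page *page, gfp_t gfp_mask)
{
        struct fscache_cookie *cookie = ll_i2fscookie(page->mapping->host);

        JGDEBUG(D_FSCACHE, "ll_releasepage %p\n", page);

        if (PageFsCache(page)) {
                /* A store to the cache may still be in flight.  If we're
                 * not allowed to sleep here, refuse to release the page
                 * rather than yank it out from under the store. */
                if (fscache_check_page_write(cookie, page)) {
                        if (!(gfp_mask & __GFP_WAIT))
                                return 0;
                        fscache_wait_on_page_write(cookie, page);
                }
                /* Drop fscache's interest in the page (clears PG_fscache). */
                fscache_uncache_page(cookie, page);
        }

        if (PagePrivate(page))
                ll_removepage(page);
        return 1;
}

The return-0-when-we-can't-wait part is my reading of how a netfs is meant
to decline a release while a store is outstanding; if that's roughly the
right shape, it may also be related to my missing-pages problem, since my
current releasepage doesn't wait for stores to complete.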