On Friday May 12, akpm@xxxxxxxx wrote: > NeilBrown <neilb@xxxxxxx> wrote: > > > > ./drivers/md/bitmap.c | 115 ++---------------------------------------- > > hmm. I hope we're not doing any of that filesystem I/O within the context > of submit_bio() or kblockd or anything like that. Looks OK from a quick > scan. No. We do all the I/O from the context of the per-array thread. However some IO requests cannot complete until the filesystem I/O completes, so we need to be sure that the filesystem I/O won't block waiting for memory, or fail with -ENOMEM. > > a_ops->commit_write() already ran set_page_dirty(), so you don't need that > in there. Is that documented somewhere..... but yes, that seems to be right. Thanks. > > I assume this always works in units of a complete page? It's strange to do > prepare_write() followed immediately by commit_write(). Normally > prepare_write() will do some prereading, but it's smart enough to not do > that if the caller is preparing to write the whole page. > Yes, it is strange. That was one of the things that made me want to review this code and figure out how to do it "properly". As far as I can see, much of 'address_space' is really an internal interface to support routines used by the filesystem. A filesystem may choose to use address spaces, and has a fair degree of freedom when it comes to which bits to make use of and exactly what they mean. About the only thing that *has* to be supported is ->writepages -- which has a fair degree of latitude in exactly what it does -- and ->writepage -- which can only be called after locking a page and rechecking the ->mapping. bitmap.c is currently trying to do something every different. It uses ->readpage to get pages in the page cache (even though some address spaces don't define ->readpage) and then holds onto those pages without holding the page lock, and then calls ->writepage to flush them out from time to time. Before calling writepage it gets the pagelock, but doesn't re-check that ->mapping is correct (there is nothing much it can do if it isn't correct..). I noticed this is particularly a problem with tmpfs. When you call writepage on a tmpfs page, the page is swizzled into the swap cache, and ->mapping becomes NULL - not the behaviour that bitmap is expecting. Now I agree that tmpfs is an unusual case, and that storing a bitmap in tmpfs doesn't make a lot of sense (though it can make some...) but the point is that if a filesystem is allowed to move pages around like that, then bitmap cannot hold on to pages in the page cache like it wants to. It simply isn't a well defined thing to do. > We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE. Same diff. > Yeah.... why is that? Why have two names for exactly the same value? How does a poor develop know when to use one and when the other? More particularly, how does one remember. I would argue that the "same diff" should be no difference - not even in spelling.... > If you have a page and you want to write the whole thing out then there's > really no need to run prepare_write or commit_write at all. Just > initialise the whole page, run set_page_dirty() then write_one_page(). > I see that now. But only after locking the page, and rechecking that ->mapping is correct, and if it isn't .... well, more work is involved that bitmap is in a position to do. > Perhaps it should check that the backing filesystem actually implements > commit_write(), prepare_write(), readpage(), etc. Some might not, and the > user will get taught not to do that via an oops. > Might help, but as I think you've gathered, I really want a whole different approach to writing to the file. One that I can justify as being a "correct" use of interfaces, and also that I can be certain will not block or fail on a kmalloc or similar. Hence the bmap thing later. Thanks for your feedback. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html