On Wed, May 27, 2015 at 07:38:00PM -0400, Theodore Ts'o wrote:
> On Wed, May 27, 2015 at 11:53:17AM -0700, Tom Marshall wrote:
> > But one thing I'm wrestling with is how to be asynchronously notified when
> > the lower readpage/readpages complete.  The ideas that come to mind are
> > (1) plumbing a callback into mpage_end_io(), (2) allowing override of
> > mpage_end_io() with a custom function, (3) creating kernel threads analogous
> > to kblockd to wait for pending pages.
>
> Not all file systems use mpage_end_io(), so that's not a good
> solution.

Ah, thanks, I was not aware of that.  So that leaves waiting on pages,
which probably means a fair amount of plumbing to do correctly.

> You can do something like
>
> 	wait_on_page_bit(page, PG_uptodate);
>
> ... although to be robust you will also need to wake up if PG_error is
> set (if there is an I/O error, PG_error is set instead of
> PG_uptodate).  So that means you'd have to spin your own wait function
> using the waitqueue primitives and page_waitqueue(), using
> __wait_on_bit() as an initial model.

Right, that should be pretty easy.

> This suggestion should not be taken as an endorsement of your
> higher-level architecture.  I suggest you think very carefully about
> whether or not you need to be able to support random write
> functionality, and if you don't, there are simpler ways such as the
> one I outlined to you earlier.

I recall this:

> [...] So it's better to have the file system supply the physical location on
> disk, and then to read in the compressed data to a scratch set of pages
> which is freed immediately after you are done decompressing things.

Is that what you're referring to?  If so, I'm not seeing how this makes
things simpler.  It's still asynchronous, right?  ext4_readpage calls back
into mpage_readpage which uses ext4_get_block, which then queues a bio
request.  I don't see any way to avoid queueing asynchronous bio requests
or even getting completion notifications.

Now, I could pass ext4_get_block up into my code and set up my own bio
requests so that I can get callbacks.  But this basically means
implementing the equivalent of do_mpage_readpage in my own code, and that's
not really trivial code to copy/paste/hack.  And it also doesn't address
filesystems that don't use mpage_end_io(), right?

Am I missing something?
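
For reference, the "wake up on either bit" wait discussed above can be modelled in userspace.  This is only an analogy and not kernel code: it assumes a pthread condition variable standing in for page_waitqueue(), and every name in it (fake_page, FAKE_PG_UPTODATE, FAKE_PG_ERROR, fake_wait_on_page_read, demo_read) is hypothetical.  A real implementation would presumably be built on the waitqueue primitives, with __wait_on_bit() as the model.

```c
#include <pthread.h>

/* Hypothetical userspace stand-ins for PG_uptodate and PG_error. */
#define FAKE_PG_UPTODATE (1UL << 0)
#define FAKE_PG_ERROR    (1UL << 1)

struct fake_page {
	pthread_mutex_t lock;
	pthread_cond_t  wq;	/* plays the role of page_waitqueue() */
	unsigned long   flags;
};

static void fake_page_init(struct fake_page *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->wq, NULL);
	p->flags = 0;
}

/* Completion side: set one bit and wake every waiter. */
static void fake_page_set(struct fake_page *p, unsigned long bit)
{
	pthread_mutex_lock(&p->lock);
	p->flags |= bit;
	pthread_cond_broadcast(&p->wq);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Sleep until either bit is set, so an I/O error does not leave the
 * waiter asleep forever.  Returns 0 if the page became up to date,
 * -1 if the error bit was set.
 */
static int fake_wait_on_page_read(struct fake_page *p)
{
	int ret;

	pthread_mutex_lock(&p->lock);
	while (!(p->flags & (FAKE_PG_UPTODATE | FAKE_PG_ERROR)))
		pthread_cond_wait(&p->wq, &p->lock);
	ret = (p->flags & FAKE_PG_ERROR) ? -1 : 0;
	pthread_mutex_unlock(&p->lock);
	return ret;
}

struct completer_arg {
	struct fake_page *page;
	unsigned long bit;
};

/* Simulated I/O completion running in another thread. */
static void *completer(void *arg)
{
	struct completer_arg *a = arg;

	fake_page_set(a->page, a->bit);
	return NULL;
}

/* Run one simulated read that completes with `bit`; -2 on setup failure. */
static int demo_read(unsigned long bit)
{
	struct fake_page page;
	struct completer_arg arg = { &page, bit };
	pthread_t t;
	int ret;

	fake_page_init(&page);
	if (pthread_create(&t, NULL, completer, &arg) != 0)
		return -2;
	ret = fake_wait_on_page_read(&page);
	pthread_join(t, NULL);
	pthread_cond_destroy(&page.wq);
	pthread_mutex_destroy(&page.lock);
	return ret;
}
```

Calling demo_read(FAKE_PG_UPTODATE) models a read that completes normally, and demo_read(FAKE_PG_ERROR) models one that fails.  The point is just that the wait condition tests both bits, which a plain wait on the uptodate bit alone would not do.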