On Wednesday, May 30, 2018 9:32:25 PM IST Theodore Y. Ts'o wrote: > On Wed, May 30, 2018 at 05:03:40PM +0530, Chandan Rajendra wrote: > > On Wednesday, May 30, 2018 10:36:42 AM IST Theodore Y. Ts'o wrote: > > > Note: we may want to move this thread so that it's on linux-fscrypt > > > exclusively. I'm thinking that we should consider using linux-fscrypt > > > for fscrypt and fsverity discussions. That way we can avoid adding > > > extra noise to the linux-fsdevel and linux-ext4 lists. Comments? > > > > I agree. > > OK, I've moved linux-ext4 and linux-fsdevel to bcc. > > > > I now think this was a mistake, and that we should handle this the > > > same way we handle CONFIG_QUOTA. If we enable fscrypt or fsverity, it > > > should be enabled for all file sytems which support that feature. > > > > Do we continue to retain ext4/readpage.c? I ask because if the logic in > > ext4/readpage.c has to be moved to fs/buffer.c (to avoid code duplication) then > > we end up losing the ability to build fscrypt as a module even if one of > > the filesystems using fs/bio_post_read.c is itself built as a module. > > It's not necessarily an either-or. For example, we could use function > pointers to allow avoid needing code in the built-in portion of the > kernel from linking directly to the module. Just as an illustration, > we could hang a pointer off of the struct super data structure which > is an array of post-processing functions, so each file system can > register post-processing handlers for each "post processing step". > > I think the bigger deal is that mpage_readpage() and mpage_readpages() > are used by a large number of file systems, with a variety of > requirements --- including some with high performance requirements. > So if we make changes to fs/mpage.c, we need to make sure we don't > make things worse for them. In particular, if post-processing isn't > needed, we shouldn't be doing any extra memory allocations or taking > any extra locks. I agree. Lets get things implemented in ext4/readpage.c first. If we are then sure that the corresponding changes in fs/mpage.c won't harm performance/memory consumption, we could move over the code to fs/mpage.c. > > The somewhat smaller concern is that because mpage_readpages() has to > be used for some very old file systems, it uses a very simple > interface for getting the mapping from a logical block # to the > physical block number. One of the benefits when I created > fs/ext4/readpage.c was that I dind't need to restrict myself to using > the old get_blocks_t interface, but could call ext4_map_blocks() > directly. So for a readpages() call, we don't need end up needing to > call ext4_get_block() for each block mapping. This saves CPU and the > need to take a read spinlock multiple times. On the other hand, > readpages() is only used for for preallocation reads (so it's not > _quite_ as performance critical) and as an indication, XFS, which > tends to be highly performance sensitive, is using mpage_readpages() > without worrying too much about this issue. Sorry, This is not related to the "bio post processing" topic. do_mpage_readpages() has the following, if (block_in_file < last_block) { map_bh->b_size = (last_block-block_in_file) << blkbits; if (get_block(inode, block_in_file, map_bh, 0)) goto confused; *first_logical_block = block_in_file; } So ext4_get_block() would have been invoked for mapping over a range that is larger than a block. do_mpage_readpages() would continue to use this mapping without invoking get_block() callback until it has exhausted all the mapped blocks. > > The bottom line is that there is a large set of engineering tradeoffsh > to be made here, which is why we should talk about the approach first. > It may be there are people who can emit large amounts of perfectly > designed code all at once at high speed (perhaps like Mozart, who once > compared is musical composition process to vomiting, to the point > where he was bottlenecked on the speed that he could write notes on > staff paper), but I'm not smart enough to do that. For code this > complex, doing some design ahead of time is a good thing. :-) > > > Relying on bio->bi_private to check for requirement of post processing steps > > does not seem to be correct. AFAIK (please correct me if I am wrong) the > > convention is that bi_private field is owned by the code that allocates the > > bio and the owner can assign its own value to this field. > > Well, there is precedence for common code creating a convention for > using a private field. For example, page_buffers() uses > page->private. If a page belongs to a user such as a file system, > it's up the owner to decide how to use the private field, that's true. > However, if it uses fs/mpage.c, fs/mpage.c does use page_buffers(), so > a file system which uses mpage_readpages() has to allow page->private > to be used according to the page_buffers convention. > > If we try to use bio->bi_private for fscrypt and fsveirty, *and* we > want to fold this support into common code like fs/mpage.c --- we're > probably OK, since at the moment, the fs/mpage.c code: > > * creates the bio > * submits the bio > * handles the bio completion handling > > There is no way callers of mpage_readpage() and mpage_readpages() can > override how the bio is created (and hence control setting bi_private) > nor can they provide a bio completion handler (which could consume the > bi_private) field. I am diverging a bit here (since it was decided that we will get the "bio post processing" working for blocksize == pagesize first). W.r.t block_read_full_page(), we will have a 'struct buffer_head *' assigned to bio->bi_private and hence the condition to check for if "bio post processing" is required, will incorrectly return true value for such bios. > > > Ideally we should be checking for per-inode encryption/verity flags to decide > > if any post-processing steps are required to be executed > > At least for fsverity, it can't be a per-inode flag, since whether or > not verity processing is enabled depends on what part of the inode you > are reading. (For example, if you are reading the Merkle tree or the > verity footer block, that's file metadata which does need verity > processing.) > > We could make that be handled in the post-processing function, which > either returns an ERR_PTR, 0 if whatever work it did was done > immediately and nothing had to be queued on a workqueue, so the > bio-post-read code should try figure out what is the next > post-processing function should be called, and then call it, or 1 if > the work was queued on a workqueue, and so bio-post-read processor > should exit, since the bio-post-read processing function will be > called once the workqueue function is done doing its thing. > > I'm not saying that's how we must do things; this is all part of the > design brainstorming process. It's a lot easier to throw out possible > ideas for discussion than to spend days or weeks coding, only to > discover that it's not the best way forward. :-) > -- chandan -- To unsubscribe from this list: send the line "unsubscribe linux-fscrypt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html