Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@xxxxxxxxxx>: > Hi Andreas, > > Thanks for your review in advance. :) > > On 2019/7/10 7:32, Andreas Grünbacher wrote: > > Hi Chao, > > > > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@xxxxxxxxxx>: > >> Some filesystems like erofs/reiserfs have the ability to pack tail > >> data into metadata, e.g.: > >> IOMAP_MAPPED [0, 8192] > >> IOMAP_INLINE [8192, 8200] > >> > >> However current IOMAP_INLINE type has assumption that: > >> - inline data should be locating at page #0. > >> - inline size should equal to .i_size > >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > >> so this patch tries to relieve above limits to make IOMAP_INLINE more > >> generic to cover tail-packing case. > > > > this patch suffers from the following problems: > > > > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and > > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use > > iomap_fiemap on a filesystem with tail packing, so if we don't make > > the same distinction in iomap, iomap_fiemap will silently produce the > > wrong result. This means that IOMAP_TAIL should become a separate > > mapping type. > > I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with > different semantics: > > - IOMAP_TAIL: > we may introduce this flag to cover tail-packing case, in where we merge > small-sized data in the tail of file into free space of its metadata. > - FIEMAP_EXTENT_DATA_TAIL: > quoted from Documentation/filesystems/fiemap.txt > " This will also set FIEMAP_EXTENT_NOT_ALIGNED > Data is packed into a block with data from other files." > It looks like this flag indicates that blocks from different files stores into > one common block. Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate flag in your scenario. In that case, we should be fine on the fiemap front. > I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL > should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And > then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and > tail-packing case? Those definitions are user-visible. > > * IOMAP_INLINE mappings always start at offset 0 and span an entire > > page, so they are always page aligned. IOMAP_TAIL mappings only need > > Why can't #0 page consist of more than one block/mapping? I didn't get what's > difference here. Currently, iomap_write_begin will read the inline data into page #0 and mark that page up-to-date. There's a check for that in iomap_write_end_inline. If a page contains more than one mapping, we won't be able to mark the entire page up to date anymore; we'd need to track which parts of the page are up to date and which are not. Iomap supports two tracking mechanisms, buffer heads and iomap_page, and we'd need to implement that tracking code for each of those cases. > > to be block aligned. If the block size is smaller than the page size, > > - reiserfs tries to find last page's content, if the size of that page's valid > content is smaller than threshold (at least less than one block), reiserfs can > do the packing. > > - erofs' block size is 4kb which is the same as page size. > > > Actually, for IOMAP_TAIL, there is no users who need to map more than one block > into metadata, so I'm not sure that we should support that, or we just need to > let code preparing for that to those potential users? How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs doesn't require block size == PAGE_SIZE? > Thanks, > > > a tail page can consist of more than one mapping. So > > iomap_read_inline_data needs to be changed to only copy a single block > > out of iomap->inline_data, and we need to adjust iomap_write_begin and > > iomap_readpage_actor accordingly. > > > > * Functions iomap_read_inline_data, iomap_write_end_inline, and > > iomap_dio_inline_actor currently all assume that we are operating on > > page 0, and that iomap->inline_data points at the data at offset 0. > > That's no longer the case for IOMAP_TAIL mappings. We need to look > > only at the offset within the page / block there. > > > > * There are some asserts like the following int he code: > > > > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > > > Those should probably be tightened to check for block boundaries > > instead of page boundaries, i.e. something like: > > > > BUG_ON(size > i_blocksize(inode) - > > offset_in_block(iomap->inline_data, inode)); > > > >> Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx> > >> --- > >> fs/iomap.c | 4 +--- > >> 1 file changed, 1 insertion(+), 3 deletions(-) > >> > >> diff --git a/fs/iomap.c b/fs/iomap.c > >> index 12654c2e78f8..d1c16b692d31 100644 > >> --- a/fs/iomap.c > >> +++ b/fs/iomap.c > >> @@ -264,13 +264,12 @@ static void > >> iomap_read_inline_data(struct inode *inode, struct page *page, > >> struct iomap *iomap) > >> { > >> - size_t size = i_size_read(inode); > >> + size_t size = iomap->length; > > > > Function iomap_read_inline_data fills the entire page here, not only > > the iomap->length part, so this is wrong. But see my comment above > > about changing iomap_read_inline_data to read a single block above as > > well. > > > >> void *addr; > >> > >> if (PageUptodate(page)) > >> return; > >> > >> - BUG_ON(page->index); > >> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > >> > >> addr = kmap_atomic(page); > >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, > >> sector_t sector; > >> > >> if (iomap->type == IOMAP_INLINE) { > >> - WARN_ON_ONCE(pos); > >> iomap_read_inline_data(inode, page, iomap); > >> return PAGE_SIZE; > >> } > > > > Those last two changes look right to me. > > > > Thanks, > > Andreas > > . > > At this point, can I ask how important this packing mechanism is to you? I can see a point in implementing inline files, which help because there tends to be a large number of very small files. But for not-so-small files, is saving an extra block really worth the trouble, especially given how cheap storage has become? Thanks, Andreas