Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@xxxxxxxxxx>:
> Hi Andreas,
>
> Thanks for your review in advance. :)
>
> On 2019/7/10 7:32, Andreas Grünbacher wrote:
> > Hi Chao,
> >
> > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@xxxxxxxxxx>:
> >> Some filesystems like erofs/reiserfs have the ability to pack tail
> >> data into metadata, e.g.:
> >> IOMAP_MAPPED [0, 8192]
> >> IOMAP_INLINE [8192, 8200]
> >>
> >> However current IOMAP_INLINE type has assumption that:
> >> - inline data should be locating at page #0.
> >> - inline size should equal to .i_size
> >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs,
> >> so this patch tries to relieve above limits to make IOMAP_INLINE more
> >> generic to cover tail-packing case.
> >
> > this patch suffers from the following problems:
> >
> > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
> > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
> > iomap_fiemap on a filesystem with tail packing, so if we don't make
> > the same distinction in iomap, iomap_fiemap will silently produce the
> > wrong result. This means that IOMAP_TAIL should become a separate
> > mapping type.
>
> I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with
> different semantics:
>
> - IOMAP_TAIL:
>   we may introduce this flag to cover tail-packing case, in where we merge
> small-sized data in the tail of file into free space of its metadata.
> - FIEMAP_EXTENT_DATA_TAIL:
> quoted from Documentation/filesystems/fiemap.txt
> "  This will also set FIEMAP_EXTENT_NOT_ALIGNED
> Data is packed into a block with data from other files."
> It looks like this flag indicates that blocks from different files stores into
> one common block.

Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate
flag in your scenario. In that case, we should be fine on the fiemap
front.

> I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL
> should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And
> then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and
> tail-packing case?

Those definitions are user-visible.

> > * IOMAP_INLINE mappings always start at offset 0 and span an entire
> > page, so they are always page aligned. IOMAP_TAIL mappings only need
>
> Why can't #0 page consist of more than one block/mapping? I didn't get what's
> difference here.

Currently, iomap_write_begin will read the inline data into page #0
and mark that page up-to-date. There's a check for that in
iomap_write_end_inline. If a page contains more than one mapping, we
won't be able to mark the entire page up to date anymore; we'd need to
track which parts of the page are up to date and which are not. Iomap
supports two tracking mechanisms, buffer heads and iomap_page, and
we'd need to implement that tracking code for each of those cases.

> > to be block aligned. If the block size is smaller than the page size,
>
> - reiserfs tries to find last page's content, if the size of that page's valid
> content is smaller than threshold (at least less than one block), reiserfs can
> do the packing.
>
> - erofs' block size is 4kb which is the same as page size.
>
>
> Actually, for IOMAP_TAIL, there is no users who need to map more than one block
> into metadata, so I'm not sure that we should support that, or we just need to
> let code preparing for that to those potential users?

How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs
doesn't require block size == PAGE_SIZE?

> Thanks,
>
> > a tail page can consist of more than one mapping. So
> > iomap_read_inline_data needs to be changed to only copy a single block
> > out of iomap->inline_data, and we need to adjust iomap_write_begin and
> > iomap_readpage_actor accordingly.
> >
> > * Functions iomap_read_inline_data, iomap_write_end_inline, and
> > iomap_dio_inline_actor currently all assume that we are operating on
> > page 0, and that iomap->inline_data points at the data at offset 0.
> > That's no longer the case for IOMAP_TAIL mappings. We need to look
> > only at the offset within the page / block there.
> >
> > * There are some asserts like the following int he code:
> >
> >   BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >
> > Those should probably be tightened to check for block boundaries
> > instead of page boundaries, i.e. something like:
> >
> >   BUG_ON(size > i_blocksize(inode) -
> > offset_in_block(iomap->inline_data, inode));
> >
> >> Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx>
> >> ---
> >>  fs/iomap.c | 4 +---
> >>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/fs/iomap.c b/fs/iomap.c
> >> index 12654c2e78f8..d1c16b692d31 100644
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -264,13 +264,12 @@ static void
> >>  iomap_read_inline_data(struct inode *inode, struct page *page,
> >>                 struct iomap *iomap)
> >>  {
> >> -       size_t size = i_size_read(inode);
> >> +       size_t size = iomap->length;
> >
> > Function iomap_read_inline_data fills the entire page here, not only
> > the iomap->length part, so this is wrong. But see my comment above
> > about changing iomap_read_inline_data to read a single block above as
> > well.
> >
> >>         void *addr;
> >>
> >>         if (PageUptodate(page))
> >>                 return;
> >>
> >> -       BUG_ON(page->index);
> >>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >>
> >>         addr = kmap_atomic(page);
> >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >>         sector_t sector;
> >>
> >>         if (iomap->type == IOMAP_INLINE) {
> >> -               WARN_ON_ONCE(pos);
> >>                 iomap_read_inline_data(inode, page, iomap);
> >>                 return PAGE_SIZE;
> >>         }
> >
> > Those last two changes look right to me.
> >
> > Thanks,
> > Andreas
> > .
> >

At this point, can I ask how important this packing mechanism is to
you? I can see a point in implementing inline files, which help
because there tends to be a large number of very small files. But for
not-so-small files, is saving an extra block really worth the trouble,
especially given how cheap storage has become?

Thanks,
Andreas




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux