Re: squashfs performance regression and readahea

Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> · Mon, 16 May 2022 17:10:09 +0800

On Mon, May 16, 2022 at 5:00 PM Xiongwei Song <sxwjean@xxxxxxxxx> wrote:
>
> You can not just sign your signature. You ignored others' contributions.
> This is unacceptable.
>

Hi,

Don't be angry. My thought is, since this is still under review and
I'm not sure if the performance issue is settled, it's more important
to make sure it's ready.

If it's ready, I'll send it to the list formally, so it's easier for
maintainers (Matthew) to pick. At that time, I'll add your Tested-by
(again, I'm not sure the performance now is okay for you or not, so I
didn't add your tag. It would be incorrect to add your tag if the
performance is still not desired) and Phillips's Reviewed-by (Or
Signed-off-by) (I'm also not sure if Phillip or Matthew have other
comments, so I can't add their signature now). Maintainers will
probably add their Signed-off-by when they pick the patch.

I'm sorry that if not adding the tags in this WIP patch now offended you.

> On Mon, May 16, 2022 at 4:23 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
> >
> > On Sun, May 15, 2022 at 8:55 AM Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On 13/05/2022 07:35, Hsin-Yi Wang wrote:
> > > > On Fri, May 13, 2022 at 1:33 PM Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
> > > >>
> > > >> My understanding is that this call will fully populate the
> > > >> pages array with page references without any holes.  That
> > > >> is none of the pages array entries will be NULL, meaning
> > > >> there isn't a page for that entry.  In other words, if the
> > > >> pages array has 32 pages, each of the 32 entries will
> > > >> reference a page.
> > > >>
> > > > I noticed that if nr_pages < max_pages, calling read_blocklist() will
> > > > have SQUASHFS errors,
> > > >
> > > > SQUASHFS error: Failed to read block 0x125ef7d: -5
> > > > SQUASHFS error: zlib decompression failed, data probably corrupt
> > > >
> > > > so I did a check if nr_pages < max_pages before squashfs_read_data(),
> > > > just skip the remaining pages and let them be handled by readpage.
> > > >
> > >
> > > Yes that avoids passing the decompressor code a too small page range.
> > > As such extending the decompressor code isn't necessary.
> > >
> > > Testing your patch I discovered a number of cases where
> > > the decompressor still failed as above.
> > >
> > > This I traced to "sparse blocks", these are zero filled blocks, and
> > > are indicated/stored as a block length of 0 (bsize == 0).  Skipping
> > > this sparse block and letting it be handled by readpage fixes this
> > > issue.
> > >
> > Ack. Thanks for testing this.
> >
> > > I also noticed a potential performance improvement.  You check for
> > > "pages[nr_pages - 1]->index >> shift) == index" after calling
> > > squashfs_read_data.  But this information is known before
> > > calling squashfs_read_data and moving the check to before
> > > squashfs_read_data saves the cost of doing a redundant block
> > > decompression.
> > >
> > After applying this, The performance becomes:
> > 2.73s
> > 2.76s
> > 2.73s
> >
> > Original:
> > 2.76s
> > 2.79s
> > 2.77s
> >
> > (The pack file is different from my previous testing in this email thread.)
> >
> > > Finally I noticed that if nr_pages grows after the __readahead_batch
> > > call, then the pages array and the page actor will be too small, and
> > > it will cause the decompressor to fail.  Changing the allocation to
> > > max_pages fixes this.
> > >
> > Ack.
> >
> > I've added the fixes patch and previous fixes.
> > > I have rolled these fixes into the patch below (also attached in
> > > case it gets garbled).
> > >
> > > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > > index 7cd57e0d88de..14485a7af5cf 100644
> > > --- a/fs/squashfs/file.c
> > > +++ b/fs/squashfs/file.c
> > > @@ -518,13 +518,11 @@ static void squashfs_readahead(struct
> > > readahead_control *ractl)
> > >             file_end == 0)
> > >                 return;
> > >
> > > -       nr_pages = min(readahead_count(ractl), max_pages);
> > > -
> > > -       pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
> > > +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> > >         if (!pages)
> > >                 return;
> > >
> > > -       actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
> > > +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> > >         if (!actor)
> > >                 goto out;
> > >
> > > @@ -538,11 +536,18 @@ static void squashfs_readahead(struct
> > > readahead_control *ractl)
> > >                         goto skip_pages;
> > >
> > >                 index = pages[0]->index >> shift;
> > > +
> > > +               if ((pages[nr_pages - 1]->index >> shift) != index)
> > > +                       goto skip_pages;
> > > +
> > >                 bsize = read_blocklist(inode, index, &block);
> > > +               if (bsize == 0)
> > > +                       goto skip_pages;
> > > +
> > >                 res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> > >                                          actor);
> > >
> > > -               if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
> > > +               if (res >= 0)
> > >                         for (i = 0; i < nr_pages; i++)
> > >                                 SetPageUptodate(pages[i]);
> > >
> > > --
> > > 2.34.1
> > >
> > >
> > >
> > > Phillip
> > >
> > >
> > > >> This is important for the decompression code, because it
> > > >> expects each pages array entry to reference a page, which
> > > >> can be kmapped to an address.  If an entry in the pages
> > > >> array is NULL, this will break.
> > > >>
> > > >> If the pages array can have holes (NULL pointers), I have
> > > >> written an update patch which allows the decompression code
> > > >> to handle these NULL pointers.
> > > >>
> > > >> If the pages array can have NULL pointers, I can send you
> > > >> the patch which will deal with this.
> > > >
> > > > Sure, if there are better ways to deal with this.
> > > >
> > > > Thanks.
> > > >
> > > >>
> > > >> Thanks
> > > >>
> > > >> Phillip
> > > >>
> > > >>
> > > >>
> > > >>>
> > > >>>>>
> > > >>>>> It's also noticed that when the crash happened, nr_pages obtained by
> > > >>>>> readahead_count() is 512.
> > > >>>>> nr_pages = readahead_count(ractl); // this line
> > > >>>>>
> > > >>>>> 2) Normal cases that won't crash:
> > > >>>>> [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > > >>>>> [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > > >>>>> [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > > >>>>> [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > > >>>>> [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > > >>>>> [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > > >>>>> [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > > >>>>> [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > > >>>>> [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> > > >>>>>
> > > >>>>> nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> > > >>>>> Other values (max_pages, bsize, block...) looks normal
> > > >>>>>
> > > >>>>> I'm not sure why the crash happened, but I tried to modify the mask
> > > >>>>> for a bit. After modifying the mask value to below, the crash is gone
> > > >>>>> (nr_pages are <=256).
> > > >>>>> Based on my testing on a 300K pack file, there's no performance change.
> > > >>>>>
> > > >>>>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > > >>>>> index 20ec48cf97c5..f6d9b6f88ed9 100644
> > > >>>>> --- a/fs/squashfs/file.c
> > > >>>>> +++ b/fs/squashfs/file.c
> > > >>>>> @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
> > > >>>>> readahead_control *ractl)
> > > >>>>>     {
> > > >>>>>            struct inode *inode = ractl->mapping->host;
> > > >>>>>            struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > > >>>>> -       size_t mask = (1UL << msblk->block_log) - 1;
> > > >>>>>            size_t shift = msblk->block_log - PAGE_SHIFT;
> > > >>>>> +       size_t mask = (1UL << shift) - 1;
> > > >>>>>
> > > >>>>>
> > > >>>>> Any pointers are appreciated. Thanks!
> > > >>>>
> > > >>