On Mon, Jun 6, 2022 at 11:54 AM Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
>
> On 03/06/2022 16:58, Marek Szyprowski wrote:
> > Hi Matthew,
> >
> > On 03.06.2022 17:29, Matthew Wilcox wrote:
> >> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
> >>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
> >>> <m.szyprowski@xxxxxxxxxxx> wrote:
> >>>> Hi Matthew,
> >>>>
> >>>> On 03.06.2022 14:59, Matthew Wilcox wrote:
> >>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> >>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> >>>>>>> Implement readahead callback for squashfs. It will read datablocks
> >>>>>>> which cover pages in readahead request. For a few cases it will
> >>>>>>> not mark page as uptodate, including:
> >>>>>>> - file end is 0.
> >>>>>>> - zero filled blocks.
> >>>>>>> - current batch of pages isn't in the same datablock or not enough in a
> >>>>>>>   datablock.
> >>>>>>> - decompressor error.
> >>>>>>> Otherwise pages will be marked as uptodate. The unhandled pages will be
> >>>>>>> updated by readpage later.
> >>>>>>>
> >>>>>>> Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> >>>>>>> Signed-off-by: Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx>
> >>>>>>> Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> >>>>>>> Reported-by: Phillip Lougher <phillip@xxxxxxxxxxxxxxx>
> >>>>>>> Reported-by: Xiongwei Song <Xiongwei.Song@xxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>> This patch landed recently in linux-next as commit 95f7a26191de
> >>>>>> ("squashfs: implement readahead"). I've noticed that it causes serious
> >>>>>> issues on my test systems (various ARM 32bit and 64bit based boards).
> >>>>>> The easiest way to observe it is the udev timeout 'waiting for /dev to
> >>>>>> be fully populated' and a prolonged boot time. I'm using squashfs for
> >>>>>> deploying kernel modules via initrd. Reverting aeefca9dfae7 &
> >>>>>> 95f7a26191de on top of next-20220603 fixes the issue.
> >>>>> How large are these files? Just a few kilobytes?
> >>>> Yes, they are small, most of them are smaller than 16KB, some about
> >>>> 128KB and a few about 256KB. I've sent a detailed list in a private mail.
> >>>>
> >>> Hi Marek,
> >>>
> >>> Are there any obvious squashfs errors in dmesg? Did you enable
> >>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?
> >> I don't think it's an error problem. I think it's a short file problem.
> >>
> >> As I understand the current code (and apologies for not keeping up
> >> to date with how the patch is progressing), if the file is less than
> >> msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving
> >> them to be brought uptodate by squashfs_read_folio(). So Marek is
> >> hitting the worst-case scenario where we re-read the entire block for
> >> each page in it. I think we have to handle this tail case in
> >> ->readahead().
> >
> > I'm not sure if this is related to reading of small files. There are
> > only 50 modules being loaded from the squashfs volume. I did a quick
> > test of reading the files.
> >
> > Simple file read with this patch:
> >
> > root@target:~# time find /initrd/ -type f | while read f; do cat $f >/dev/null; done
> >
> > real 0m5.865s
> > user 0m2.362s
> > sys 0m3.844s
> >
> > Without:
> >
> > root@target:~# time find /initrd/ -type f | while read f; do cat $f >/dev/null; done
> >
> > real 0m6.619s
> > user 0m2.112s
> > sys 0m4.827s
> >
>
> It has been a four-day holiday in the UK (Queen's Platinum Jubilee),
> hence the delay in responding.
>
> The above read use-case is sequential (only one thread/process),
> whereas the use-case where the slow-down is observed may be
> parallel (multiple threads/processes entering Squashfs).
>
> In the above sequential use-case, if the small files are held in
> fragments, caching behaviour will ameliorate the case where the
> same block is repeatedly re-read for each page in it: each time
> Squashfs is re-entered to handle a single page, the decompressed
> block will be found in the fragment cache, eliminating a block
> decompression for each page.
>
> In a parallel use-case, the decompressed fragment block may be
> evicted from the cache by other reading processes, forcing the
> block to be repeatedly decompressed.
>
> Hence the slow-down will be much more noticeable with a parallel
> use-case than a sequential one. That may also be why this slipped
> through testing, if the test cases are purely sequential in nature.
>
> So Matthew's previous comment is still the most likely
> explanation for the slow-down.
>

Thanks for the pointers. To deal with the short-file case (nr_pages <
max_pages), can we use squashfs_fill_page() as squashfs_read_cache()
does, similar to the case where some pages are missing from the block?
Directly calling squashfs_read_data() on short files leads to a crash:

Unable to handle kernel paging request at virtual address:
[ 19.244654]  zlib_inflate+0xba4/0x10c8
[ 19.244658]  zlib_uncompress+0x150/0x1bc
[ 19.244662]  squashfs_decompress+0x6c/0xb4
[ 19.244669]  squashfs_read_data+0x1a8/0x298
[ 19.244673]  squashfs_readahead+0x2cc/0x4cc

I also noticed that the function previously didn't call
flush_dcache_page() before SetPageUptodate(). Putting these two issues
together:

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 658fb98af0cd..27519f1f9045 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -532,8 +532,7 @@ static void squashfs_readahead(struct readahead_control *ractl)
 		if (!nr_pages)
 			break;
 
-		if (readahead_pos(ractl) >= i_size_read(inode) ||
-				nr_pages < max_pages)
+		if (readahead_pos(ractl) >= i_size_read(inode))
 			goto skip_pages;
 
 		index = pages[0]->index >> shift;
@@ -548,6 +547,23 @@ static void squashfs_readahead(struct readahead_control *ractl)
 		if (bsize == 0)
 			goto skip_pages;
 
+		if (nr_pages < max_pages) {
+			struct squashfs_cache_entry *buffer;
+
+			buffer = squashfs_get_datablock(inode->i_sb, block,
+							bsize);
+			if (!buffer->error) {
+				for (i = 0; i < nr_pages && expected > 0; i++,
+						expected -= PAGE_SIZE) {
+					int avail = min_t(int, expected, PAGE_SIZE);
+
+					squashfs_fill_page(pages[i], buffer, i * PAGE_SIZE, avail);
+				}
+			}
+			squashfs_cache_put(buffer);
+			goto skip_pages;
+		}
+
 		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
 					 actor);
 
@@ -564,8 +580,10 @@ static void squashfs_readahead(struct readahead_control *ractl)
 			kunmap_atomic(pageaddr);
 		}
 
-		for (i = 0; i < nr_pages; i++)
+		for (i = 0; i < nr_pages; i++) {
+			flush_dcache_page(pages[i]);
 			SetPageUptodate(pages[i]);
+		}
 	}

> Phillip
>
> > Best regards
>
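
[Editorial addendum] To make Phillip's sequential-versus-parallel point
concrete, below is a minimal userspace reproducer sketch. It is an
illustration, not code from this thread: the directory (/initrd, matching
Marek's test), thread count, file limit, and the cache-drop step are all
assumptions.

/*
 * Hypothetical reproducer sketch: read the regular files in a directory
 * once sequentially and then from several threads in parallel. Per the
 * analysis above, the parallel pass should keep evicting decompressed
 * fragment blocks from the squashfs cache, making the per-page re-read
 * cost visible.
 */
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS 8	/* assumption: enough readers to thrash the cache */
#define MAXFILES 1024

static char *files[MAXFILES];
static int nfiles;

/* Read every collected file from start to end, discarding the data. */
static void read_all(void)
{
	char buf[4096];

	for (int i = 0; i < nfiles; i++) {
		int fd = open(files[i], O_RDONLY);

		if (fd < 0)
			continue;
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		close(fd);
	}
}

static void *worker(void *arg)
{
	(void)arg;
	read_all();
	return NULL;
}

int main(int argc, char **argv)
{
	/* Directory on the squashfs mount; /initrd matches Marek's test. */
	const char *dir = argc > 1 ? argv[1] : "/initrd";
	DIR *d = opendir(dir);
	struct dirent *e;
	pthread_t tid[NTHREADS];

	if (!d) {
		perror("opendir");
		return 1;
	}
	while ((e = readdir(d)) != NULL && nfiles < MAXFILES) {
		char path[4096];

		if (e->d_type != DT_REG)
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		files[nfiles++] = strdup(path);
	}
	closedir(d);

	read_all();	/* sequential pass: fragment-cache hits expected */

	/*
	 * In a real test, drop the page cache here (sync; echo 3 >
	 * /proc/sys/vm/drop_caches) so the parallel pass re-enters
	 * squashfs instead of being served from the page cache.
	 */
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Built with something like 'gcc -O2 -pthread repro.c -o repro' and run
under time(1) on kernels with and without the readahead patch, the
parallel pass is where the difference should show up, if the analysis
above is right.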