On Mon, Jun 6, 2022 at 5:55 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
>
> On Mon, Jun 6, 2022 at 11:54 AM Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
> >
> > On 03/06/2022 16:58, Marek Szyprowski wrote:
> > > Hi Matthew,
> > >
> > > On 03.06.2022 17:29, Matthew Wilcox wrote:
> > >> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
> > >>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
> > >>> <m.szyprowski@xxxxxxxxxxx> wrote:
> > >>>> Hi Matthew,
> > >>>>
> > >>>> On 03.06.2022 14:59, Matthew Wilcox wrote:
> > >>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> > >>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> > >>>>>>> Implement a readahead callback for squashfs. It will read the
> > >>>>>>> datablocks which cover the pages in the readahead request. In a
> > >>>>>>> few cases it will not mark pages as uptodate, including:
> > >>>>>>> - file end is 0.
> > >>>>>>> - zero-filled blocks.
> > >>>>>>> - the current batch of pages isn't in the same datablock, or
> > >>>>>>>   there aren't enough pages in the datablock.
> > >>>>>>> - decompressor error.
> > >>>>>>> Otherwise pages will be marked as uptodate. The unhandled pages
> > >>>>>>> will be updated by readpage later.
> > >>>>>>>
> > >>>>>>> Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > >>>>>>> Signed-off-by: Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx>
> > >>>>>>> Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > >>>>>>> Reported-by: Phillip Lougher <phillip@xxxxxxxxxxxxxxx>
> > >>>>>>> Reported-by: Xiongwei Song <Xiongwei.Song@xxxxxxxxxxxxx>
> > >>>>>>> ---
> > >>>>>> This patch landed recently in linux-next as commit 95f7a26191de
> > >>>>>> ("squashfs: implement readahead"). I've noticed that it causes
> > >>>>>> serious issues on my test systems (various ARM 32bit and 64bit
> > >>>>>> based boards). The easiest way to observe it is the udev timeout
> > >>>>>> 'waiting for /dev to be fully populated' and a prolonged boot
> > >>>>>> time. I'm using squashfs for deploying kernel modules via initrd.
> > >>>>>> Reverting aeefca9dfae7 & 95f7a26191de on top of next-20220603
> > >>>>>> fixes the issue.
> > >>>>> How large are these files? Just a few kilobytes?
> > >>>> Yes, they are small; most of them are smaller than 16KB, some are
> > >>>> about 128KB and a few are about 256KB. I've sent a detailed list in
> > >>>> a private mail.
> > >>>>
> > >>> Hi Marek,
> > >>>
> > >>> Are there any obvious squashfs errors in dmesg? Did you enable
> > >>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?
> > >> I don't think it's an error problem. I think it's a short file
> > >> problem.
> > >>
> > >> As I understand the current code (and apologies for not keeping up
> > >> to date with how the patch is progressing), if the file is less than
> > >> msblk->block_size bytes, we'll leave all the pages as !uptodate,
> > >> leaving them to be brought uptodate by squashfs_read_folio(). So
> > >> Marek is hitting the worst-case scenario where we re-read the entire
> > >> block for each page in it. I think we have to handle this tail case
> > >> in ->readahead().
> > >
> > > I'm not sure if this is related to reading small files. There are
> > > only 50 modules being loaded from the squashfs volume. I did a quick
> > > test of reading the files.
> > >
> > > Simple file read with this patch:
> > >
> > > root@target:~# time find /initrd/ -type f | while read f; do cat $f
> > > >/dev/null; done
> > >
> > > real    0m5.865s
> > > user    0m2.362s
> > > sys     0m3.844s
> > >
> > > Without:
> > >
> > > root@target:~# time find /initrd/ -type f | while read f; do cat $f
> > > >/dev/null; done
> > >
> > > real    0m6.619s
> > > user    0m2.112s
> > > sys     0m4.827s
> > >
> >
> > It has been a four-day holiday in the UK (the Queen's Platinum
> > Jubilee), hence the delay in responding.
> >
> > The above read use-case is sequential (only one thread/process),
> > whereas the use-case where the slow-down is observed may be parallel
> > (multiple threads/processes entering Squashfs).
> >
> > If the small files are held in fragments, the above sequential
> > use-case will exhibit caching behaviour that ameliorates the repeated
> > re-reading of the same block for each page in it. Each time Squashfs
> > is re-entered to handle a single page, the decompressed block will be
> > found in the fragment cache, eliminating a block decompression per
> > page.
> >
> > In a parallel use-case the decompressed fragment block may be evicted
> > from the cache (by other reading processes), forcing the block to be
> > repeatedly decompressed.
> >
> > Hence the slow-down will be much more noticeable with a parallel
> > use-case than with a sequential one. This may also be why it slipped
> > through testing, if the test cases are purely sequential in nature.
> >
> > So Matthew's previous comment is still the most likely explanation
> > for the slow-down.
> >
> Thanks for the pointers. To deal with the short-file case (nr_pages <
> max_pages), can we use squashfs_fill_page() as it is used in
> squashfs_read_cache(), similar to the case where there are missing
> pages in the block?
>
> Directly calling squashfs_read_data() on short files leads to a crash:
>
> Unable to handle kernel paging request at virtual address:
> [   19.244654]  zlib_inflate+0xba4/0x10c8
> [   19.244658]  zlib_uncompress+0x150/0x1bc
> [   19.244662]  squashfs_decompress+0x6c/0xb4
> [   19.244669]  squashfs_read_data+0x1a8/0x298
> [   19.244673]  squashfs_readahead+0x2cc/0x4cc
>
> I also noticed that the function previously didn't pair
> SetPageUptodate() with flush_dcache_page().
>
> Putting these two issues together:

The patch here is not correct. Please ignore it for now. Sorry for the
noise.
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index 658fb98af0cd..27519f1f9045 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -532,8 +532,7 @@ static void squashfs_readahead(struct readahead_control *ractl)
>  		if (!nr_pages)
>  			break;
>
> -		if (readahead_pos(ractl) >= i_size_read(inode) ||
> -		    nr_pages < max_pages)
> +		if (readahead_pos(ractl) >= i_size_read(inode))
>  			goto skip_pages;
>
>  		index = pages[0]->index >> shift;
> @@ -548,6 +547,23 @@ static void squashfs_readahead(struct readahead_control *ractl)
>  		if (bsize == 0)
>  			goto skip_pages;
>
> +		if (nr_pages < max_pages) {
> +			struct squashfs_cache_entry *buffer;
> +
> +			buffer = squashfs_get_datablock(inode->i_sb, block,
> +							bsize);
> +			if (!buffer->error) {
> +				for (i = 0; i < nr_pages && expected > 0; i++,
> +						expected -= PAGE_SIZE) {
> +					int avail = min_t(int, expected, PAGE_SIZE);
> +
> +					squashfs_fill_page(pages[i], buffer, i * PAGE_SIZE, avail);
> +				}
> +			}
> +			squashfs_cache_put(buffer);
> +			goto skip_pages;
> +		}
> +
>  		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
>  					 actor);
>
> @@ -564,8 +580,10 @@ static void squashfs_readahead(struct readahead_control *ractl)
>  			kunmap_atomic(pageaddr);
>  		}
>
> -		for (i = 0; i < nr_pages; i++)
> +		for (i = 0; i < nr_pages; i++) {
> +			flush_dcache_page(pages[i]);
>  			SetPageUptodate(pages[i]);
> +		}
>  	}
> >
> > Phillip
> >
> > > Best regards
> >
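
For illustration, here is the cache-backed path from the quoted patch
pulled out into a standalone helper. This is only a sketch: the helper
name and signature are hypothetical, and it assumes the same Squashfs
internals used above (squashfs_get_datablock(), squashfs_fill_page()
and squashfs_cache_put()). One detail worth noting is that
squashfs_fill_page() already flushes the dcache and marks the page
uptodate on success, so pages filled through this path do not also need
the flush_dcache_page()/SetPageUptodate() loop added at the end of the
patch.

/*
 * Sketch only: a hypothetical helper distilled from the patch above.
 * It fills a batch of pages from the decompressed datablock held in
 * the cache, instead of calling squashfs_read_data() directly (the
 * path that produced the crash in the quoted backtrace).
 */
static void squashfs_readahead_from_cache(struct inode *inode, u64 block,
					  int bsize, struct page **pages,
					  int nr_pages, int expected)
{
	struct squashfs_cache_entry *buffer;
	int i;

	buffer = squashfs_get_datablock(inode->i_sb, block, bsize);
	if (!buffer->error) {
		for (i = 0; i < nr_pages && expected > 0;
				i++, expected -= PAGE_SIZE) {
			int avail = min_t(int, expected, PAGE_SIZE);

			/*
			 * squashfs_fill_page() copies the decompressed
			 * data, zero-fills the rest of the page, flushes
			 * the dcache and sets the page uptodate on
			 * success.
			 */
			squashfs_fill_page(pages[i], buffer,
					   i * PAGE_SIZE, avail);
		}
	}
	squashfs_cache_put(buffer);
}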
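
Phillip's sequential-versus-parallel observation also suggests a way to
reproduce the worst case that the single-process loop above would miss.
A parallel variant of Marek's test (hypothetical; it assumes xargs with
-P support, e.g. GNU xargs, on the target) would be:

root@target:~# time find /initrd/ -type f -print0 | xargs -0 -P 8 -n 1 cat >/dev/null

With eight readers entering Squashfs concurrently, decompressed
fragment blocks are more likely to be evicted before they can be
reused, which is the repeated-decompression behaviour described above.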