Re: squashfs performance regression and readahea

Xiongwei Song <sxwjean@xxxxxxxxx> · Fri, 13 May 2022 20:29:31 +0800

Hi Hsin-Yi,

One more thing, we should revert 9eec1d897139 ("squashfs: provide
backing_dev_info in order to disable read-ahead")? Then put the two
patches in one review?

Regards,
Xiongwei

On Fri, May 13, 2022 at 8:16 PM Xiongwei Song <sxwjean@xxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, May 12, 2022 at 2:23 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
> >
> > On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > > > On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@xxxxxxxxx> wrote:
> > > >>
> > > >> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
> > > >>>
> > > >>> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@xxxxxxxxx> wrote:
> > > >>>>
> > > >>>> Hi Hsin-Yi,
> > > >>>>
> > > >>>> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >>>>>>
> > > >>>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > >>>>>>> Hi Hsin-Yi and Matthew,
> > > >>>>>>>
> > > >>>>>>> With the patch from the attachment on linux 5.10, ran the command as I
> > > >>>>>>> mentioned earlier,
> > > >>>>>>> got the results below:
> > > >>>>>>> 1:40.65 (1m + 40.65s)
> > > >>>>>>> 1:10.12
> > > >>>>>>> 1:11.10
> > > >>>>>>> 1:11.47
> > > >>>>>>> 1:11.59
> > > >>>>>>> 1:11.94
> > > >>>>>>> 1:11.86
> > > >>>>>>> 1:12.04
> > > >>>>>>> 1:12.21
> > > >>>>>>> 1:12.06
> > > >>>>>>>
> > > >>>>>>> The performance has improved obviously, but compared to linux 4.18, the
> > > >>>>>>> performance is not so good.
> > > >>>>>>>
> > > >>>>> I think you shouldn't compare the performance with 4.18 directly,
> > > >>>>> since there might be other factors that impact the performance.
> > > >>>>
> > > >>>> Make sense.
> > > >>>>
> > > >>>>> I'd suggest comparing the same kernel version with:
> > > >>>>> a) with this patch
> > > >>>>> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> > > >>>>
> > > >>>> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > >>>> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> > > >>>> test result on linux 5.18：
> > > >>>> 1:41.51 (1m + 41.51s)
> > > >>>> 1:08.11
> > > >>>> 1:10.37
> > > >>>> 1:11.17
> > > >>>> 1:11.32
> > > >>>> 1:11.59
> > > >>>> 1:12.23
> > > >>>> 1:12.08
> > > >>>> 1:12.76
> > > >>>> 1:12.51
> > > >>>>
> > > >>>> performance worse 1 ~ 2s than linux 5.18 vanilla.
> > > >>>>
> > > >>>
> > > >>> Can you share the pack file you used for testing? Thanks
> > > >>
> > > >> You are saying the files that are put in squashfs partitions? If yes, I can tell
> > > >> I just put some dynamic libraries in partitions:
> > > >> -rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
> > > >> lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
> > > >> -rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
> > > >> ...... snip ......
> > > >>
> > > >> The number of files is 110(55 libs + 55 soft links to libs).  I have 90 squashfs
> > > >> partitions which save the identical files. The space of each partition is 11M,
> > > >> nothing special.
> > > >>
> > > >> Thanks.
> > > >>
> > > >
> > > > I noticed that there's a crash at
> > > > https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> > > > when testing on my system.
> > > > (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> > > >
> > > > Full logs:
> > > > [  119.062420] Unable to handle kernel paging request at virtual
> > > > address ffffffc017337000
> > > > [  119.062437] Mem abort info:
> > > > [  119.062442]   ESR = 0x96000047
> > > > [  119.062447]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [  119.062451]   SET = 0, FnV = 0
> > > > [  119.062454]   EA = 0, S1PTW = 0
> > > > [  119.062457] Data abort info:
> > > > [  119.062460]   ISV = 0, ISS = 0x00000047
> > > > [  119.062464]   CM = 0, WnR = 1
> > > > [  119.062469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041099000
> > > > [  119.062473] [ffffffc017337000] pgd=000000010014a003,
> > > > p4d=000000010014a003, pud=000000010014a003, pmd=000000010ba59003,
> > > > pte=0000000000000000
> > > > [  119.062489] Internal error: Oops: 96000047 [#1] PREEMPT SMP
> > > > [  119.062494] Modules linked in: vhost_vsock vhost vhost_iotlb
> > > > vmw_vsock_virtio_transport_common vsock rfcomm algif_hash
> > > > algif_skcipher af_alg veth uinput xt_cgroup mtk_dip mtk_cam_isp
> > > > mtk_vcodec_enc mtk_vcodec_dec hci_uart mtk_fd mtk_mdp3 v4l2_h264
> > > > mtk_vcodec_common mtk_vpu xt_MASQUERADE mtk_jpeg cros_ec_rpmsg btqca
> > > > videobuf2_dma_contig v4l2_fwnode v4l2_mem2mem btrtl elants_i2c mtk_scp
> > > > mtk_rpmsg rpmsg_core mtk_scp_ipi mt8183_cci_devfreq ip6table_nat fuse
> > > > 8021q bluetooth ecdh_generic ecc iio_trig_sysfs cros_ec_lid_angle
> > > > cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
> > > > kfifo_buf cros_ec_sensorhub cros_ec_typec typec hid_google_hammer
> > > > ath10k_sdio lzo_rle lzo_compress ath10k_core ath mac80211 zram
> > > > cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> > > > videobuf2_common cdc_ether usbnet r8152 mii joydev
> > > > [  119.062613] CPU: 4 PID: 4161 Comm: chrome Tainted: G        W
> > > >    5.10.112 #105 39f11bffda227eaae4c704733b9bf01db22d8b4d
> > > > [  119.062617] Hardware name: Google burnet board (DT)
> > > > [  119.062623] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> > > > [  119.062636] pc : lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062643] lr : lzo_uncompress+0x134/0x1f0
> > > > [  119.062647] sp : ffffffc01837b860
> > > > [  119.062650] x29: ffffffc01837b860 x28: 0000000000000000
> > > > [  119.062656] x27: 0000000000005451 x26: ffffffc0171c9445
> > > > [  119.062662] x25: 0000000000000000 x24: ffffffc017437000
> > > > [  119.062668] x23: ffffffc0171c944f x22: ffffffc017136000
> > > > [  119.062673] x21: ffffffc017336ff1 x20: ffffffc017237000
> > > > [  119.062679] x19: ffffffc01837b8d0 x18: 0000000000000000
> > > > [  119.062684] x17: 00000000000001eb x16: 0000000000000012
> > > > [  119.062689] x15: 000000000010000f x14: d600120202000001
> > > > [  119.062695] x13: ffffffc017336ff1 x12: ffffffc017336ff4
> > > > [  119.062700] x11: 0000000000000002 x10: 01010101010100ff
> > > > [  119.062705] x9 : ffffffffffffffff x8 : ffffffc0171c944d
> > > > [  119.062710] x7 : d15d3aaabd294330 x6 : 0206397115fe28ab
> > > > [  119.062715] x5 : ffffffc0171c944f x4 : 000000000009344f
> > > > [  119.062721] x3 : ffffffc01837b8d0 x2 : ffffffc017237000
> > > > [  119.062726] x1 : 000000000009344f x0 : ffffffc0171c9447
> > > > [  119.062731] Call trace:
> > > > [  119.062738]  lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062742]  lzo_uncompress+0x134/0x1f0
> > > > [  119.062746]  squashfs_decompress+0x6c/0xb4
> > > > [  119.062753]  squashfs_read_data+0x1a8/0x298
> > > > [  119.062758]  squashfs_readahead+0x308/0x474
> > > > [  119.062765]  read_pages+0x74/0x280
> > > > [  119.062769]  page_cache_ra_unbounded+0x1d0/0x228
> > > > [  119.062773]  do_page_cache_ra+0x44/0x50
> > > > [  119.062779]  do_sync_mmap_readahead+0x188/0x1a0
> > > > [  119.062783]  filemap_fault+0x100/0x350
> > > > [  119.062789]  __do_fault+0x48/0x10c
> > > > [  119.062793]  do_cow_fault+0x58/0x12c
> > > > [  119.062797]  handle_mm_fault+0x544/0x904
> > > > [  119.062804]  do_page_fault+0x260/0x384
> > > > [  119.062809]  do_translation_fault+0x44/0x5c
> > > > [  119.062813]  do_mem_abort+0x48/0xb4
> > > > [  119.062819]  el0_da+0x28/0x34
> > > > [  119.062824]  el0_sync_compat_handler+0xb8/0xcc
> > > > [  119.062829]  el0_sync_compat+0x188/0x1c0
> > > > [  119.062837] Code: f94001ae f90002ae f94005ae 910041ad (f90006ae)
> > > > [  119.062842] ---[ end trace 3e9828c7360fd7be ]---
> > > > [  119.090436] Kernel panic - not syncing: Oops: Fatal exception
> > > > [  119.090455] SMP: stopping secondary CPUs
> > > > [  119.090467] Kernel Offset: 0x2729c00000 from 0xffffffc010000000
> > > > [  119.090471] PHYS_OFFSET: 0xffffffd880000000
> > > > [  119.090477] CPU features: 0x08240002,2188200c
> > > >
> > > > 1) Traces near when the crash happened:
> > > > [   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> > > > [   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> > > > [   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> > > > [   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> > > > [   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> > > > [   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> > > > [   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152
> > >
> > > Src size 2097152 is clearly wrong, as the maximum data block size is 1
> > > Mbyte or 1048576.
> > >
> > > That debug line comes from
> > >
> > > https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156
> > >
> > > ----
> > > TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
> > >                 index, compressed ? "" : "un", length, output->length);
> > > ----
> > >
> > > Which indicates your code has created a page_actor of 2 Mbytes in size
> > > (output->length).
> > >
> > > This is completely incorrect, as the page_actor should never be larger
> > > than the size of the block to be read in question.  In most cases that
> > > will be msblk->block_size, but it may be less at the end of the file.
> > >
> > > You appear to be trying to read the amount of readahead requested.  But,
> > > you should always be trying to read the lesser of readahead, and the
> > > size of the block in question.
> > >
> > > Hope that helps.
> > >
> > > Phillip
> > >
> > Hi Phillip,
> > Thanks for the explanation. After restricting the size feed to
> > page_actor, the crash no longer happened.
> >
> > Hi Xiongwei,
> > Can you test this version (sent as attachment) again? I've tested on
> > my platform:
> > - arm64
> > - kernel 5.10
> > - pack_data size ~ 300K
> > - time ureadahead pack_data
> > 1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> > 0.633s
> > 0.755s
> > 0.804s
> >
> > 2. apply the patch:
> > 0.625s
> > 0.656s
> > 0.768s
> >
> Thanks for sharing. I have done the test on 5.10 and 5.18. The results are
> a little worse than patch v1 for my test env.
>
> On linux 5.10：
> With c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted::
> 1:37.16 (1m +37.16s)
> 1:04.18
> 1:05.28
> 1:06.07
> 1:06.31
> 1:06.58
> 1:06.80
> 1:06.79
> 1:06.95
> 1:06.61
>
> With your patch v2:
> 2:04.27 (2m + 4.27s)
> 1:14.95
> 1:14.56
> 1:15.75
> 1:16.55
> 1:16.87
> 1:16.74
> 1:17.36
> 1:17.50
> 1:17.32
>
> On linux 5.18:
> The ra disabled by default::
> 1:12.82
> 1:07.68
> 1:08.94
> 1:09.65
> 1:09.87
> 1:10.32
> 1:10.47
> 1:10.34
> 1:10.24
> 1:10.34
>
> With your patch v2:
> 2:00.14 (2m + 0.14s)
> 1:13.46
> 1:14.62
> 1:15.02
> 1:15.78
> 1:16.01
> 1:16.03
> 1:16.24
> 1:16.44
> 1:16.16
>
> As you can see, there are extra 6s increase on both 5.10 and 5.18.
> Don't know if the change of page number makes the overhead.
>
> One stupid question, see below code from your patch:
>
>  +       }
>  +
>  +       kfree(actor);
>  +       return;
>  +
>  +skip_pages:
>
> when release page pointers array after pages cached? I don't see
> any chance to do that.
>
> Regards,
> Xiongwei
>
>
>
> > Hi Matthew,
> > Thanks for reviewing the patch previously. Does this version look good
> > to you? If so, I can send it to the list.
> >
> >
> > Thanks for all of your help.
> >
> > > >
> > > > It's also noticed that when the crash happened, nr_pages obtained by
> > > > readahead_count() is 512.
> > > > nr_pages = readahead_count(ractl); // this line
> > > >
> > > > 2) Normal cases that won't crash:
> > > > [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > > > [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > > > [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > > > [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > > > [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > > > [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > > > [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > > > [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > > > [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> > > >
> > > > nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> > > > Other values (max_pages, bsize, block...) looks normal
> > > >
> > > > I'm not sure why the crash happened, but I tried to modify the mask
> > > > for a bit. After modifying the mask value to below, the crash is gone
> > > > (nr_pages are <=256).
> > > > Based on my testing on a 300K pack file, there's no performance change.
> > > >
> > > > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > > > index 20ec48cf97c5..f6d9b6f88ed9 100644
> > > > --- a/fs/squashfs/file.c
> > > > +++ b/fs/squashfs/file.c
> > > > @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
> > > > readahead_control *ractl)
> > > >   {
> > > >          struct inode *inode = ractl->mapping->host;
> > > >          struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > > > -       size_t mask = (1UL << msblk->block_log) - 1;
> > > >          size_t shift = msblk->block_log - PAGE_SHIFT;
> > > > +       size_t mask = (1UL << shift) - 1;
> > > >
> > > >
> > > > Any pointers are appreciated. Thanks!
> > >