On Tue, Aug 16, 2022 at 10:45:48PM +0200, Stefan Wahren wrote:
> Hi Jan,
>
> On 16.08.22 at 11:34, Jan Kara wrote:
> > Hi Stefan!
> > So this is interesting. We can see the card is 100% busy. The IO submitted
> > to the card is formed by small requests - 18-38 KB per request - and each
> > request takes 0.3-0.5s to complete. So the resulting throughput is horrible
> > - only tens of KB/s. Also we can see there are many IOs queued for the
> > device in parallel (aqu-sz column). This does not look like load I would
> > expect to be generated by download of a large file from the web.
> >
> > You have mentioned in previous emails that with dd(1) you can do couple
> > MB/s writing to this card which is far more than these tens of KB/s. So the
> > file download must be doing something which really destroys the IO pattern
> > (and with mb_optimize_scan=0 ext4 happened to be better dealing with it and
> > generating better IO pattern). Can you perhaps strace the process doing the
> > download (or perhaps strace -f the whole rpi-update process) so that we can
> > see what the load generated on the filesystem looks like? Thanks!
>
> I didn't create the strace yet, but I looked at the source of rpi-update. At
> the end the download phase is a curl call to download a tar archive and pipe
> it directly to tar.
>
> You can find the content list of the tar file here:
>
> https://raw.githubusercontent.com/lategoodbye/mb_optimize_scan_regress/main/rpi-firmware-tar-content-list.txt
>
> Best regards
>
> >
> > 								Honza

Hi Jan and Stefan,

I did some analysis of this on my Raspberry Pi 3B+. I am not sure of the root
cause yet, but here is what I observed:

1. The download itself does not cause any issues in my case; it is the
   download piped to tar that causes the degradation. With the pipe to tar,
   mb_optimize_scan=1 takes around 7 minutes whereas mb_optimize_scan=0 takes
   about 1 minute.

2. I tried to replicate this performance degradation by running untar on an
   x86 machine, but I was not able to reproduce it there. It is reproducible
   pretty consistently on my Raspberry Pi though (with an 8GB memory card).

3. I analysed the resulting filesystems for mb_optimize_scan=0 vs
   mb_optimize_scan=1, and it seems that the allocated blocks are more spread
   out in the mb_optimize_scan=1 case, but I am not yet sure if that is the
   issue.

I will update here if I notice anything else.

Regards,
Ojaswin
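
In case it helps anyone compare the two mount options without running the
whole of rpi-update, the "pipe to tar" step could be approximated with
something like the sketch below. This is not what rpi-update or tar does
internally; it only reads a tar archive sequentially from a pipe, extracts it
and reports the elapsed time. The helper name extract_stream and the module
name untar_timed used further down are made up for illustration.

```python
"""Sketch: extract a tar archive from a pipe and time it.

Rough stand-in for the `curl ... | tar` step; extract_stream is a
hypothetical helper, not part of rpi-update.
"""
import tarfile
import time


def extract_stream(stream, dest):
    """Extract a tar archive read sequentially from `stream` into `dest`.

    Returns (number_of_members, elapsed_seconds).
    """
    start = time.monotonic()
    count = 0
    # Mode "r|*" reads a non-seekable stream and detects compression itself.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        # In stream mode members must be extracted in archive order,
        # which is also the order in which a pipe delivers them.
        for member in archive:
            archive.extract(member, path=dest)
            count += 1
    return count, time.monotonic() - start
```

Assuming the block is saved as untar_timed.py and the test filesystem is
mounted at /mnt/test (both placeholders), one run per mount option could look
like: curl -L "$URL" | python3 -c 'import sys, untar_timed; print(untar_timed.extract_stream(sys.stdin.buffer, "/mnt/test"))'
Note that the timing does not include a final sync, so writeback that is still
pending when extraction returns is not counted.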