Re: [PATCH 0/5 v2] ext4: Fix performance regression with mballoc


Hi Jan,

On 06.09.22 at 17:29, Jan Kara wrote:
Hello,

Here is a second version of my mballoc improvements to avoid spreading
allocations with mb_optimize_scan=1. The patches fix the performance
regression I was able to reproduce with reaim on my test machine:

                      mb_optimize_scan=0     mb_optimize_scan=1     patched
Hmean     disk-1       2076.12 (   0.00%)     2099.37 (   1.12%)     2032.52 (  -2.10%)
Hmean     disk-41     92481.20 (   0.00%)    83787.47 *  -9.40%*    90308.37 (  -2.35%)
Hmean     disk-81    155073.39 (   0.00%)   135527.05 * -12.60%*   154285.71 (  -0.51%)
Hmean     disk-121   185109.64 (   0.00%)   166284.93 * -10.17%*   185298.62 (   0.10%)
Hmean     disk-161   229890.53 (   0.00%)   207563.39 *  -9.71%*   232883.32 *   1.30%*
Hmean     disk-201   223333.33 (   0.00%)   203235.59 *  -9.00%*   221446.93 (  -0.84%)
Hmean     disk-241   235735.25 (   0.00%)   217705.51 *  -7.65%*   239483.27 *   1.59%*
Hmean     disk-281   266772.15 (   0.00%)   241132.72 *  -9.61%*   263108.62 (  -1.37%)
Hmean     disk-321   265435.50 (   0.00%)   245412.84 *  -7.54%*   267277.27 (   0.69%)
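
For anyone re-checking the table: the percentages appear to be the relative
change of each column against the mb_optimize_scan=0 baseline. A minimal
sketch, assuming that reading (the helper name pct_change is made up here
for illustration, and only two rows from the table are copied in):

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0

# (mb_optimize_scan=0, mb_optimize_scan=1, patched), copied from the table above
rows = {
    "disk-41": (92481.20, 83787.47, 90308.37),
    "disk-81": (155073.39, 135527.05, 154285.71),
}

for name, (base, scan1, patched) in rows.items():
    print(f"{name}: mb_optimize_scan=1 {pct_change(base, scan1):+.2f}%, "
          f"patched {pct_change(base, patched):+.2f}%")
```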

The changes also significantly reduce spreading of allocations for small /
moderately sized files. I'm not able to measure a performance difference
resulting from this but on eMMC storage this seems to be the main culprit
of reduced performance. Untarring the raspberry-pi archive touches the
following numbers of groups:

	mb_optimize_scan=0	mb_optimize_scan=1	patched
groups	4			22			7

To achieve this I have added two more changes on top of v1 - patches 4 and 5.
Patch 4 makes sure we use locality group preallocation even for files that are
not likely to grow anymore (previously we disabled all preallocations for
such files, however locality group preallocation still makes a lot of sense for
such files). This patch reduced the spread of small file allocations but larger
file allocations were still spread significantly because they avoid locality
group preallocation and as they are not power-of-two in size, they also
immediately start with cr=1 scan. To address that I've changed the data
structure for looking up the best block group to allocate from (see patch 5
for details).

Stefan, can you please test whether these patches fix the problem for you as
well? Comments & review welcome.

this looks amazing \o/

With v2 of this series applied, the untar with mb_optimize_scan=1 is now faster than with mb_optimize_scan=0:

mb_optimize_scan=0 -> almost 5 minutes

mb_optimize_scan=1 -> almost 1 minute

The original scenario (firmware download) with mb_optimize_scan=1 is now as fast as with mb_optimize_scan=0.

Here the iostat as usual:

https://github.com/lategoodbye/mb_optimize_scan_regress/commit/f4ad188e0feee60bffa23a8e1ad254544768c3bd

There is just one thing, though I'm not sure whether it comes from these patches. If I run

cat /proc/fs/ext4/mmcblk1p2/mb_structs_summary

the kernel throws a NULL pointer dereference in ext4_mb_seq_structs_summary_show.

Best regards


								Honza
Previous versions:
Link: http://lore.kernel.org/r/20220823134508.27854-1-jack@xxxxxxx # v1


