https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #36 from Ojaswin Mujoo (ojaswin.mujoo@xxxxxxx) ---
Hey Eyal,

So the trace data has given me an idea of what's going on. Basically, in ext4 we maintain lists of FS block groups (BGs), where each list holds BGs based on the order of their free blocks (a BG with 64 free blocks goes in the order 6 list, one with 640 free blocks goes in the order 9 list, etc.). In our case, we are trying to allocate stripe-size blocks at a time, i.e. 640 blocks or roughly 2.5 MB (assuming 4 KB blocks), and ext4 looks at the order 9 list to find a BG that can satisfy our request.

Unfortunately, there seem to be a lot of BGs in the order 9 list (> 1000), but most of them don't have enough free blocks to satisfy the request, so we keep looping and calling ext4_mb_good_group() on each of them to see if any of them is good enough. Once we do find a good enough BG, due to striping we actually try to look for blocks that are specially aligned to the stripe size, and when we don't find any we just start looping over the list again from the beginning (!!).

Although I have a good idea now, I'm not able to point my finger at the exact change in 6.5 that might have caused this. We did change the allocator to some extent and it might be related to this, but we need to dig a bit deeper to confirm.
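To make the looping described above concrete, here is a minimal Python sketch of the simplified model in this comment. It is not the kernel code: the names order_of, build_order_lists and scan_order_list are hypothetical, the order is assumed to be floor(log2(free blocks)) only because that matches the 64 -> 6 and 640 -> 9 examples, and the "good group" check is reduced to a plain free-block comparison, whereas ext4_mb_good_group() looks at more than that.

```python
from collections import defaultdict


def order_of(free_blocks: int) -> int:
    """Assumed bucketing rule: floor(log2(n)) for n >= 1 (64 -> 6, 640 -> 9)."""
    return free_blocks.bit_length() - 1


def build_order_lists(groups: dict) -> dict:
    """Bucket block groups (group number -> free block count) by order."""
    lists = defaultdict(list)
    for bg, free in groups.items():
        if free > 0:
            lists[order_of(free)].append(bg)
    return lists


def scan_order_list(groups: dict, request_len: int):
    """Walk the list matching the request's order.

    Returns (chosen group or None, number of groups checked). The check
    here is only 'enough free blocks', a stand-in for the real test.
    """
    lists = build_order_lists(groups)
    checked = 0
    for bg in lists[order_of(request_len)]:
        checked += 1
        if groups[bg] >= request_len:
            return bg, checked
    return None, checked


# 1000 groups with 600 free blocks each land in the order 9 list but
# cannot satisfy a 640-block request; only the last group can.
groups = {bg: 600 for bg in range(1000)}
groups[1000] = 640
chosen, checked = scan_order_list(groups, 640)
print(chosen, checked)
```

In this toy setup every group in the order 9 list is visited before the one usable group is reached, which is the kind of repeated checking the trace points at. If the aligned scan then fails and the walk restarts from the head of the list, the same cost is paid again.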

Would it be possible to share the same perf record again? This time I'm adding a few more probes and removing -g so we can fit more within the 5 MB limit, and I'm also including the commands for Linux 6.4 so we can compare what's changed:

Linux 6.5+:

Probe adding commands:

sudo perf probe -a "ext4_mb_find_good_group_avg_frag_lists order"
sudo perf probe -a "ext4_mb_find_good_group_avg_frag_lists:18 cr iter->bb_group"
sudo perf probe -a "ext4_mb_good_group:20 free fragments ac->ac_g_ex.fe_len ac->ac_2order"
sudo perf probe -a "ext4_mb_scan_aligned:26 i max"

Record command:

perf record -e probe:ext4_mb_find_good_group_avg_frag_lists_L18 -e probe:ext4_mb_good_group_L20 -e probe:ext4_mb_find_good_group_avg_frag_lists -e probe:ext4_mb_scan_aligned_L26 -e ext4:ext4_mballoc_alloc -p <pid> sleep 20

Linux 6.4.x:

Probe adding commands:

sudo perf probe -a "ext4_mb_choose_next_group_cr1:25 i iter->bb_group"
sudo perf probe -a "ext4_mb_good_group:20 free fragments ac->ac_g_ex.fe_len ac->ac_2order"
sudo perf probe -a "ext4_mb_scan_aligned:26 i max"

Record command:

sudo perf record -e probe:ext4_mb_choose_next_group_cr1_L25 -e probe:ext4_mb_good_group_L20 -e probe:ext4_mb_scan_aligned_L26 -e ext4:ext4_mballoc_alloc -p <pid> sleep 20

Thanks again for all your help on this!

Regards,
ojaswin