Re: [RFC PATCH 0/1] Large folios in block buffered IO path

On 02-Dec-24 3:38 PM, Mateusz Guzik wrote:
On Mon, Dec 2, 2024 at 10:37 AM Bharata B Rao <bharata@xxxxxxx> wrote:

On 28-Nov-24 10:01 AM, Mateusz Guzik wrote:

Willy mentioned that the folio wait queue hash table could be grown; you
can find it in mm/filemap.c:
    #define PAGE_WAIT_TABLE_BITS 8
    #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
    static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;

    static wait_queue_head_t *folio_waitqueue(struct folio *folio)
    {
            return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
    }
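
To make the effect of PAGE_WAIT_TABLE_BITS concrete, here is a minimal userspace sketch of how pointers might spread over the wait table. It is not kernel code: sketch_hash_addr() and count_shared_buckets() are hypothetical helpers, and the hash only approximates what hash_ptr() does on a 64-bit kernel (multiply by the golden-ratio constant and keep the top bits). The base address and stride passed in are made-up inputs, not real folio addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 64-bit multiplier used by the kernel's multiplicative hash (include/linux/hash.h). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Userspace stand-in for hash_ptr() on a 64-bit kernel: multiply, keep the top `bits` bits. */
static unsigned int sketch_hash_addr(uint64_t addr, unsigned int bits)
{
    assert(bits >= 1 && bits <= 24);
    return (unsigned int)((addr * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Hash n hypothetical object addresses (base, base + stride, ...) into a
 * table of 1 << bits buckets and count how many land in a bucket that is
 * already occupied. Returns SIZE_MAX if the scratch table cannot be allocated.
 */
static size_t count_shared_buckets(uint64_t base, uint64_t stride,
                                   size_t n, unsigned int bits)
{
    unsigned char *used;
    size_t shared = 0;

    assert(bits >= 1 && bits <= 24);
    used = calloc((size_t)1 << bits, 1);
    if (!used)
        return SIZE_MAX;

    for (size_t i = 0; i < n; i++) {
        unsigned int b = sketch_hash_addr(base + i * stride, bits);

        if (used[b])
            shared++;       /* another object already waits on this queue head */
        else
            used[b] = 1;
    }
    free(used);
    return shared;
}
```

Calling count_shared_buckets(0xffffea0000000000ull, 64, 1000, 8) hashes 1000 addresses spaced 64 bytes apart into 256 buckets, so at least 744 of them must share a wait queue head. Raising bits can only keep that count the same or lower it, since a larger table splits the existing buckets.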

Can you collect off-CPU time? offcputime-bpfcc -K > /tmp/out

Flamegraph for "perf record --off-cpu -F 99 -a -g --all-kernel
--kernel-callchains -- sleep 120" is attached.

Off-CPU samples were collected for 120s at around the 45th minute of the
FIO benchmark, which runs for 1hr in total. This run used a kernel that
had your inode_lock fix but no changes to PAGE_WAIT_TABLE_BITS.

Hopefully this captures a representative sample of the scalability
issue with the folio lock.

Here is the data from an offcputime-bpfcc -K run with the inode_lock fix
and no change to PAGE_WAIT_TABLE_BITS. The data was captured for the
entire duration of the FIO run (1hr). Since the output is huge, I am
pasting only a few relevant entries.

The first entry in the offcputime records

    finish_task_switch.isra.0
    schedule
    irqentry_exit_to_user_mode
    irqentry_exit
    sysvec_reschedule_ipi
    asm_sysvec_reschedule_ipi
    -                fio (33790)
        2

There are thousands of entries for the FIO read and write paths; I have
shown only the first and last entry of each here.

First entry for FIO read path that waits on folio_lock

    finish_task_switch.isra.0
    schedule
    io_schedule
    folio_wait_bit_common
    filemap_get_pages
    filemap_read
    blkdev_read_iter
    vfs_read
    ksys_read
    __x64_sys_read
    x64_sys_call
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                fio (34143)
        3381769535

Last entry for FIO read path that waits on folio_lock

    finish_task_switch.isra.0
    schedule
    io_schedule
    folio_wait_bit_common
    filemap_get_pages
    filemap_read
    blkdev_read_iter
    vfs_read
    ksys_read
    __x64_sys_read
    x64_sys_call
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                fio (34171)
        3516224519

First entry for FIO write path that waits on folio_lock

    finish_task_switch.isra.0
    schedule
    io_schedule
    folio_wait_bit_common
    __filemap_get_folio
    iomap_get_folio
    iomap_write_begin
    iomap_file_buffered_write
    blkdev_write_iter
    vfs_write
    ksys_write
    __x64_sys_write
    x64_sys_call
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                fio (33842)
        48900

Last entry for FIO write path that waits on folio_lock

    finish_task_switch.isra.0
    schedule
    io_schedule
    folio_wait_bit_common
    __filemap_get_folio
    iomap_get_folio
    iomap_write_begin
    iomap_file_buffered_write
    blkdev_write_iter
    vfs_write
    ksys_write
    __x64_sys_write
    x64_sys_call
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                fio (34187)
        1815993

The last entry in the offcputime records

    finish_task_switch.isra.0
    schedule
    futex_wait_queue
    __futex_wait
    futex_wait
    do_futex
    __x64_sys_futex
    x64_sys_call
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                multipathd (6308)
        3698877753
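
Since the full output has thousands of such records, a small filter can total the off-CPU time of every stack that passes through a given function. The following is a minimal sketch, assuming the record layout shown above (frame lines, then a "- comm (pid)" line, then one number) and assuming the numbers are microseconds, which is what offcputime reports by default. sum_offcpu() and struct offcpu_total are hypothetical names, not part of any tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct offcpu_total {
    unsigned long stacks;         /* number of records containing the symbol */
    unsigned long long total_us;  /* summed off-CPU time of those records */
};

/* Sum the values of all records in `in` whose stack contains `symbol`. */
static struct offcpu_total sum_offcpu(FILE *in, const char *symbol)
{
    struct offcpu_total t = {0, 0};
    char line[512];
    int matched = 0;     /* current record contains `symbol` */
    int want_value = 0;  /* previous non-blank line was "- comm (pid)" */

    assert(in != NULL && symbol != NULL);
    while (fgets(line, sizeof(line), in)) {
        char *p = line + strspn(line, " \t");

        if (*p == '\0' || *p == '\n' || *p == '\r')
            continue;                     /* blank separator line */
        if (want_value) {
            if (matched) {
                t.stacks++;
                t.total_us += strtoull(p, NULL, 10);
            }
            matched = 0;
            want_value = 0;
        } else if (p[0] == '-' && p[1] != '-') {
            want_value = 1;               /* "- comm (pid)" closes the stack */
        } else {
            p[strcspn(p, " \t\r\n")] = '\0';
            if (strcmp(p, symbol) == 0)   /* exact frame-name match */
                matched = 1;
        }
    }
    return t;
}
```

Opening /tmp/out with fopen() and calling sum_offcpu(fp, "folio_wait_bit_common") would give the number of folio-lock wait stacks and their combined off-CPU time. If the values are indeed microseconds, the 3381769535 in the first read-path entry is roughly 3382 seconds of the 3600-second run.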

Regards,
Bharata.



