Re: [RFC PATCH 0/1] Large folios in block buffered IO path

On Wed, Nov 27, 2024 at 01:02:35PM +0100, Jan Kara wrote:
> On Wed 27-11-24 07:19:59, Mateusz Guzik wrote:
> > On Wed, Nov 27, 2024 at 7:13 AM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 27, 2024 at 6:48 AM Bharata B Rao <bharata@xxxxxxx> wrote:
> > > >
> > > > Recently we discussed the scalability issues while running large
> > > > instances of FIO with the buffered IO option on NVMe block devices here:
> > > >
> > > > https://lore.kernel.org/linux-mm/d2841226-e27b-4d3d-a578-63587a3aa4f3@xxxxxxx/
> > > >
> > > > One of the suggestions Chris Mason gave (during private discussions) was
> > > > to enable large folios in the block buffered IO path, as that could
> > > > alleviate the scalability problems and reduce the lock contention
> > > > scenarios.
> > > >
> > >
> > > I have no basis to comment on the idea.
> > >
> > > However, it is pretty apparent that, whatever the situation, it is being
> > > heavily disfigured by lock contention in blkdev_llseek:
> > >
> > > > perf-lock contention output
> > > > ---------------------------
> > > > The lock contention data doesn't look all that conclusive, but for a 30% rwmixwrite
> > > > mix it looks like this:
> > > >
> > > > perf-lock contention default
> > > >  contended   total wait     max wait     avg wait         type   caller
> > > >
> > > > 1337359017     64.69 h     769.04 us    174.14 us     spinlock   rwsem_wake.isra.0+0x42
> > > >                         0xffffffff903f60a3  native_queued_spin_lock_slowpath+0x1f3
> > > >                         0xffffffff903f537c  _raw_spin_lock_irqsave+0x5c
> > > >                         0xffffffff8f39e7d2  rwsem_wake.isra.0+0x42
> > > >                         0xffffffff8f39e88f  up_write+0x4f
> > > >                         0xffffffff8f9d598e  blkdev_llseek+0x4e
> > > >                         0xffffffff8f703322  ksys_lseek+0x72
> > > >                         0xffffffff8f7033a8  __x64_sys_lseek+0x18
> > > >                         0xffffffff8f20b983  x64_sys_call+0x1fb3
> > > >    2665573     64.38 h       1.98 s      86.95 ms      rwsem:W   blkdev_llseek+0x31
> > > >                         0xffffffff903f15bc  rwsem_down_write_slowpath+0x36c
> > > >                         0xffffffff903f18fb  down_write+0x5b
> > > >                         0xffffffff8f9d5971  blkdev_llseek+0x31
> > > >                         0xffffffff8f703322  ksys_lseek+0x72
> > > >                         0xffffffff8f7033a8  __x64_sys_lseek+0x18
> > > >                         0xffffffff8f20b983  x64_sys_call+0x1fb3
> > > >                         0xffffffff903dce5e  do_syscall_64+0x7e
> > > >                         0xffffffff9040012b  entry_SYSCALL_64_after_hwframe+0x76
> > >
> > > Admittedly I'm not familiar with this code, but at a quick glance the
> > > lock can be just straight up removed here?
> > >
> > >   static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
> > >   {
> > >           struct inode *bd_inode = bdev_file_inode(file);
> > >           loff_t retval;
> > >
> > >           inode_lock(bd_inode);
> > >           retval = fixed_size_llseek(file, offset, whence, i_size_read(bd_inode));
> > >           inode_unlock(bd_inode);
> > >           return retval;
> > >   }
> > >
> > > At best it stabilizes the size for the duration of the call. That sounds
> > > like it helps nothing: if the size can change, the file offset
> > > will still be altered as if there was no locking?
> > >
> > > Suppose grabbing the size cannot be avoided for whatever reason.
> > >
> > > While the above fio invocation did not work for me, I ran some crapper
> > > I had in my shell history, and according to strace:
> > > [pid 271829] lseek(7, 0, SEEK_SET)      = 0
> > > [pid 271829] lseek(7, 0, SEEK_SET)      = 0
> > > [pid 271830] lseek(7, 0, SEEK_SET)      = 0
> > >
> > > ... the lseeks just rewind to the beginning, *definitely* not needing
> > > to know the size. One would have to check, but this is most likely the
> > > case in your test as well.
> > >
> > > And for that there is 0 need to grab the size, and consequently the inode lock.
> > 
> > That is to say, at bare minimum this needs to be benchmarked before/after
> > with the lock removed from the picture, like so:
> 
> Yeah, I've noticed this in the locking profiles as well and I agree
> bd_inode locking seems unnecessary here. Even some filesystems (e.g. ext4)
> get away without using inode lock in their llseek handler...

nod. This should be removed.
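For anyone wanting to convince themselves the lock buys nothing, here is a rough userspace model of what fixed_size_llseek() does with the size it is handed. It is a sketch under stated assumptions, not kernel code: the model_* functions and struct model_file are hypothetical, and the bounds check only approximates vfs_setpos() (negative offsets and offsets past the size are rejected). The size arrives as a plain snapshot, which is all i_size_read() provides whether or not inode_lock is held; a SEEK_SET to 0 succeeds whatever that snapshot says, and once the call returns the size may change regardless of any lock taken inside it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for the part of struct file we care about. */
struct model_file {
	long long pos; /* models file->f_pos */
};

/*
 * Approximates vfs_setpos(): reject negative offsets and offsets beyond
 * maxsize, otherwise record the new position.
 * Assumption: no FMODE_UNSIGNED_OFFSET handling, no overflow checks.
 */
static long long model_setpos(struct model_file *f, long long offset,
			      long long maxsize)
{
	if (offset < 0 || offset > maxsize)
		return -EINVAL;
	f->pos = offset;
	return offset;
}

/*
 * Approximates fixed_size_llseek(): 'size' serves as both the upper bound
 * and the SEEK_END reference. It is a snapshot taken by the caller; no lock
 * is held here, and none would keep it valid after we return.
 */
static long long model_fixed_size_llseek(struct model_file *f,
					  long long offset, int whence,
					  long long size)
{
	assert(f != NULL);

	switch (whence) {
	case SEEK_SET:
		break;			/* absolute; size only bounds the result */
	case SEEK_CUR:
		offset += f->pos;	/* relative to the current position */
		break;
	case SEEK_END:
		offset += size;		/* the only case using size as a base */
		break;
	default:
		return -EINVAL;
	}
	return model_setpos(f, offset, size);
}
```

In this model, lseek(fd, 0, SEEK_SET) lands on 0 for any size, including 0; only SEEK_END actually needs the size as its base, and even then the result is just as stale after the call as it would be without the lock.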



