Re: Pages don't belong to the same large-order folio in the block IO path

On 05/02/2024 11:02, Kundan Kumar wrote:
[...]

> 
> 
> Thanks Ryan for the help and the good, elaborate reply.
> 
> I tried various combinations. The good news is that mmap with aligned memory
> allocates a large folio and solves the issue.
> Let's look at the various cases one by one:
> 
> ==============
> Aligned malloc 
> ==============
> Alignment alone didn't solve the issue. The command I used:
> fio -iodepth=1 -iomem_align=16K -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> The block IO path has separate pages and separate folios.
> Logs
> Feb  5 15:27:32 kernel: [261992.075752] 1603 iov_iter_extract_user_pages addr =
> 55b2a0542000

This is not 16K aligned, so I'm guessing that -iomem_align is being ignored for
the malloc backend. Probably malloc has done an mmap() for the 16K without any
padding applied, and the kernel has chosen a VA that is not 16K aligned, so it's
been populated with small folios.

> Feb  5 15:27:32 kernel: [261992.075762] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:27:32 kernel: [261992.075786] 1291 __bio_iov_iter_get_pages page =
> ffffea000d9461c0 folio = ffffea000d9461c0
> Feb  5 15:27:32 kernel: [261992.075812] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7ef7c0 folio = ffffea000d7ef7c0
> Feb  5 15:27:32 kernel: [261992.075836] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7d30c0 folio = ffffea000d7d30c0
> Feb  5 15:27:32 kernel: [261992.075861] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7f2680 folio = ffffea000d7f2680

> 
> 
> ==============
> Non aligned mmap 
> ==============
> An unaligned mmap does somewhat better; we see 3 pages from the same folio:
> fio -iodepth=1  -iomem=mmap -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> Feb  5 15:31:08 kernel: [262208.082789] 1603 iov_iter_extract_user_pages addr =
> 7f72bc711000
> Feb  5 15:31:08 kernel: [262208.082808] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:24:31 kernel: [261811.086973] 1291 __bio_iov_iter_get_pages page =
> ffffea000aed36c0 folio = ffffea000aed36c0
> Feb  5 15:24:31 kernel: [261811.087010] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0200 folio = ffffea000d2d0200
> Feb  5 15:24:31 kernel: [261811.087044] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0240 folio = ffffea000d2d0200
> Feb  5 15:24:31 kernel: [261811.087078] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0280 folio = ffffea000d2d0200

This looks strange to me. You should only get a 16K folio if the VMA has a big
enough 16K-aligned section. If you are only mmapping 16K, and its address
(7f72bc711000) is correct, then that's unaligned and you should only see small
folios. I could believe the pages are "accidentally contiguous", but then their
folios should all be different. So perhaps the program is mmapping more, and
using the first part internally? Just a guess.


> 
> 
> ==============
> Aligned mmap 
> ==============
> mmap and aligned "-iomem_align=16K -iomem=mmap" solves the issue !!!
> Even with all the mTHP sizes enabled I see that 1 folio is present
> corresponding to the 4 pages.
> 
> fio -iodepth=1 -iomem_align=16K -iomem=mmap -rw=write -ioengine=io_uring
> -direct=1 -hipri -bs=16K -numjobs=1 -size=16k -group_reporting
> -filename=/dev/nvme0n1 -name=io_uring_test
> Feb  5 15:29:36 kernel: [262115.791589] 1603 iov_iter_extract_user_pages addr =
> 7f5c9087b000
> Feb  5 15:29:36 kernel: [262115.791611] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:29:36 kernel: [262115.791635] 1291 __bio_iov_iter_get_pages page =
> ffffea000e0116c0 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791696] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011700 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791755] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011740 folio = ffffea000e011600
> Feb  5 15:29:36 kernel: [262115.791814] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011780 folio = ffffea000e011600

OK good, but addr (7f5c9087b000) is still not 16K aligned! Could this be a bug
in your logging?

> 
> So it looks like normal malloc, even if aligned, doesn't allocate large-order
> folios. Only if we do an mmap that sets "MAP_ANON | MAP_PRIVATE" do we get
> the same folio.
> 
> I was under the assumption that malloc would internally use mmap with
> MAP_ANON and we would get the same folio.

Yes it will, but it also depends on the alignment being correct.

> 
> 
> For just the malloc case : 
> 
> On another front I have logs in alloc_anon_folio. For just the malloc case I
> see an allocation of 64 pages. "addr = 5654feac0000" is the address malloced
> by fio (without align and without mmap).
> 
> Feb  5 15:56:56 kernel: [263756.413095] alloc_anon_folio comm=fio order = 6
> folio = ffffea000e044000 addr = 5654feac0000 vma = ffff88814cfc7c20
> Feb  5 15:56:56 kernel: [263756.413110] alloc_anon_folio comm=fio folio_nr_pages
> = 64
> 
> 64 pages will be 0x40000 bytes; added to 5654feac0000 this gives 5654feb00000.
> So this user-space address range should be covered by this folio itself.
> 
> And after this, when IO is issued, I see the user-space address passed in this
> range to the block IO path. But the code of iov_iter_extract_user_pages()
> doesn't fetch the same pages/folio.
> Feb  5 15:56:57 kernel: [263756.678586] 1603 iov_iter_extract_user_pages addr =
> 5654fead4000
> Feb  5 15:56:57 kernel: [263756.678606] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb  5 15:56:57 kernel: [263756.678629] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2b80 folio = ffffea000dfc2b80
> Feb  5 15:56:57 kernel: [263756.678684] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2bc0 folio = ffffea000dfc2bc0
> Feb  5 15:56:57 kernel: [263756.678738] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9100 folio = ffffea000d7b9100
> Feb  5 15:56:57 kernel: [263756.678790] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9140 folio = ffffea000d7b9140
> 
> Please let me know your thoughts on the same.
> 
> --
> Kundan Kumar
