On 05/02/2024 11:02, Kundan Kumar wrote:
[...]
>
> Thanks Ryan for the help and the good, elaborate reply.
>
> I tried various combinations. The good news is that mmap with aligned memory
> allocates a large folio and solves the issue.
> Let's look at the various cases one by one:
>
> ==============
> Aligned malloc
> ==============
> Alignment alone didn't solve the issue. The command I used:
> fio -iodepth=1 -iomem_align=16K -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> The block IO path sees separate pages and separate folios.
> Logs:
> Feb 5 15:27:32 kernel: [261992.075752] 1603 iov_iter_extract_user_pages addr =
> 55b2a0542000

This is not 16K aligned, so I'm guessing that -iomem_align is being ignored
for the malloc backend. Probably malloc has done an mmap() for the 16K without
any padding applied, and the kernel has chosen a VA that is not 16K aligned,
so it has been populated with small folios.

> Feb 5 15:27:32 kernel: [261992.075762] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:27:32 kernel: [261992.075786] 1291 __bio_iov_iter_get_pages page =
> ffffea000d9461c0 folio = ffffea000d9461c0
> Feb 5 15:27:32 kernel: [261992.075812] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7ef7c0 folio = ffffea000d7ef7c0
> Feb 5 15:27:32 kernel: [261992.075836] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7d30c0 folio = ffffea000d7d30c0
> Feb 5 15:27:32 kernel: [261992.075861] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7f2680 folio = ffffea000d7f2680
>
> ==============
> Non aligned mmap
> ==============
> Non-aligned mmap does somewhat better; we see 3 pages from the same folio:
> fio -iodepth=1 -iomem=mmap -rw=write -ioengine=io_uring -direct=1 -hipri
> -bs=16K -numjobs=1 -size=16k -group_reporting -filename=/dev/nvme0n1
> -name=io_uring_test
> Feb 5 15:31:08 kernel: [262208.082789] 1603 iov_iter_extract_user_pages addr =
> 7f72bc711000
> Feb 5 15:31:08 kernel: [262208.082808] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:24:31 kernel: [261811.086973] 1291 __bio_iov_iter_get_pages page =
> ffffea000aed36c0 folio = ffffea000aed36c0
> Feb 5 15:24:31 kernel: [261811.087010] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0200 folio = ffffea000d2d0200
> Feb 5 15:24:31 kernel: [261811.087044] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0240 folio = ffffea000d2d0200
> Feb 5 15:24:31 kernel: [261811.087078] 1291 __bio_iov_iter_get_pages page =
> ffffea000d2d0280 folio = ffffea000d2d0200

This looks strange to me. You should only get a 16K folio if the VMA has a big
enough 16K-aligned section. If you are only mmapping 16K, and its address
(7f72bc711000) is correct, then that's unaligned and you should only see small
folios. I could believe the pages are "accidentally contiguous", but then
their folios should all be different. So perhaps the program is mmapping more,
and using the first part internally? Just a guess.
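
For illustration, here is a minimal sketch of that over-allocate-and-trim
pattern (my own code, not fio's; mmap_aligned_16k is a made-up helper name):
mmap more than needed, then munmap the unaligned head and the unused tail so
the surviving VMA starts at a 16K-aligned address.

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define ALIGN_16K (16 * 1024UL)

static void *mmap_aligned_16k(size_t len)
{
    /* Over-allocate so a 16K-aligned block of len bytes must fit. */
    size_t padded = len + ALIGN_16K;
    uint8_t *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + ALIGN_16K - 1) & ~(ALIGN_16K - 1);

    /* Trim the unaligned head; the remaining VMA now starts aligned. */
    if (aligned > addr)
        munmap(raw, aligned - addr);

    /* Trim the unused tail beyond the aligned len-byte region. */
    size_t tail = padded - (aligned - addr) - len;
    if (tail)
        munmap((void *)(aligned + len), tail);

    return (void *)aligned;
}

int main(void)
{
    void *buf = mmap_aligned_16k(16 * 1024);
    printf("buf = %p (16K aligned: %s)\n", buf,
           ((uintptr_t)buf & (ALIGN_16K - 1)) ? "no" : "yes");
    return 0;
}

With the VMA both 16K-sized and 16K-aligned, the kernel's anonymous fault
path is free to place a single 16K folio there, which plain malloc(16K)
gives you no reason to expect.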
> ==============
> Aligned mmap
> ==============
> mmap and aligned "-iomem_align=16K -iomem=mmap" solves the issue!
> Even with all the mTHP sizes enabled, I see that 1 folio is present
> corresponding to the 4 pages.
>
> fio -iodepth=1 -iomem_align=16K -iomem=mmap -rw=write -ioengine=io_uring
> -direct=1 -hipri -bs=16K -numjobs=1 -size=16k -group_reporting
> -filename=/dev/nvme0n1 -name=io_uring_test
> Feb 5 15:29:36 kernel: [262115.791589] 1603 iov_iter_extract_user_pages addr =
> 7f5c9087b000
> Feb 5 15:29:36 kernel: [262115.791611] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:29:36 kernel: [262115.791635] 1291 __bio_iov_iter_get_pages page =
> ffffea000e0116c0 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791696] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011700 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791755] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011740 folio = ffffea000e011600
> Feb 5 15:29:36 kernel: [262115.791814] 1291 __bio_iov_iter_get_pages page =
> ffffea000e011780 folio = ffffea000e011600

OK good, but addr (7f5c9087b000) is still not 16K aligned! Could this be a
bug in your logging?

> So it looks like normal malloc, even if aligned, doesn't allocate large
> order folios. Only if we do an mmap which sets the flags "OS_MAP_ANON |
> MAP_PRIVATE" do we get the same folio.
>
> I was under the assumption that malloc would internally use mmap with
> MAP_ANON and we would get the same folio.

Yes it will, but it also depends on the alignment being correct.

> For just the malloc case:
>
> On another front, I have logs in alloc_anon_folio. For just the malloc case
> I see an allocation of 64 pages. "addr = 5654feac0000" is the address
> malloced by fio (without align and without mmap).
>
> Feb 5 15:56:56 kernel: [263756.413095] alloc_anon_folio comm=fio order = 6
> folio = ffffea000e044000 addr = 5654feac0000 vma = ffff88814cfc7c20
> Feb 5 15:56:56 kernel: [263756.413110] alloc_anon_folio comm=fio folio_nr_pages
> = 64
>
> 64 pages will be 0x40000, which when added to 5654feac0000 gives
> 5654feb00000. So this user space address range should be covered by this
> folio itself.
>
> And after this, when IO is issued, I see the user space address passed to
> the block IO path within this range. But the code of
> iov_iter_extract_user_pages() doesn't fetch the same pages/folio.
> Feb 5 15:56:57 kernel: [263756.678586] 1603 iov_iter_extract_user_pages addr =
> 5654fead4000
> Feb 5 15:56:57 kernel: [263756.678606] 1610 iov_iter_extract_user_pages
> nr_pages = 4
> Feb 5 15:56:57 kernel: [263756.678629] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2b80 folio = ffffea000dfc2b80
> Feb 5 15:56:57 kernel: [263756.678684] 1291 __bio_iov_iter_get_pages page =
> ffffea000dfc2bc0 folio = ffffea000dfc2bc0
> Feb 5 15:56:57 kernel: [263756.678738] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9100 folio = ffffea000d7b9100
> Feb 5 15:56:57 kernel: [263756.678790] 1291 __bio_iov_iter_get_pages page =
> ffffea000d7b9140 folio = ffffea000d7b9140
>
> Please let me know your thoughts on the same.
>
> --
> Kundan Kumar
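
For completeness, a minimal standalone check of the address arithmetic above,
with the values copied verbatim from the logs (illustrative snippet only):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t folio_va  = 0x5654feac0000ULL; /* alloc_anon_folio addr */
    uint64_t folio_len = 64 * 4096ULL;      /* order 6 = 64 pages = 0x40000 */
    uint64_t io_va     = 0x5654fead4000ULL; /* iov_iter_extract_user_pages addr */
    uint64_t io_len    = 16 * 1024ULL;      /* bs=16K */

    printf("folio range: [%llx, %llx)\n",
           (unsigned long long)folio_va,
           (unsigned long long)(folio_va + folio_len));
    printf("io range covered by folio: %s\n",
           (io_va >= folio_va && io_va + io_len <= folio_va + folio_len)
           ? "yes" : "no");
    return 0;
}

This prints "yes": the 16K IO buffer at 5654fead4000 does sit inside the
order-6 folio's [5654feac0000, 5654feb00000) range, which is why the
per-page folios reported by __bio_iov_iter_get_pages look inconsistent
with the alloc_anon_folio log.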