Re: 8Kb or 4Kb ext4 filesystem page size

On Wed, 29 May 2019 15:10:06 -0300, Alexandre hadjinlian guerra <alexhguerra@xxxxxxxxx> wrote:
Hi

Given that Postgres uses 8KB pages, I'm wondering why I couldn't find any
tests at all that format an ext4 partition with 8KB pages. I'm about to do
some tests, but the lack of such tests on the internet makes me wonder
whether I'm searching poorly or whether nobody has tested it. Besides, I'd
like to ask whether the following link still holds true, given how XFS and
ext4 have evolved since 2015:
https://blog.pgaddict.com/posts/postgresql-performance-on-ext4-and-xfs

Thanks

One thing no one has mentioned yet is that I/O performance could suffer greatly if the disk page size is larger than the memory page size, because a single disk page split across multiple VMM page frames may be discontiguous in physical memory.
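To illustrate the split, here's a minimal Python sketch. The page table is a plain dict standing in for the OS's real page tables, and the frame numbers are made up for the example; it just computes which physical frames back an 8KB disk page at a given virtual address, assuming 4KB memory pages, and whether those frames are adjacent.

```python
PAGE_SIZE = 4096  # assumed memory page (frame) size in bytes

def frames_for_range(page_table, vaddr, length, page_size=PAGE_SIZE):
    """Return the physical frame numbers backing [vaddr, vaddr + length).

    page_table maps virtual page number -> physical frame number
    (a hypothetical stand-in for the OS's real page tables).
    """
    first_vpn = vaddr // page_size
    last_vpn = (vaddr + length - 1) // page_size
    return [page_table[vpn] for vpn in range(first_vpn, last_vpn + 1)]

def is_physically_contiguous(frames):
    """True if each frame immediately follows the previous one."""
    return all(b == a + 1 for a, b in zip(frames, frames[1:]))

# Hypothetical mapping: virtual page 10 and 11 landed in unrelated frames,
# while 11 and 12 happen to be adjacent.
page_table = {10: 731, 11: 204, 12: 205}

block = frames_for_range(page_table, vaddr=10 * PAGE_SIZE, length=8192)
print(block, is_physically_contiguous(block))  # [731, 204] False

block = frames_for_range(page_table, vaddr=11 * PAGE_SIZE, length=8192)
print(block, is_physically_contiguous(block))  # [204, 205] True
```

Whether an 8KB disk page ends up in adjacent frames is purely a matter of luck in what the allocator handed out.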

That is a problem for bus-mastering disk controllers, because their DMA can operate only on *physical* addresses, not the logical addresses used by programs.  Pages touched by external DMA need to be pinned (locked in place) for the duration of the transfer.

There is also built-in DMA on the system board.  Typically, built-in DMA can work through the MMU with logical addresses, and so (usually) does not need to pin memory pages to access them.

But it's up to the device driver which DMA (if any) is used.  Most bus-mastering devices prefer to use their own DMA hardware, and their drivers either have to pin memory pages or work through a small(ish) buffer in a fixed location (which entails extraneous copying of data).


PostgreSQL's 8KB logical disk pages take up two 4KB memory pages, which may not be adjacent. But since the filesystem and memory pages are the same size, DMA (built-in or external) can access the pages in any order, without employing (or even needing) scatter/gather ability to coalesce or distribute partial pages to/from non-contiguous locations.
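For contrast, here's a minimal sketch of what a scatter/gather list looks like, assuming 4KB pages and hypothetical frame numbers. It turns the frames backing a buffer into (physical address, length) segments and merges frames that happen to be adjacent; a controller with scatter/gather support would walk such a list. When the filesystem block size equals the page size, every segment is a whole number of pages, so each block can be transferred on its own, in any order.

```python
PAGE_SIZE = 4096  # assumed memory page (frame) size in bytes

def build_sg_list(frames, page_size=PAGE_SIZE):
    """Turn physical frame numbers into (phys_addr, length) DMA segments.

    Frames that are physically adjacent are merged into one segment;
    non-adjacent frames start a new segment.
    """
    segments = []
    for frame in frames:
        addr = frame * page_size
        if segments and segments[-1][0] + segments[-1][1] == addr:
            # This frame directly follows the previous segment: extend it.
            start, length = segments[-1]
            segments[-1] = (start, length + page_size)
        else:
            segments.append((addr, page_size))
    return segments

print(build_sg_list([731, 204]))  # [(2994176, 4096), (835584, 4096)]
print(build_sg_list([204, 205]))  # [(835584, 8192)]
```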

Another consideration for disk page size is that program code typically is paged in directly from the executable file.  If the disk and memory pages aren't the same size, the OS page fault handler needs to be aware of the difference and able to deal with it. Obviously, this could be addressed simply by restricting "large" pages to separate, data-only volumes.

AFAIK, few CPUs offer 8KB memory pages: the Itanium has an option for them, as do SPARC64 and the long-discontinued Alpha.  The "large"/"huge" memory pages available in most CPUs today (e.g., 2MB and 1GB on x86-64) are too big to be used effectively by a filesystem.
https://en.wikipedia.org/wiki/Page_(computer_memory)#Multiple_page_sizes
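If you want to check what your own box uses before testing, here's a minimal Python sketch. It relies only on the standard library: `mmap.PAGESIZE` for the memory page size and `os.statvfs(path).f_bsize` for the block size the filesystem reports (what this thread calls the ext4 "page size" is its block size). The mount point `/` is just an example; point it at the volume holding your data directory. `os.statvfs` is Unix-only.

```python
import mmap
import os

def page_and_block_size(path="/"):
    """Return (memory page size, filesystem block size) in bytes for path."""
    page_size = mmap.PAGESIZE               # memory page size of this machine
    block_size = os.statvfs(path).f_bsize   # block size reported by the filesystem
    return page_size, block_size

if __name__ == "__main__":
    page, block = page_and_block_size("/")
    print(f"memory page: {page} bytes, filesystem block: {block} bytes")
    if block > page:
        print("filesystem blocks are larger than memory pages")
```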

Rewriting filesystem drivers and the memory manager so that 8KB or larger disk pages could be treated as a sort of "huge" memory page, overlaid on adjacent 4KB physical memory pages, would be a massive job.  Since few programs other than DBMSs would really benefit from it, it isn't likely to happen.

YMMV,
George





