
Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?


 



Thanks Tomas

On Thu, Feb 22, 2024 at 3:05 AM Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx> wrote:
On 2/22/24 02:22, Siddharth Jain wrote:
> Hi All,
>
> I understand the storage layer in databases goes to great lengths to ensure:
> - a row does not cross a block boundary
> - reads, writes, and allocation happen in units of blocks
> etc. The motivation is that the OS reads and writes pages (blocks),
> not individual bytes. I am only concerned about SSDs, but I think
> the principle applies to HDDs as well.
>
> But how can we do all this when we are not even guaranteed that the
> beginning of a file will be aligned with a block boundary? See
> <https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s>.
>
> Further, I don't see any APIs exposing I/O operations in terms of blocks.
> All File I/O APIs I see expose a file as a randomly accessible contiguous
> byte buffer. Would it not have been easier if there were APIs that exposed
> I/O operations in terms of blocks?
>
> Can someone explain this to me?
>

The short answer is that this is well outside our control. We do the
best we can - split our data files into "our" 8kB pages - and hope that
the OS / filesystem will do the right thing when mapping these to blocks
at the storage level.
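
To illustrate what splitting a data file into 8kB pages looks like on
top of a byte-oriented file API, here is a minimal sketch. It is not
PostgreSQL's actual code; the helper names (page_offset, write_page,
read_page) are made up for illustration, and it assumes a Unix-like
system where Python's os.pread and os.pwrite are available. The
application can only guarantee that offsets are multiples of its own
page size; where those bytes land on the device is up to the OS and
the filesystem.

```python
import os

PAGE_SIZE = 8192  # PostgreSQL's default page size (8kB)


def page_offset(page_no: int) -> int:
    """Byte offset of a page within its data file."""
    return page_no * PAGE_SIZE


def write_page(fd: int, page_no: int, data: bytes) -> None:
    """Write exactly one page at its page-aligned file offset."""
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    written = os.pwrite(fd, data, page_offset(page_no))
    if written != PAGE_SIZE:
        raise OSError("short write")


def read_page(fd: int, page_no: int) -> bytes:
    """Read one page; returns fewer bytes if the file ends early."""
    return os.pread(fd, PAGE_SIZE, page_offset(page_no))
```

Every I/O request issued this way starts at a multiple of PAGE_SIZE
within the file, which is the only alignment the application itself
controls.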

The filesystems do the same thing, to some extent - they align stuff
with respect to the beginning of the partition, but if the partition
itself is not properly aligned, that won't really work.
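
A small worked example of why partition alignment matters, under
simplifying assumptions: the filesystem places the data at a known
offset from the start of the partition, and the device uses 4096-byte
physical blocks. The function name and the example numbers are
illustrative only. A partition starting at sector 63 (a common legacy
layout with 512-byte sectors) is compared with one starting at 1MiB.

```python
def device_blocks_touched(partition_start: int, fs_offset: int,
                          length: int, device_block: int) -> int:
    """Count the device blocks overlapped by a byte range.

    partition_start: byte offset of the partition on the device
    fs_offset:       byte offset of the data within the partition
    length:          number of bytes read or written
    device_block:    physical block size of the device in bytes
    """
    if length <= 0:
        raise ValueError("length must be positive")
    start = partition_start + fs_offset
    end = start + length - 1  # last byte of the range, inclusive
    return end // device_block - start // device_block + 1


SECTOR = 512
LEGACY_START = 63 * SECTOR      # 32256 bytes, not a multiple of 4096
ALIGNED_START = 2048 * SECTOR   # 1 MiB, a multiple of 4096

# One 8kB page at filesystem offset 0, device with 4096-byte blocks:
misaligned = device_blocks_touched(LEGACY_START, 0, 8192, 4096)   # 3
aligned = device_blocks_touched(ALIGNED_START, 0, 8192, 4096)     # 2
```

With the misaligned partition, the same 8kB page straddles three
device blocks instead of two, even though the filesystem and the
database both aligned everything correctly from their own point of
view.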

As for the APIs, we work with what we have in POSIX - I don't think
there are any APIs working with blocks, and it's not clear to me how
they would fundamentally differ from the APIs we have now. Moreover,
it's not really clear which "block" would matter. The postgres 8kB
page? The filesystem page? The storage block/sector size?
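
As a rough illustration of how many different "block" sizes are in
play, the sketch below collects the ones that POSIX-style calls expose
for a given path. The function name block_sizes is made up for this
example, and the 8192 value is simply PostgreSQL's default page size.
Note that neither stat nor statvfs reports the physical sector size of
the device; on Linux that is usually read from sysfs instead (for
example /sys/block/<dev>/queue/physical_block_size), which is outside
POSIX.

```python
import os


def block_sizes(path: str) -> dict:
    """Collect the block sizes visible through stat() and statvfs()."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        "postgres_page": 8192,        # PostgreSQL default, set at build time
        "st_blksize": st.st_blksize,  # preferred I/O size for this file
        "f_bsize": vfs.f_bsize,       # filesystem block size
        "f_frsize": vfs.f_frsize,     # filesystem fragment size
    }
```

Calling block_sizes(".") on a typical system returns several values
that need not agree with each other, which is the difficulty with
defining a single block-based API.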

FWIW I think for SSDs this matters way more than for HDDs, because SSDs
have to erase the space before a rewrite, which makes it much more
expensive. That's not just about the alignment, though, but also about
the page size (with smaller pages being better).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
