On 2/22/24 02:22, Siddharth Jain wrote:
> Hi All,
>
> I understand the storage layer in databases goes to great lengths to ensure:
> - a row does not cross a block boundary
> - reads/writes/allocation happen in units of blocks
> etc. The motivation is that at the OS level, it reads and writes pages
> (blocks), not individual bytes. I am only concerned about SSDs but I think
> the principle applies to HDDs as well.
>
> But how can we do all this when we are not even guaranteed that the
> beginning of a file will be aligned with a block boundary? See
> <https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s>.
>
> Further, I don't see any APIs exposing I/O operations in terms of blocks.
> All file I/O APIs I see expose a file as a randomly accessible contiguous
> byte buffer. Wouldn't it have been easier if there were APIs that exposed
> I/O operations in terms of blocks?
>
> Can someone explain this to me?
>

The short answer is that this is well outside our control. We do the best we can (split our data files into "our" 8kB pages) and hope that the OS / filesystem will do the right thing when mapping them to blocks at the storage level.

Filesystems do the same thing to some extent: they align data relative to the beginning of the partition, but if the partition itself is not properly aligned, that won't really help.

As for the APIs, we work with what POSIX gives us. I don't think there are any APIs working with blocks, and it's not clear to me how such an API would fundamentally differ from the ones we have now.

Moreover, it's not really clear which "block" would matter. The Postgres 8kB page? The filesystem page? The storage block/sector size?

FWIW I think this matters much more for SSDs than for HDDs, because SSDs have to erase the space before rewriting it, which makes rewrites much more expensive.
But it's not just about the alignment; the page size matters too (smaller pages being better).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company