Re: Question on slow fallocate

Eric Sandeen <sandeen@xxxxxxxxxxx> · Tue, 27 Jun 2023 11:12:01 -0500

On 6/27/23 10:50 AM, Masahiko Sawada wrote:
On Tue, Jun 27, 2023 at 12:32 AM Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:

On 6/25/23 10:17 PM, Masahiko Sawada wrote:
FYI, to share the background of what PostgreSQL does, when
bulk-insertions into one table are running concurrently, one process
extends the underlying files depending on how many concurrent
processes are waiting to extend. The more processes wait, the more 8kB
blocks are appended. In the current implementation, if the process
needs to extend the table by more than 8 blocks (i.e. 64kB) it uses
posix_fallocate(), otherwise it uses pwrite() (see the code[1] for
details). We don't use fallocate() for small extensions as it's slow
on some filesystems. Therefore, if a bulk-insertion process tries to
extend the table by, say, 5 to 10 blocks many times, it could alternate
between posix_fallocate() and pwrite(), which led to the slow
performance I reported.

To what end? What problem is PostgreSQL trying to solve with this
scheme? I might be missing something but it seems like you've described
the "what" in detail, but no "why."

It's for better scalability. Since the process that wants to extend the
table needs to hold an exclusive lock on the table, we need to
minimize the work done while holding the lock.

Ok, but what is the reason for zeroing out the blocks prior to them 
being written with real data? I'm wondering what the core requirement 
here is for the zeroing, either via fallocate (which btw posix_fallocate 
does not guarantee) or pwrites of zeros.

Thanks,
-Eric
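
For readers following the thread, the extension scheme described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not PostgreSQL's actual code: the function name extend_file and the constants BLOCK_SIZE and FALLOCATE_THRESHOLD are hypothetical, with the 8kB block size and the 8-block cutoff taken from the description above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192          /* 8kB blocks, per the description (assumed name) */
#define FALLOCATE_THRESHOLD 8    /* above 8 blocks (64kB), use posix_fallocate() */

/*
 * Hypothetical helper: extend the file open on fd by nblocks blocks,
 * starting at offset old_size. Returns 0 on success, or an errno value.
 */
static int extend_file(int fd, off_t old_size, int nblocks)
{
    if (nblocks > FALLOCATE_THRESHOLD) {
        /* posix_fallocate() returns the error number directly, not -1. */
        return posix_fallocate(fd, old_size, (off_t) nblocks * BLOCK_SIZE);
    }

    /* Small extension: write zero-filled blocks one at a time. */
    static const char zeros[BLOCK_SIZE];
    for (int i = 0; i < nblocks; i++) {
        off_t off = old_size + (off_t) i * BLOCK_SIZE;
        size_t done = 0;
        while (done < BLOCK_SIZE) {
            ssize_t n = pwrite(fd, zeros + done, BLOCK_SIZE - done,
                               off + (off_t) done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted before writing; retry */
                return errno;
            }
            done += (size_t) n;
        }
    }
    return 0;
}
```

With a cutoff like this, a caller that repeatedly extends by somewhere between 5 and 10 blocks would take the pwrite() path on some calls and the posix_fallocate() path on others, which is the alternating pattern reported above.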