On 5/30/23 10:05 PM, David Rowley wrote:
My understanding had been that concurrency was required, but I see the commit message for 00d1e02be mentions:

Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit 4d330a61bb1).

If that's the case then maybe the beta release notes could be edited slightly to reflect this. Maybe something like: "Relation extensions have been improved allowing faster bulk loading of data using COPY. These improvements are more significant when multiple processes are concurrently loading data into the same table." The current text of "PostgreSQL 16 can also improve the performance of concurrent bulk loading of data using COPY up to 300%." does lead me to believe that nothing has been done to improve things when only a single backend is involved.
Typically once a release announcement is out, we'll only edit it if it's inaccurate. I don't think the statement in the release announcement is inaccurate, as it specifies that concurrent bulk loading is faster.
I had based the description on what Andres described in the original discussion and on reading [1], which showed a "measurable" improvement, as the commit message said, but not to the same degree as with concurrent loading. It does still seem impactful (the results show up to a 20% improvement on a single backend), but the bigger story was around the concurrency.
I'm -0.5 for revising the announcement, but I also don't want people to miss out on testing this. I'd be OK with this:
"PostgreSQL 16 can also improve the performance of bulk loading of data, with some tests showing up to 300% improvement when concurrently executing `COPY` commands."
Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6he3@xxxxxxxxxxxxxxxxxx