Greetings,

* Tom Lane (tgl@xxxxxxxxxxxxx) wrote:
> Laurenz Albe <laurenz.albe@xxxxxxxxxxx> writes:
> > On Thu, 2021-08-26 at 18:06 +0200, hubert depesz lubaczewski wrote:
> >> In total, there were 5000 queries:
> >> SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)
> >> But there were only 83 separate oids that were scanned.
> >
> > That is a strong argument for using a hash table to cache the types.
>
> Those queries are coming from getFormattedTypeName(), which is used
> for function arguments and the like.  I'm not quite sure why Hubert
> is seeing 5000 such calls in a database with only ~100 functions;
> surely they don't all have an average of 50 arguments?
>
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead.  On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.

Seems like the issue here is mainly just the latency of each query being
rather high compared to most use-cases, so local testing where there's
basically zero latency wouldn't see any change in timing, but throw a
trans-atlantic or worse amount of latency between the system running
pg_dump and the PG server and you'd see notable wall-clock savings in
time.

Only took a quick look but generally +1 on reducing the number of
queries that pg_dump is doing and the changes suggested looked good to
me.

Thanks,

Stephen
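
To put rough numbers on the latency point, here is a back-of-the-envelope sketch. It assumes the queries are issued strictly one after another, so each query saved removes about one network round trip. The query counts are the ones quoted above; the round-trip times and the function name are illustrative assumptions, not measurements from this thread.

```python
def estimated_savings_seconds(queries_before: int,
                              queries_after: int,
                              round_trip_seconds: float) -> float:
    """Estimate wall-clock time saved by issuing fewer sequential queries.

    Assumes every query costs one full round trip and that server-side
    execution time is negligible next to network latency.
    """
    queries_saved = queries_before - queries_after
    return queries_saved * round_trip_seconds


# Query counts reported for the regression database in HEAD.
QUERIES_BEFORE = 3260
QUERIES_AFTER = 2905

# Hypothetical round-trip times in seconds (assumed, not measured).
ASSUMED_ROUND_TRIPS = {
    "local socket": 0.0001,
    "same datacenter": 0.001,
    "trans-atlantic": 0.1,
}

if __name__ == "__main__":
    for label, rtt in ASSUMED_ROUND_TRIPS.items():
        saved = estimated_savings_seconds(QUERIES_BEFORE, QUERIES_AFTER, rtt)
        print(f"{label}: roughly {saved:.2f} s saved")
```

Under these assumptions the saving scales linearly with round-trip time, which would be consistent with a local test showing no measurable change while a high-latency link does.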