On Thu, Jul 19, 2012 at 11:07 AM, Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> wrote:
> Recently I found myself wondering what was taking a particular query so long.
> I immediately assumed it was a lack of I/O, because lack of I/O is a
> thorn in my side.
> Nope, the I/O was boring. CPU? Well, the process was using 100% of the
> CPU, but the query itself was really very simple.
> I turned to ltrace (horribly imprecise, I know). ltrace told me this:
>
>
> % time     seconds  usecs/call     calls function
> ------ ----------- ----------- --------- --------------------
>  46.54    6.789433          69     97766 memcpy
>  28.16    4.108324        1100      3732 strlen
>  14.45    2.107567         564      3732 malloc
>   9.16    1.336108          28     46877 memset
>   0.74    0.107935          28      3732 strcpy
>   0.73    0.107221          28      3732 free
>   0.16    0.023687         187       126 write
>   0.02    0.003587          28       126 __errno_location
>   0.02    0.003075          59        52 read
>   0.01    0.001523          29        52 memcmp
> ------ ----------- ----------- --------- --------------------
> 100.00   14.588460              159927 total
>
>
> and this:
>
> strlen("SRF multi-call context")
> strcpy(0xe01d40, "SRF multi-call context")
> malloc(1024)
> memcpy(...)
> memset(...)
> ...
> memset(...)
> free(..)
>
> repeat.
>
> I was rather surprised to learn that (per-row):
> (1) the memcpy of 64 bytes accounted for 46% of the time spent in library calls
> (2) the (other) costs of strlen, strcpy, malloc, and memset were so
> huge (in particular, strlen)
>
> What, if anything, can be done about this? It seems the overhead of
> setting up the memory context for the SRF is pretty high.
> I notice this overhead pretty much every time I use any of the array
> functions, like unnest.
>
> Please help me to understand if I'm misinterpreting things here.
>
> [x86_64, Linux, PostgreSQL 9.1.4]

A follow-up. Recently, I imported a bunch of data. The import ran in
about 30 seconds. The data itself was represented in a way that made
more sense - from a relational-database perspective - as multiple
tables. To accomplish this, I made use of string_to_array and unnest.

The initial table creation and COPY ran in about 30 seconds, but the
creation of the new table (create table ... as select ..
unnest(string_to_array(....))) then took over 5 minutes - 10 times as
long. A minimal sketch of the pattern appears at the end of this
message.

What is it about the array functions (actually, all set-returning
functions that I've tried) that causes them to be so expensive? The
per-call overhead is enormous in some cases.

PostgreSQL 9.1.5 on x86_64 (openSUSE 12.2 - but the effect has been
observed across several platforms and major/minor releases of
PostgreSQL).

> --
> Jon

--
Jon
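
For reference, a minimal sketch of the import pattern described above.
Table and column names are hypothetical (the actual schema was not
posted); the shape of the query is the point:

    -- Hypothetical staging table holding the raw import, with one
    -- delimited-list column (e.g. 'a,b,c').
    CREATE TABLE staging (
        id    integer,
        items text
    );

    -- Fast step: the initial load (roughly 30 seconds in the report above).
    -- COPY staging FROM '/path/to/import.csv' WITH (FORMAT csv);

    -- Slow step: normalizing the list column via set-returning functions
    -- (over 5 minutes in the report above). Each row invokes
    -- string_to_array once and unnest once per array element.
    CREATE TABLE item AS
    SELECT id,
           unnest(string_to_array(items, ',')) AS item
    FROM   staging;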