Re: bad COPY performance with NOTIFY in a trigger

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the feedback.

This patch is my first and obvious approach.

@Merlin, I'm not sure if I get your idea:
- keep previous behaviour as obligatory? (which is: automatic
de-duplicating of incoming messages by channel+payload),
- instead of trivial search (sorting by browsing) use some kind of
faster lookups?

I'm not sure if this statement in async.c is carved in stone:

* Duplicate notifications from the same transaction are sent out as one
* notification only. This is done to save work when for example a trigger
* on a 2 million row table fires a notification for each row that has been
* changed. If the application needs to receive every single notification
* that has been sent, it can easily add some unique string into the extra
* payload parameter.

1) "work-saving" is disputable in some cases

2) an idea to "add some unique string" is OK logical-wise but it's not
OK performance-wise.

Current search code is a sequential search:
https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139

I'm not that smart to devise an algorithm for faster lookups -
probably you guys can give some advice.

Again, my rationale is... This feature can burn a lot of CPU for
nothing. I was hoping to use NOTIFY/LISTEN as superfast notification
mechanism. Superfast regardless on whether you insert 100, 10k or 1m
rows.




On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
> <filip.rembialkowski@xxxxxxxxx> wrote:
>> patch submitted on -hackers list.
>> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpDEhoCg@xxxxxxxxxxxxxx
>>
>> results after the patch:
>>
>> trigger= BEGIN RETURN NULL; END
>> rows=40000
>>       228ms COPY test.tab FROM '/tmp/test.dat'
>>       205ms COPY test.tab FROM '/tmp/test.dat'
>> rows=80000
>>       494ms COPY test.tab FROM '/tmp/test.dat'
>>       395ms COPY test.tab FROM '/tmp/test.dat'
>> rows=120000
>>       678ms COPY test.tab FROM '/tmp/test.dat'
>>       652ms COPY test.tab FROM '/tmp/test.dat'
>> rows=160000
>>       956ms COPY test.tab FROM '/tmp/test.dat'
>>       822ms COPY test.tab FROM '/tmp/test.dat'
>> rows=200000
>>      1184ms COPY test.tab FROM '/tmp/test.dat'
>>      1072ms COPY test.tab FROM '/tmp/test.dat'
>> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
>> rows=40000
>>       440ms COPY test.tab FROM '/tmp/test.dat'
>>       406ms COPY test.tab FROM '/tmp/test.dat'
>> rows=80000
>>       887ms COPY test.tab FROM '/tmp/test.dat'
>>       769ms COPY test.tab FROM '/tmp/test.dat'
>> rows=120000
>>      1346ms COPY test.tab FROM '/tmp/test.dat'
>>      1171ms COPY test.tab FROM '/tmp/test.dat'
>> rows=160000
>>      1710ms COPY test.tab FROM '/tmp/test.dat'
>>      1709ms COPY test.tab FROM '/tmp/test.dat'
>> rows=200000
>>      2189ms COPY test.tab FROM '/tmp/test.dat'
>>      2206ms COPY test.tab FROM '/tmp/test.dat'
>
> I'm not so sure that this is a great idea.  Generally, we tend to
> discourage GUCs that control behavior at the SQL level.  Are you 100%
> certain that there is no path to optimizing this case without changing
> behvior?
>
> merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux