On 2011-01-16, Randall Smith <randall@xxxxxx> wrote:
> Before reading: this is solved. It was an error on my part.
>
> On Sun, 2011-01-16 at 03:46 +0000, Jasen Betts wrote:
>> In plpgsql, IF is an implicit SELECT.
>> <http://www.postgresql.org/docs/8.4/interactive/plpgsql-expressions.html>
>>
>>   IF EXISTS (SELECT 1 FROM t1 WHERE
>>              volume_id = NEW.volume_id AND name = NEW.name) THEN
>>     RAISE EXCEPTION '% already exists on volume', NEW.name;
>>   END IF;
>
> Thanks. Certainly more concise.
>
>> > 0.053 ms/record / 1,000 ms/sec * 10,000 records = .53 seconds
>>
>> huh?
>>
>> > According to that stat, this lookup should be adding about 0.5 seconds
>> > to 10,000 records,
>>
>> why? what are you doing to 10,000 records?
>
> Inserting them. Sorry, that was a critical omission on my part. The
> trigger check is for inserts and I'm testing its performance by
> inserting 10,000 records.

That'll have n*log(n) complexity (or worse). You can't simply multiply
by 10,000, because each row added to the index slows the average index
lookup a little.

> Turns out my EXPLAIN ANALYZE times were so low because I was querying
> for a volume_id that had only a few rows. When I query for a volume_id
> with 10,000 rows, it changes to 7 ms, which matches the performance I'm
> seeing. That's acceptable to me because that's probably at the upper
> end of what I'll see. 7 ms to check 10,000 text fields is actually
> impressive to me.

>> how much disk (in bytes, and dollars) are you hoping to save by not
>> using the index?

> I need to be able to fit my indexes in RAM.

Why? Indices that don't fit in RAM are still useful. Doing 20 or so
record retrievals to confirm the absence of a record is likely to be
slower than doing 5 reads and a write or two to check and update a
disk-based index. Postgres isn't going to blindly load all the indices
into RAM.

> I need to be able to fit my indexes in RAM. This table will have a few
> billion records. These text fields can be up to 1k each.

This query will create about 300,000 sample records with roughly 450
bytes of text in each name. Repeat it with different ranges in the first
generate_series (10001,20000, then 20001,30000, and so on) until you
reach a couple of billion records, then create your indices and do your
testing.

  insert into t1
  select a::integer as volume_id,
         md5((a*b)::text)    || md5((a*b+1)::text)  || md5((a*b+3)::text)  ||
         md5((a*b+4)::text)  || md5((a*b+5)::text)  || md5((a*b+6)::text)  ||
         md5((a*b+7)::text)  || md5((a*b+8)::text)  || md5((a*b+9)::text)  ||
         md5((a*b+11)::text) || md5((a*b+12)::text) || md5((a*b+13)::text) ||
         md5((a*b+14)::text) || md5((a*b+15)::text) as name
  from (select generate_series(1,10000) as a,
               generate_series(1,29) as b) as foo;
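
Once the sample data is loaded, the index build and a timing check could
look something like the sketch below. The index name and the probe values
are only illustrative; t1 is assumed to have the (volume_id, name)
columns used in the trigger check.

  create index t1_volume_name_idx on t1 (volume_id, name);
  analyze t1;

  -- time the same lookup the trigger performs, against a busy volume_id;
  -- a miss still has to walk the index, just like the uniqueness check does
  explain analyze
  select 1
  from t1
  where volume_id = 5000
    and name = 'no-such-name';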
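
On the question of whether the index fits in RAM, a quick size check
after loading (using the same made-up index name as above) would be
something like:

  select pg_size_pretty(pg_relation_size('t1'))                 as table_size,
         pg_size_pretty(pg_relation_size('t1_volume_name_idx')) as index_size;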
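
And for completeness, the IF EXISTS check from earlier in the thread
would sit inside a trigger function roughly like this; it is only a
sketch, and the function and trigger names are made up:

  create or replace function t1_no_dup_name() returns trigger as $$
  begin
    -- IF evaluates its condition as an implicit SELECT, so EXISTS(...) works here
    if exists (select 1 from t1
               where volume_id = NEW.volume_id and name = NEW.name) then
      raise exception '% already exists on volume', NEW.name;
    end if;
    return NEW;
  end;
  $$ language plpgsql;

  create trigger t1_no_dup_name_trg
    before insert on t1
    for each row execute procedure t1_no_dup_name();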