Search Postgresql Archives

Re: Index creation takes more time?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18/09/2012, at 20:19, Jeff Janes wrote:
> I think the one below will show an even larger discrepancy.  You are
> doing 2 casts for each comparison,
> so I think the casts overhead will dilute out the comparison.
> 
> select count(distinct foo) from  ( select cast(random() as varchar(14)) as foo
>   from generate_series (1,100000000)) asdf;

Actually, it doesn't. I suspect that it doesn't actually do string comparison per se. I don't know how "distinct" is implemented in PostgreSQL, but if it was me, I'd implement it with a hash table, which means that you calculate the hash of the string rather than compare it. Even if it is done with actual comparison, I don't think it's a collation-based comparison, but rather a byte-by-byte comparison.

> 
> 
>> Finally, I created a test table, as you asked:
>> 
>> 
>>> create table foo as select msisdn,sme_reference from
>>> sms.billing__archive limit 1000000;
>> 
>> Then I created an index on the msisdn and sme_reference columns together.
>> 99% of the data in the msisdn field consist of 11-digit phone numbers.
>> Result:
>> 
>> PC: 5792.641 ms
>> Server: 23740.470 ms
>> 
>> Huge discrepancy there.
> 
> try:
> create index ON foo (msisdn COLLATE "C", sme_reference) ;
> 
> This can only be done on 9.1 server, as that feature is new to that
> release.  It should be much faster to create than the index with
> default collation.
> 
> (or change the collation of msisdn column definition, rather than just
> in the index).
> 
> This assumes you just need the index for equality, not for some
> precise locale-specific ordering (which for phone numbers seems like a
> safe bet).


Yes, this certainly reduced the index creation time to within a reasonable margin. OK, now we have to decide whether to move the entire database to the 'C' collation (which would require, I suppose, a dump and restore) with the option of changing collation for specific columns that actually need it, or to just solve the current problem by changing the index creation commands where relevant.

Thank you very much for your help with this issue, your input has been invaluable.

Herouth

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux