Re: Indexes for hashes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> I have an application which stores a large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:

Why do you keep them hex encoded, and not use bytea?

I made a sample table with 1 million rows, looking like this:

     Table "public.new"
 Column  | Type  | Modifiers 
---------+-------+-----------
 texthex | text  | 
 a_bytea | bytea | 

values are like:

$ select * from new limit 10;
                             texthex                              |                              a_bytea                               
------------------------------------------------------------------+--------------------------------------------------------------------
 c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
 fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
 ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)

created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);

i1 is 91MB, and i2 is 56MB.

Index creation was also much faster - best out of 3 runs for i1 was 4928.982
ms, best out of 3 runs for i2 was 2047.648 ms

Best regards,

depesz



-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux