Hi Ivan,
How about using crc32 ? and then index the integer as the result of crc32 function? you can split the hash into 2 part and do crc32 2x ? and then create composite index on both integer (the crc32 result)
instead of using 64 char, you only employ 2 integer as index key.
Regards,
Jul
On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras@xxxxxxxxx> wrote:
Hi,I understand your idea, and have also been thinking about it. Basically, existing applications would need to be modified, however slightly, and that wouldn't be good.On 15 June 2016 at 15:38, hubert depesz lubaczewski <depesz@xxxxxxxxxx> wrote:On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> I have an application which stores a large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:
Why do you keep them hex encoded, and not use bytea?
I made a sample table with 1 million rows, looking like this:
Table "public.new"
Column | Type | Modifiers
---------+-------+-----------
texthex | text |
a_bytea | bytea |
values are like:
$ select * from new limit 10;
texthex | a_bytea
------------------------------------------------------------------+--------------------------------------------------------------------
c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)
created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);
i1 is 91MB, and i2 is 56MB.
Index creation was also much faster - best out of 3 runs for i1 was 4928.982
ms, best out of 3 runs for i2 was 2047.648 ms
Best regards,
depesz
Julyanto SUTANDANG
Equnix Business Solutions, PT
(An Open Source an Open Mind Company)
Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 22866662 F: +62216315281 M: +628164858028
Caution: The information enclosed in this email (and any attachments) may be legally privileged and/or confidential and is intended only for the use of the addressee(s). No addressee should forward, print, copy, or otherwise reproduce this message in any manner that would allow it to be viewed by any individual not originally listed as a recipient. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is strictly prohibited. If you have received this communication in error, please immediately notify the sender and delete this message.Unless it is made by the authorized person, any views expressed in this message are those of the individual sender and may not necessarily reflect the views of PT Equnix Business Solutions.