Search Postgresql Archives

Re: ZSON, PostgreSQL extension for compressing JSONB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, Eduardo.

> Why do you use a dictionary compression and not zlib/lz4/bzip/anyother?

Internally PostgreSQL already has LZ77 family algorithm - PGLZ. I didn't
try to replace it, only to supplement. PGLZ compresses every piece of
data (JSONB documents in this case) independently. What I did is removed
redundant data that exists between documents and that PGLZ can't
compress since every single document usually uses every key and similar
strings (some sort of string tags in arrays, etc) only once.

> Compress/Decompress speed?

By my observations PGLZ has characteristics similar to GZIP. I didn't
benchmark ZSON encoding/decoding separately from DBMS because end
user is interested only in TPS which depends on IO, amount of documents
that we could fit into memory and other factors.

> As I understand, postgresql must decompress before use.

Only if you try to read document fields. For deleting a tuple, doing
vacuum, etc there is no need to decompress a data.

> Some compressing algs (dictionary transforms where a token is word)
> allow search for tokens/words directly on compressed data transforming
> the token/word to search in dictionary entry and searching it in
> compressed data. From it, replace, substring, etc... string
> manipulations algs at word level can be implemented.

Unfortunately I doubt that current ZSON implementation can use these
ideas. However I must agree that it's a very interesting field of
research. I don't think anyone tried to do something like this in
PostgreSQL yet.

> My passion is compression, do you care if I try other algorithms? For
> that, some dict id numbers (>1024 or >1<<16 or <128 for example) say
> which compression algorithm is used or must change zson_header to store
> that information. Doing that, each document could be compressed with
> the best compressor (size or decompression speed) at idle times or at
> request.

By all means! Naturally if you'll find a better encoding I would be happy
to merge corresponding code in ZSON's repository.

> Thanks for sharing and time.

Thanks for feedback and sharing your thoughts!

-- 
Best regards,
Aleksander Alekseev

Attachment: signature.asc
Description: PGP signature


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux