Hello, Eduardo.

> Why do you use a dictionary compression and not zlib/lz4/bzip/anyother?

Internally PostgreSQL already has an LZ77-family algorithm, PGLZ. I
didn't try to replace it, only to supplement it. PGLZ compresses every
piece of data (JSONB documents in this case) independently. What I did
was remove the redundant data that exists between documents and that
PGLZ can't compress, since a single document usually uses each key and
similar strings (some sort of string tags in arrays, etc.) only once.

> Compress/Decompress speed?

By my observations PGLZ has characteristics similar to GZIP. I didn't
benchmark ZSON encoding/decoding separately from the DBMS because the
end user is interested only in TPS, which depends on IO, the number of
documents that fit into memory, and other factors.

> As I understand, postgresql must decompress before use.

Only if you try to read document fields. For deleting a tuple, doing a
vacuum, etc. there is no need to decompress the data.

> Some compressing algs (dictionary transforms where a token is word)
> allow search for tokens/words directly on compressed data transforming
> the token/word to search in dictionary entry and searching it in
> compressed data. From it, replace, substring, etc... string
> manipulations algs at word level can be implemented.

Unfortunately I doubt that the current ZSON implementation can use these
ideas. However, I must agree that it's a very interesting field of
research. I don't think anyone has tried to do something like this in
PostgreSQL yet.

> My passion is compression, do you care if I try other algorithms? For
> that, some dict id numbers (>1024 or >1<<16 or <128 for example) say
> which compression algorithm is used or must change zson_header to store
> that information. Doing that, each document could be compressed with
> the best compressor (size or decompression speed) at idle times or at
> request.

By all means! Naturally, if you find a better encoding I would be happy
to merge the corresponding code into ZSON's repository.

> Thanks for sharing and time.

Thanks for the feedback and for sharing your thoughts!

--
Best regards,
Aleksander Alekseev
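
P.S. A toy illustration of the point about redundancy between
documents. This is only a sketch under stated assumptions: it uses the
preset-dictionary feature of zlib from Python's standard library as a
stand-in, not PGLZ and not ZSON's actual encoding, and the sample
documents and the shared dictionary are made up for the example.

```python
import json
import zlib

# Hypothetical documents: same keys and tags, only the id differs.
docs = [
    {"user_id": i, "status": "active", "tags": ["premium", "newsletter"]}
    for i in range(1, 4)
]
raw = [json.dumps(d).encode() for d in docs]

# Hypothetical shared dictionary: one representative document.
# A real system would learn it from a sample of stored data.
shared = json.dumps(
    {"user_id": 0, "status": "active", "tags": ["premium", "newsletter"]}
).encode()


def compress_alone(data):
    """Compress one document with no knowledge of the others."""
    return zlib.compress(data)


def compress_with_dict(data, zdict):
    """Compress one document against a dictionary shared by all of them."""
    c = zlib.compressobj(zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob, zdict):
    """Reverse compress_with_dict; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


for data in raw:
    alone = compress_alone(data)
    with_dict = compress_with_dict(data, shared)
    assert decompress_with_dict(with_dict, shared) == data
    print(len(data), len(alone), len(with_dict))
```

Each document is still compressed independently, but the keys and tag
strings that repeat across documents now live in the shared dictionary
instead of being stored again in every document.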