On Tue, Nov 20, 2018 at 11:28 AM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
>
> Greetings,
>
> * Merlin Moncure (mmoncure@xxxxxxxxx) wrote:
> > On Tue, Nov 20, 2018 at 10:43 AM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> > > * Merlin Moncure (mmoncure@xxxxxxxxx) wrote:
> > > > On Mon, Nov 19, 2018 at 11:26 AM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> > > > > Looks like a lot of the difference being seen and the comments made
> > > > > about one being faster than the other are because one system is
> > > > > compressing *everything*, while PG (quite intentionally...) only
> > > > > compresses the data sometimes- once it hits the TOAST limit. That
> > > > > likely also contributes to why you're seeing the on-disk size
> > > > > differences that you are.
> > > >
> > > > Hm. It may be intentional, but is it ideal? Employing datum
> > > > compression in the 1kb-8kb range with a faster but less compressing
> > > > algorithm could give benefits.
> > >
> > > Well, pglz is actually pretty fast and not as good at compression as
> > > other things. I could certainly see an argument for allowing a column
> > > to always be (or at least attempted to be) compressed.
> > >
> > > There's been a lot of discussion around supporting alternative
> > > compression algorithms but making that happen is a pretty big task.
> >
> > Yeah; pglz is closer to zlib. There's much faster stuff out
> > there...Andres summed it up pretty well;
> > https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de
> >
> > There are also some interesting discussions on jsonb specific
> > discussion approaches.
>
> Oh yes, having a dictionary would be a great start to reducing the size
> of the jsonb data, though it could then become a contention point if
> there's a lot of new values being inserted and such. Naturally there
> would also be a cost to pulling that data back out as well but likely it
> would be well worth the benefit of not having to store the field names
> repeatedly.
Yes, the biggest concern with a shared dictionary ought to be concurrency type problems.

merlin
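
To make the trade-off concrete, here is a rough Python sketch of a shared field-name dictionary. It is only an illustration, not PostgreSQL code: `SharedKeyDictionary`, `encode` and `decode` are made-up names, and only top-level keys are handled. Known field names are looked up without locking, while every new field name has to go through a single lock, which is where the contention would show up under heavy inserts of new keys.

```python
import threading


class SharedKeyDictionary:
    """Hypothetical shared dictionary mapping field names to small integer ids."""

    def __init__(self):
        self._ids = {}                 # field name -> id
        self._names = []               # id -> field name
        self._lock = threading.Lock()  # all new keys serialize on this lock

    def id_for(self, name):
        # Fast path: a field name that is already known needs no lock.
        key_id = self._ids.get(name)
        if key_id is not None:
            return key_id
        # Slow path: registering a new field name takes the lock.
        # This is the potential contention point.
        with self._lock:
            key_id = self._ids.get(name)  # re-check: another thread may have won
            if key_id is None:
                key_id = len(self._names)
                # Publish the name before the id so that any reader who
                # sees the id can always resolve it back to a name.
                self._names.append(name)
                self._ids[name] = key_id
            return key_id

    def name_for(self, key_id):
        return self._names[key_id]


def encode(doc, shared):
    """Store each top-level field name as its dictionary id."""
    return {shared.id_for(name): value for name, value in doc.items()}


def decode(encoded, shared):
    """Map ids back to field names; this is the extra cost on the read side."""
    return {shared.name_for(key_id): value for key_id, value in encoded.items()}


shared = SharedKeyDictionary()
rows = [
    {"customer_name": "a", "order_total": 1},
    {"customer_name": "b", "order_total": 2},
]
encoded = [encode(row, shared) for row in rows]
decoded = [decode(row, shared) for row in encoded]
```

After the first row, "customer_name" and "order_total" are each stored once in the dictionary and every later row only carries the integer ids. A workload that keeps inventing new field names would spend its time in the slow path instead.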