On 2/1/15 3:08 PM, Josh Berkus wrote:
On 01/30/2015 05:34 PM, Jim Nasby wrote:
On 1/30/15 2:26 PM, Josh Berkus wrote:
This would probably work because there aren't a lot of data structures
where people would have the same key:value pair in different locations
in the JSON, and care about it stats-wise. Alternatetly, if the same
key-value pair appears multiple times in the same sample row, we could
cut the MC% by that multiple.
Even if there were multiple occurrences, this would probably still be an
improvement.
Another idea... at one time in the past when discussing statistics on
multiple columns, one idea was to build statistics on indexes. If we
built that, we could also do the same thing for at least JSONB (not sure
about JSON). Obviously doesn't help for stuff you haven't indexed, but
presumably if you care about performance and have any significant size
of data you've also indexed parts of the JSON, yes?
I'm not clear on what you're suggesting here. I'm discussing how the
stats for a JSONB field would be stored and accessed; I don't understand
what that has to do with indexing.
The JSON problem is similar to the problem of doing multi-column
statistics: there's no way to simply try to keep statistics on all
possible combinations because that's something that's can be extremely
large.
Something that's been proposed is only trying to keep multi-column stats
on column combinations that we have an index on (which in a way is
really just keeping stats on the index itself).
If we built that, we could use the same technique for JSON by simply
defining the indexes you needed for whatever you were searching on.
Obviously that's not as ideal is simply keeping full statistics on
everything in a JSON document, but it might be a heck of a lot easier to
accomplish.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance