Greetings,

* Vincenzo Romano (vincenzo.romano@xxxxxxxxxxx) wrote:
> 2017-12-19 16:48 GMT+01:00 Stephen Frost <sfrost@xxxxxxxxxxx>:
> > * Vincenzo Romano (vincenzo.romano@xxxxxxxxxxx) wrote:
> >> Sorry, my bad: I confused v10 with v11.
> >> But according to a discussion with Bruce Momjian, table partitioning
> >> in v10 is little more than syntactic sugar around old-fashioned table
> >> partitioning.
> >
> > Well, it's a bit more than that since there's tuple-routing, but you're
> > right that the partition elimination is the same as it was in earlier
> > versions and based on constraint exclusion.  That said, as noted in the
> > email you replied to, reasonable numbers of partitions aren't too bad
> > even with the planning cost; it's when you have many thousands of
> > partitions that you get into cases where planning time for queries is
> > really bad.
>
> When you have to handle a 100TB table, the number of partitions
> shouldn't be 10 or 12, as seen in most examples and tests.

Probably not, but it would depend on the data and what you're doing with
it.  While there are some general rules of thumb, there's no one right
answer when it comes to the size of individual partitions.

> This is the same type of issue you hit with partial indexes (which is
> why I mentioned them earlier).
> The sub-table (and partial index) selection algorithm should be
> logarithmic or sub-linear.

Sure, and work is being done to improve PG in that exact area, but that
doesn't mean it can't handle workloads like this today; you just have to
be aware of the costs associated with today's partitions.

> As long as it's linear, you'll keep hearing about a "reasonable number
> of partitions".  One thousand partitions for a 100TB table would make
> for "manageable" 100GB sub-tables.

Yup, and 100GB tables are certainly large and a bit awkward, but they
can be managed.

> I could easily be wrong, but this is an area where PG needs improvements.

Of course, and work is being done to improve it.  What's important is
knowing that there's a cost to having more partitions when querying
through the parent once you get to the point of having thousands of
partitions.  That cost may be perfectly fine in some use-cases, and in
others it might not be, in which case you'd likely have to build logic
into the application layer to address it.  That's not ideal, which is
why there's work being done to improve PG, but it's not necessarily a
big deal either.  Dynamic SQL is another approach.

> One could maybe think about multi-level partitioning, though.

Not sure exactly what you're referring to here, but a simple multi-level
partitioning setup with PG wouldn't actually change the cost of
partition elimination today.

> > Also as noted on this thread, PG could handle this data volume, but to
> > be efficient there would be work to be done in normalization,
> > aggregation, and analyzing the system to ensure you're storing and
> > querying on the data efficiently.
>
> Normalization will grow the number of tables (and later joins) and
> you'll very likely end up with at least one table with a "gazillion"
> rows.

Naturally, but that "gazillion"-row table would be much smaller for
having the data normalized: if you don't normalize it, then the
gazillion-row table carries a huge amount of duplicated data, making
the entire system much larger than necessary.

> I fear normalization, provided it's really needed, would provide little help.

I seriously doubt that's the case.  Normalization might reduce that
100TB down to 10s of TB, or perhaps even smaller.
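As a purely invented sketch (these table and column names are mine, not
anything from your actual schema), that's the difference between a wide
table that repeats descriptive text on every row and a narrow one that
carries only a small integer key:

-- Denormalized: every one of the billions of rows repeats the same
-- descriptive text, so the big table carries all of that weight itself.
CREATE TABLE readings_wide (
    sensor_name     text,
    sensor_location text,
    sensor_vendor   text,
    recorded_at     timestamptz,
    reading         numeric
);

-- Normalized: the descriptions live once in a small lookup table and the
-- "gazillion"-row table keeps only a 4-byte key plus the measurements.
CREATE TABLE sensors (
    sensor_id serial PRIMARY KEY,
    name      text,
    location  text,
    vendor    text
);

CREATE TABLE readings (
    sensor_id   integer NOT NULL REFERENCES sensors,
    recorded_at timestamptz NOT NULL,
    reading     numeric
);

The joins come back at query time, of course, but they're against a
lookup table that's tiny compared to the main one.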
> With tables that big I usually do "software partitioning".
> I make the software aware of the partition schema so it can direct the
> queries to the needed (sub-)tables.

Yes, implementing partition elimination at the application level is
certainly an approach that can work with PG.  Even then, ideally, you'd
normalize the data so that the footprint is much smaller, making more
efficient use of the storage and memory available.
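To put that (and the dynamic SQL idea above) in concrete, if entirely
hypothetical, terms: assuming monthly sub-tables named along the lines
of readings_2017_12 that share the parent's layout, a thin PL/pgSQL
wrapper can pick the sub-table itself rather than asking the planner to
do the elimination:

-- Hypothetical sketch only: route the query straight to the monthly
-- sub-table instead of querying through the parent.
CREATE OR REPLACE FUNCTION readings_for(p_sensor integer, p_day date)
RETURNS SETOF readings
LANGUAGE plpgsql AS $$
DECLARE
    -- e.g. readings_2017_12 for any day in December 2017
    sub_table text := format('readings_%s', to_char(p_day, 'YYYY_MM'));
BEGIN
    RETURN QUERY EXECUTE format(
        'SELECT * FROM %I
          WHERE sensor_id = $1
            AND recorded_at >= $2
            AND recorded_at <  $2 + interval ''1 day''',
        sub_table)
    USING p_sensor, p_day;
END;
$$;

-- The planner then only ever sees the one sub-table:
--   SELECT * FROM readings_for(42, date '2017-12-19');

The same table-name computation can of course live in the application
instead of in the database.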
Thanks!

Stephen