Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Yeah, that is expected. Nestloop inner indexscans have a rowcount > estimate that is different from that of the parent table --- the > parent's rowcount is what would be applicable for another type of > join, such as merge or hash, where the join condition is applied > at the join node not in the relation scan. > > The problem here boils down to the fact that examine_variable > punts on appendrel variables: > * XXX This means the Var represents a column of an append > * relation. Later add code to look at the member relations and > * try to derive some kind of combined statistics? > This means you get a default estimate for the selectivity of the > join condition, so the joinrel size estimate ends up being 0.005 * > 1 * 40000. That's set long before we ever generate indexscan > plans, and I don't think there's any clean way to correct the size > estimate when we do. Thanks for the explanation. > Fixing this has been on the to-do list since forever. Which item? I looked and couldn't find one which seems to fit. (I was hoping to find a reference to a discussion thread.) > I don't think we'll make much progress on it until we have an > explicit notion of partitioned tables. I'm not clear that a generalized solution for partitioned tables would solve the production query from the OP. The OP was using table extension to model the properties of the data. In the actual production problem, the table structure involved, for example, materials -- some of which were containers (which had all the properties of other materials, plus some unique to containers); so the containers table extends the materials table to model that. In the problem query, he wanted to find all the materials related to some item. He was also joining to location, which might be (among other things) a waypoint or a container (both extending location). Note that a container is both a location and a material. > The approach contemplated in the comment, of assembling some stats > on-the-fly from the stats for individual child tables, doesn't > seem real practical from a planning-time standpoint. Can you give a thumbnail sketch of why that is? > The thing that you really want to know here is that there will be > only one matching id value in the whole partitioned table It would seem to matter nearly as much if statistics indicated you would get five rows out of twenty million. > it's difficult to extract from independent sets of stats. Since we make the attempt for most intermediate results, it's not immediately clear to me why it's so hard here. Not that I'm doubting your assertion that it *is* hard; I'm just trying to see *why* it is. Perhaps the code which generates such estimates for everything else could be made available here? The usefulness of inheritance to model data would seem to be rather limited without better optimizer support. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance