On Tue, Jun 13, 2023 at 3:28 PM Patrick O'Toole <patrick.otoole@xxxxxxxxx> wrote:
> run the query twice first, then...
Is that a realistic way to run the test? Often forcing all the data needed for the query into memory is going to make things less realistic, not more realistic. Assuming the system has more to do than just perform this one query, it might be unusual for the query to find everything it needs in memory. Also, if you really do want to do it this way, then you should do it for every plan. Different plans need to access different collections of buffers, so prewarming just one plan will privilege that one over the others.
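If you do want warm-cache numbers, one way to prewarm all candidate plans evenly is the pg_prewarm contrib extension, loading every relation any plan touches before timing. A sketch, using the table and index names from the quoted plans:

```sql
-- pg_prewarm ships with Postgres as a contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load everything any candidate plan reads, not just one plan's working set.
SELECT pg_prewarm('item_text');                          -- heap, used by all plans
SELECT pg_prewarm('conversation_item');                  -- heap, seq-scanned in B/C
SELECT pg_prewarm('conversation_item_item_hash_index');  -- probed only by Plan A
```

This way a timing comparison at least starts each plan from the same cache state, though as noted above that state may still be unrealistically warm.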
> PLAN A (default config, effective cache size just shy of 15GB): 3.829 seconds. A nested loop is used to probe the hash index `conversation_item_item_hash_index` for each row of item_text. Although the cost of probing once is low, a fair amount of time passes because the operation is repeated ~1.3 million times.
>
> PLAN B (enable_indexscan off, effective cache same as before): 3.254 seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially and conversation_item is hashed before results are combined with a hash join.
>
> PLAN C (random_page_cost = 8.0 instead of the default 4, effective cache same as before): 2.959 seconds (~23% speedup, sometimes 38%). Same overall plan as PLAN B, with some differences in buffers and I/O. I'll note we had to get to 8.0 before we saw a change in planner behavior; 5.0, 6.0, and 7.0 were too low to make a difference.
The difference between B and C looks like it is entirely noise. It comes down to how many buffers the query found already in the cache, how many of those needed cleaning (cleaning dirties the buffer, since the cleaned version then has to be written back to disk), and how many dirty buffers it encountered that had to be written out to make way for the buffers it needed to read. (That last number mostly reflects dirty buffers left around by other activity that this query happened to run into, not buffers the query itself dirtied.) None of this is likely to be reproducible, and so it is not worth investigating.
And the difference between A and B/C is small enough that it is unlikely to be worth pursuing either, even if it is reproducible. If your app runs this one exact query often enough that a 30% difference is worth worrying about, you would probably be better served by questioning the business case. What are you doing with 1.4 million rows once you do fetch them, that the fetch needs to be repeated so often?
If you think that taking a deep dive into this one query is going to deliver knowledge which will pay off for other (so far unexamined) queries, I suspect you are wrong. Look for queries where the misestimation is more stark than 30% to serve as your case studies.
> Environment:
> Postgres 15.2
> Amazon RDS — db.m6g.2xlarge
>
> Questions:
> In Plan A, what factors are causing the planner to select a substantially slower plan despite having recent stats about the number of rows?
Even if it were worth trying to answer this (which I think it is not), there isn't much we can do with dummy tables containing no data. You would need to include a script to generate data of a size and distribution which reproduces the given behavior.
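Such a script usually boils down to generate_series plus whatever skew matches production. A sketch only; the column names, row counts, and distributions below are invented, since the real schema and statistics weren't posted:

```sql
-- Hypothetical data generator: the point is to produce data whose size and
-- distribution reproduce the planner behavior, so others can run the plans.
INSERT INTO item_text (id, body)
SELECT g, md5(g::text)
FROM generate_series(1, 1300000) AS g;          -- ~1.3M rows, as in Plan A

INSERT INTO conversation_item (item_id, conversation_id)
SELECT (random() * 1299999)::int + 1,            -- uniform; adjust to match real skew
       g % 50000
FROM generate_series(1, 1400000) AS g;           -- ~1.4M rows, as in the result set

-- Fresh stats, so the planner sees the same numbers you do.
ANALYZE item_text;
ANALYZE conversation_item;
```

Without something like this, nobody else can check what the planner estimates versus what it actually gets.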
> Is there a substantial difference between the on-the-fly hash done in Plan B and Plan C compared to the hash-index used in Plan A? Can I assume they are essentially the same? Perhaps there are there differences in how they're applied?
They are pretty much entirely different. One jumps all over the index on disk; the other reads the table sequentially and (due to work_mem) parcels it out into chunks, each of which it expects to be able to read back in sequentially as well. About the only thing they have in common is that both involve computing a hash function.
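You can see the hash join's chunking directly in EXPLAIN output. A sketch; the join key (item_id) is a guess, since the real column list wasn't shown:

```sql
-- Deliberately small work_mem, to force the hash to spill into batches.
SET work_mem = '1MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM item_text t
JOIN conversation_item c ON c.item_id = t.id;   -- hypothetical join condition

-- In the Hash node of the output, look for "Batches: N". N > 1 means the
-- build side was parceled into work_mem-sized chunks written to temp files
-- and read back sequentially, which is the behavior described above.
```

The hash index, by contrast, has its hash function baked into the on-disk index structure; nothing about it adapts to work_mem at query time.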
> Is it common to see values for random_page_cost set as high as 8.0? We would of course need to investigate whether we see a net positive or net negative impact on other queries, to adopt this as a general setting, but is it a proposal we should actually consider?
I've never needed to set it that high, but there is no a priori reason it wouldn't make sense to do. A setting that high would probably only be suitable for HDD (rather than SSD) storage with caching that is not very effective, which seems to be the opposite of your situation. So I certainly wouldn't adopt it just based on the evidence at hand.
Cheers,
Jeff