> is the length of the text equally distributed over the 2 partitions?
Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently:
Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently:
First range: ~70% < 10 KB, ~25% for 10-20 KB, ~3% for 20-30 KB, everything else is less than 1% (with 10 KB steps).
Second range: ~80% < 10 KB, ~18% for 10-20 KB, ~2% for 20-30 KB, everything else is less than 1% (with 10 KB steps).
>From what you posted, the first query retrieves 5005 rows, but the second 2416. It might be helpful if we are able to compare 5000 vs 5000
Yes it was just an example, here are the plans for approximately same number of rows:
Aggregate (cost=9210.12..9210.13 rows=1 width=16) (actual time=4265.478..4265.479 rows=1 loops=1)
Buffers: shared hit=27027 read=4311
I/O Timings: read=2738.728
-> Index Scan using articles_pkey on articles (cost=0.57..9143.40 rows=5338 width=107) (actual time=12.254..873.081 rows=5001 loops=1)
Index Cond: ((article_id >= 438030000) AND (article_id <= 438035000))
Buffers: shared hit=4282 read=710
I/O Timings: read=852.547
Planning time: 0.235 ms
Execution time: 4265.554 ms
Aggregate (cost=11794.59..11794.60 rows=1 width=16) (actual time=62298.559..62298.559 rows=1 loops=1)
Buffers: shared hit=15071 read=14847
I/O Timings: read=60703.859
-> Index Scan using articles_pkey on articles (cost=0.57..11709.13 rows=6837 width=107) (actual time=24.686..24582.221 rows=5417 loops=1)
Index Cond: ((article_id >= '100021040000'::bigint) AND (article_id <= '100021060000'::bigint))
Buffers: shared hit=195 read=5244
I/O Timings: read=24507.621
Planning time: 0.494 ms
Execution time: 62298.630 ms
If we subtract I/O from total time, we get 1527 ms vs 1596 ms — very close timings for other than I/O operations (considering slightly higher number of rows in second case). But I/O time differs dramatically.
> Also is worth noticing that the 'estimated' differs from 'actual' on the second query. I think that happens because data is differently distributed over the ranges. Probably the analyzer does not have enough samples to understand the real distribution.
I think we should not worry about it unless the planner chose poor plan, should we? Statistics affects on picking a proper plan, but not on execution of the plan, doesn't it?
> You might try to increase the number of samples (and run analyze)
To be honest, I don't understand it... As I know, in Postgres we have two options: set column target percentile and set n_distinct. We can't increase fraction of rows analyzed (like in other DBMSs we can set ANALYZE percentage explicitly). Moreover, in our case the problem column is PRIMARY KEY with all distinct values, Could you point me, what exactly should I do?
> or to create partial indexes on the 2 ranges.
Sure, will try it with partial indexes. Should I drop existing PK index, or ensuring that planner picks range index is enough?
> i would do a sync at the end, after dropping caches.
A bit off-topic, but why? Doing sync may put something to cache again.
> - does the raid controller have a cache?
> - how big is the cache? (when you measure disk speed, that will influence the result very much, if you do not run the test on big-enough data chunk) best if is disabled during your tests
> - how big is the cache? (when you measure disk speed, that will influence the result very much, if you do not run the test on big-enough data chunk) best if is disabled during your tests
I am pretty sure there is some, usually it's several tens of megabytes, but I ran disk read tests several times with chunks that could not be fit in the cache and with random offset, so I am pretty sure that something around 500 MB/s is enough reasonably accurate (but it is only for sequential read).
> - is the OS caching disk blocks too? maybe you want to drop everything from there too.
How can I find it out? And how to drop it? Or you mean hypervisor OS?
Anyway, don't you think that caching specifics could not really explain these issues?
> I think that you should be pragmatic and try to run the tests on a physical machine.
I wish I could do it, but hardly it is possible. In some future we may migrate the DB to physical hosts, but now we need to make it work in virtual.
> on the VM or on the physical host?
On the VM. The physical host is Windows (no iotop) and I have no access to it.
Vlad