On 07/12/2011 01:04 PM, lars wrote:
On 07/12/2011 12:08 PM, Kevin Grittner wrote:
lars<lhofhansl@xxxxxxxxx> wrote:
select count(*) from test where tenant = $1 and created_date = $2
Ah, that might be a major clue -- prepared statements.
What sort of a plan do you get for that as a prepared statement?
(Note, it is very likely *not* to be the same plan as you get if you
run with literal values!) It is not at all unlikely that it could
resort to a table scan if you have one tenant which is five or ten
percent of the table, which would likely trigger the pruning as it
passed over the modified pages.
-Kevin
So a read of a row *will* trigger dead tuple pruning, and that
requires WAL logging, and this is known/expected?
This is actually the only answer I am looking for. :) I have not seen
this documented anywhere.
I know that Postgres will generate general plans for prepared
statements (how could it do otherwise?),
I also know that it sometimes chooses a sequential scan.
This can always be tweaked to touch fewer rows and/or use a different
plan. That's not my objective, though!
The fact that a select (maybe a big analytical query we'll run)
touching many rows will update the WAL and wait
(apparently) for that IO to complete is making a fully cached database
far less useful.
I just artificially created this scenario.
... Just dropped the table to test something so I can't get the plan
right now. Will send an update as soon as I get
it setup again.
Thanks again.
-- Lars
Ok... Slightly changes the indexes:
\d test
Table "lars.test"
Column | Type | Modifiers
--------------+---------------+-----------
tenant | character(15) |
created_by | character(15) |
created_date | date |
Indexes:
"i1" btree (tenant)
So just just a simple index on tenant.
prepare x as select count(*) from test where tenant = $1 and
created_date = $2;
PREPARE
explain execute x('000000000000001','2011-6-30');
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=263301.40..263301.41 rows=1 width=0)
-> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847 width=0)
Recheck Cond: (tenant = $1)
Filter: (created_date = $2)
-> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372
width=0)
Index Cond: (tenant = $1)
(6 rows)
-- this is when the WAL rows are written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=263301.40..263301.41 rows=1 width=0) (actual
time=191.150..191.151 rows=1 loops=1)
Buffers: shared hit=3716
-> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847
width=0) (actual time=1.966..188.221 rows=3712 loops=1)
Recheck Cond: (tenant = $1)
Filter: (created_date = $2)
Buffers: shared hit=3716
-> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372
width=0) (actual time=1.265..1.265 rows=3712 loops=1)
Index Cond: (tenant = $1)
Buffers: shared hit=20
Total runtime: 191.243 ms
(10 rows)
-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('000000000000001','2011-6-30');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=263301.40..263301.41 rows=1 width=0) (actual
time=11.529..11.530 rows=1 loops=1)
Buffers: shared hit=3715
-> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847
width=0) (actual time=1.341..9.187 rows=3712 loops=1)
Recheck Cond: (tenant = $1)
Filter: (created_date = $2)
Buffers: shared hit=3715
-> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372
width=0) (actual time=0.756..0.756 rows=3712 loops=1)
Index Cond: (tenant = $1)
Buffers: shared hit=19
Total runtime: 11.580 ms
(10 rows)
If you wanted to recreate this scenario I created a simple script to
create the table:
create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y','2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 256k rows
update test set tenant = lpad((random()*10000)::int::text,15,'0'),
created_by = lpad((random()*10000)::int::text,15,'0');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test; -- 32m rows
create index i1 on test(tenant);
vacuum analyze;
I use JDBC to perform the updates and selects, but this will do too:
prepare x as select count(*) from test where tenant = $1 and
created_date = $2;
prepare y as update test set created_by = $1 where tenant = $2 and
created_date = $3;
execute y('000000000000001', '000000000000001','2011-6-30');
execute x('000000000000001','2011-6-30');
I'll probably compile Postgres with WAL debug on next, and try the page
inspect module.
On the face of it, though, this looks like Postgres would not be that
useful as database that resides (mostly) in the cache.
Here what the query mentioned here returns (note that shared_buffers is
very large, but I observed the same with smaller settings):
http://wiki.postgresql.org/wiki/Server_Configuration
name
| current_setting
------------------------------+------------------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.1beta2 on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat
4.4.4-13), 64-bit
autovacuum | off
bgwriter_delay | 10ms
bgwriter_lru_maxpages | 1000
checkpoint_completion_target | 0.9
checkpoint_segments | 128
client_encoding | UTF8
effective_cache_size | 64GB
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
log_checkpoints | on
log_line_prefix | %m
maintenance_work_mem | 2GB
max_connections | 100
max_stack_depth | 2MB
server_encoding | UTF8
shared_buffers | 20GB
TimeZone | UTC
wal_buffers | 16MB
work_mem | 1GB
(20 rows)
Thanks.
-- Lars
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance