On Wed, Jul 20, 2005 at 10:25:59AM +0300, Tsirkin Evgeny wrote:

> trying to reduce query time and since that is a web application i now
> using caching with perl MLDBM::Sync::SDBM_File on application level.
> (Basically that involves quering all the table and putting it into
> MLDBM::Sync::SDBM_File and then quering it back). What is interesting
> is that while time quering the tables from postgres is taken 0.3 sec.
> using sdbm it takes 0.1 sec.

PostgreSQL isn't likely to win a speed contest against a trivial
storage/retrieval library when the test involves simple queries and
little or no concurrency.  PostgreSQL is a full-featured database with
goals beyond just "make SELECT statements as fast as possible."

> Actually i am not anymore sure that postgres does NOT cache table in
> memory, maybe it does and i don't know about it?

As I mentioned in an earlier message, you could enable statistics
gathering and use the statistics views to see whether your queries are
being satisfied from the buffer cache or whether they require a call to
the operating system:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

The first time you query a table it will probably need to be fetched
from disk (or from the operating system's cache); subsequent queries
will likely be much faster because PostgreSQL will have stored the
fetched pages in its buffer cache.
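For reference, block-level statistics aren't gathered under the default
settings, so the pg_statio_* views may show all zeros until you enable
them.  A minimal postgresql.conf sketch (parameter names as documented
for 8.0) might look like this:

```
# postgresql.conf -- enable the statistics collector and block-level
# counters so the pg_statio_* views are populated (PostgreSQL 8.0)
stats_start_collector = true    # master switch for the stats collector
stats_block_level     = true    # per-block I/O counters (pg_statio_*)
stats_row_level       = true    # per-row counters (optional here)
```

A reload (pg_ctl reload) picks up the block- and row-level settings;
check your version's documentation for the exact parameter names.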
Example:

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              0 |             0 |             0 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;

                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=73.686..73.698 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 73.934 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              1 |             0 |             3 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=0.072..0.083 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 0.237 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';

 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              1 |             1 |             3 |            3
(1 row)

Notice how much faster the second query was, even though it was
identical to the first.  As you can see from the statistics view, the
second query's page fetches were all "hits," meaning they were
retrieved from the buffer cache.
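As an aside, the usual cache hit ratio derived from these counters is
hits / (hits + reads).  A quick sketch (the function name is just
illustrative), plugging in the final counter values from the session
above:

```python
def hit_ratio(blks_hit, blks_read):
    """Buffer-cache hit ratio: hits / (hits + reads); 0.0 if no activity."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

# Final counters above: heap 1 read / 1 hit, index 3 reads / 3 hits,
# so 4 of the 8 total page fetches were satisfied from the buffer cache.
print(hit_ratio(1 + 3, 1 + 3))  # 0.5
```

On a warmed-up production system you would expect this ratio to be much
closer to 1.0 for frequently accessed tables.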
If you want to analyze performance, then make sure your tests mimic the
activity you expect to see on the production system: number of
concurrent connections; amount of select, insert, update, and delete
activity; data sets of similar size and distribution; representative
queries; and so on.  Use PostgreSQL's and the operating system's
instrumentation to identify performance bottlenecks, then look for
remedies.

BTW, pgsql-performance might be a more appropriate list for this
thread.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/