Hi didier, thank you for your time. I forgot to display before the output of free. I’ve looked into it before and I found difficult to fully understand if there was something wrong. Before starting Postgres: total used free shared buffers cached Mem: 125G 9G 115G 15M 362M 8.1G -/+ buffers/cache: 1.5G 124G Swap: 127G 0B 127G Here’s an example of free output when queries B_1 and B_2 are running (they’re part of the same transaction). Generally values remains the same. For what I can understand, RAM isn’t used at all (there’s a lot of unused RAM). total used free shared buffers cached Mem: 125G 13G 112G 3.1G 362M 11G -/+ buffers/cache: 1.9G 123G Swap: 127G 0B 127G With Postgres running after transaction has been executed: total used free shared buffers cached Mem: 125G 13G 112G 3.1G 362M 11G -/+ buffers/cache: 1.5G 124G Swap: 127G 0B 127G there's also huge page It was enabled and after disabling it nothing changed: time execution is practically the same (131s for the same transaction tested in previous emails, which is composed by queries B_1 and B_2). Also test on the dell: Here’s the output for the two queries: select tmp.cf, tmp.dt from grep_studi.tmp; "Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.082..618.709 rows=2951191 loops=1)" " Output: cf, dt" " Buffers: shared hit=512 read=7802 dirtied=8314" "Planning time: 0.087 ms" "Execution time: 745.505 ms" select tmp.cf, tmp.dt from grep_studi.tmp; "Sort (cost=38431.55..39104.99 rows=1346868 width=72) (actual time=3146.548..3306.179 rows=2951191 loops=1)" " Output: cf, dt" " Sort Key: tmp.cf" " Sort Method: quicksort Memory: 328866kB" " Buffers: shared hit=8317" " -> Seq Scan on grep_studi.tmp (cost=0.00..11007.74 rows=1346868 width=72) (actual time=0.012..373.346 rows=2951191 loops=1)" " Output: cf, dt" " Buffers: shared hit=8314" "Planning time: 0.034 ms" "Execution time: 3459.065 ms" 32 GB for buffers is too high for the queries in your test but it I’ve set shared_buffers to be 1/4 of the total RAM. I’ve changed kernel values to accomodate this value. Lowering to smaller values doesn’t improve the transaction results. Here’s the results with 1 run for each level of shared_buffers: 32GB: 131s 16GB: 132s 8GB: 133s 4GB: 132s 2GB: 143s 1GB: 148s 512MB: 183s 256MB: 192s Probably I can keep 4GB but I make use of several partitions with tens of millions of records each. This is why I keep shared_buffers high. My applications is also similar to a DWH solution with one user. Like you said, big values of shared_buffers shouldn’t be a issue. I’ve done some tests with sysbench on Dell T420 and MacMini. T420 - RAM READ - 16GB / 1MB sh-4.3# sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Doing memory operations speed test Memory block size: 1024K Memory transfer size: 16384M Memory operations type: read Memory scope type: global Threads started! Done. Operations performed: 16384 (3643025.32 ops/sec) 16384.00 MB transferred (3643025.32 MB/sec) Test execution summary: total time: 0.0045s total number of events: 16384 total time taken by event execution: 0.0031 per-request statistics: min: 0.00ms avg: 0.00ms max: 0.02ms approx. 95 percentile: 0.00ms Threads fairness: events (avg/stddev): 16384.0000/0.00 execution time (avg/stddev): 0.0031/0.00 MacMini - RAM READ - 16GB / 1MB server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Random number generator seed is 0 and will be ignored Threads started! Operations performed: 16384 ( 5484.50 ops/sec) 16384.00 MB transferred (5484.50 MB/sec) General statistics: total time: 2.9873s total number of events: 16384 total time taken by event execution: 2.9836s response time: min: 0.18ms avg: 0.18ms max: 0.24ms approx. 95 percentile: 0.19ms Threads fairness: events (avg/stddev): 16384.0000/0.00 execution time (avg/stddev): 2.9836/0.00 T420 - RAM WRITE - 16GB / 1MB sh-4.3# sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Doing memory operations speed test Memory block size: 1024K Memory transfer size: 16384M Memory operations type: write Memory scope type: global Threads started! Done. Operations performed: 16384 ( 8298.97 ops/sec) 16384.00 MB transferred (8298.97 MB/sec) Test execution summary: total time: 1.9742s total number of events: 16384 total time taken by event execution: 1.9723 per-request statistics: min: 0.12ms avg: 0.12ms max: 0.25ms approx. 95 percentile: 0.12ms Threads fairness: events (avg/stddev): 16384.0000/0.00 execution time (avg/stddev): 1.9723/0.00 MacMini - RAM WRITE - 16GB / 1MB server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Random number generator seed is 0 and will be ignored Threads started! Operations performed: 16384 ( 5472.90 ops/sec) 16384.00 MB transferred (5472.90 MB/sec) General statistics: total time: 2.9937s total number of events: 16384 total time taken by event execution: 2.9890s response time: min: 0.18ms avg: 0.18ms max: 0.32ms approx. 95 percentile: 0.19ms Threads fairness: events (avg/stddev): 16384.0000/0.00 execution time (avg/stddev): 2.9890/0.00 T420 - CPU sh-4.3# sysbench --test=cpu run sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Doing CPU performance benchmark Threads started! Done. Maximum prime number checked in CPU test: 10000 Test execution summary: total time: 13.0683s total number of events: 10000 total time taken by event execution: 13.0674 per-request statistics: min: 1.30ms avg: 1.31ms max: 1.44ms approx. 95 percentile: 1.35ms Threads fairness: events (avg/stddev): 10000.0000/0.00 execution time (avg/stddev): 13.0674/0.00 MacMini - CPU server:sysbench Pietro$ ./sysbench --test=cpu run sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Random number generator seed is 0 and will be ignored Primer numbers limit: 10000 Threads started! General statistics: total time: 11.5728s total number of events: 10000 total time taken by event execution: 11.5703s response time: min: 1.15ms avg: 1.16ms max: 2.17ms approx. 95 percentile: 1.17ms Threads fairness: events (avg/stddev): 10000.0000/0.00 execution time (avg/stddev): 11.5703/0.00 I’ve done these tests because someone else on this discussion asked me to investigate on memory bandwidth and because I found this interesting article about Intel Xeon vs Intel i5 with different Postgres versions: http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-pgbench Hope this helps to better understand the problem. Thank you very much. Best regards, Pietro |