On Fri, Nov 21, 2008 at 7:45 PM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote: > On Fri, 21 Nov 2008, Tom Lane wrote: > >> Not sure if it applies to your real use-case, but if you can try doing >> the COPY from a local file instead of across the network link, it >> might go faster. > > The fact that the inserts are reported as fast initially but slow as the > table and index size grow means it's probably a disk bottleneck rather than > anything related to the client itself. If the network overhead was the > issue, I wouldn't expect it to start fast like that. Ditto for concerns > about the random function being slow. Either of those might speed up the > initial, fast period a bit, but they're not going to impact the later > slowdown. > > Ciprian, the main interesting piece of data to collect here is a snapshot of > a few samples lines from the output from "vmstat 1" during the initial, fast > loading section versus the slower period. I think what's happening to you > is that maintaining the index blocks on the disk is turning into > increasingly random I/O as the size of the table grows, and your disks just > can't keep up with that. What I'd expect is that initially the waiting for > I/O "wa" figure will be low, but it will creep up constantly and at some > point spike up hard after the working set of data operated on exceeds > memory. Ok. Silly question: how do I "vmstat 1"??? The problem is indeed the indexes... So If we analyze the insert patterns: client id's randomly distributed and sensor id's the same, and the index is created ontop of these two, it means that (probabilistically) speaking after 100 thousand inserts (10 thousand clients and 10 sensors), all the index pages would be dirty... Indeed I could prolongue the flush by using bigger and bigger memory, but this doesn't help for 100 million records... > The fact that PostgreSQL performs badly here compared to the more > lightweight databases you've used isn't that much of a surprise. There's a > fair amount of overhead for the write-ahead log and the MVCC implementation > in the database, and your application is suffering from all that robustness > overhead but not really gaining much of a benefit from it. The main things > that help in this sort of situation are increases in shared_buffers and > checkpoint_segments, so that more database information is stored in RAM for > longer rather than being pushed to disk too quickly, but what Rafael > suggested already got you most of the possible improvement here. You might > get an extra bit of boost by adjusting the index FILLFACTOR upwards (default > is 90, if you're never updating you could try 100). I doubt that will be > anything but a minor incremental improvement though. About the fillfactor, on the contrary, I think I should set it lower (as the index fills very quickly)... I've set it to 10% and it behave a little better than with (10)... I'll run a full 100 million test to see where it breaks... > -- > * Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD Thanks, Ciprian Craciun. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general