> Would it be possible to change the primary key to > (logtime,key)? This could help keeping the "working window" small. No, the application accessing the data wants all the rows between start and end time for a particular key value. > > Secondly, the real working set is smaller, as the rows are > all inserted at the end of the table, filling each page until > it's full, so only the last pages are accessed. There's no > relation between the index order, and the order of data on > disk, unless you CLUSTER. I'd theorizing that my problem is in updating the index itself and not the heap. Insert order Refers to the order by which the applications are inserting the rows and as such, the order by Which the index is being updated. This in turn, is causing the b-tree to be traverse. Problem Is the working set of blocks at the bottom of the btree is too big for my cache. > > > Any comment on other affects or gotchas with lowering the size of > > BLKSZ? Currently, our database is thrashing its cache of blocks we > > we're getting only ~100 inserts/second, every insert results in a > > evict-read-modify operation. > > I'm not shure that's the correct diagnosis. > > Do you have one transaction per insert? Every transaction > means a forced sync to the disk, so you won't get more than > about 100-200 commits per second, depending on your actual > disk rotation speed. No, an insert consists of roughly 10,000+ rows per transaction block. > > To improve concurrency of the "numer of inserters" running in > parallel, try to tweak the config variables commit_delay and > commit_sibling, so you get a higher overall throughput at > cost of an increased delay per connection, and increase the > number of inserters. Using sensible tweaking, the throughput > should scale nearly linear with the number of backens. :-) I don't think this will help us here due to large transactions already. > > If feasible for your application, you can also bundle several > log entries into a single transaction. If you're CPU bound, > you can use COPY instead of INSERT or (if you can wait for > 8.2) the new multi-row INSERT to further improve performance, > but I doubt that you're CPU bound. > > The only way to "really" get over the sync limit is to have > (at least) the WAL on a battery backed ram / SSD media that > has no "spinning disk" > physical limit, or abandon crash safety by turning fsync off. Again, problem is not with WAL writing, already on it's own raid1 disk pair. The I/O pattern we see is about 1-2% load on WAL and 100% load on the array holding the indexes and tables. Throughput is very low, something like 150k-200K bytes/second of real rows being deposited on the disk. The disks are busy seeking all over the disk platter to fetch a block, add a single row, then seek to another spot and write back a previously dirty buffer.... > > Thanks, > Markus. > -- > Markus Schaber | Logical Tracking&Tracing International AG > Dipl. Inf. | Software Development GIS > > Fight against software patents in Europe! www.ffii.org > www.nosoftwarepatents.org >