Added to TODO: * Consider sorting entries before inserting into btree index http://archives.postgresql.org/pgsql-general/2008-01/msg01010.php --------------------------------------------------------------------------- Tom Lane wrote: > Clodoaldo <clodoaldo.pinto.neto@xxxxxxxxx> writes: > > 2008/1/16, Tom Lane <tgl@xxxxxxxxxxxxx>: > >> I don't know of any reason to think that insertion is slower in 8.3 > >> than it was in 8.2, and no one else has reported anything of the sort. > > > The old server reproduces the behavior of the new one. > > Okay, Clodoaldo kindly gave me access to his old server, and after > nearly a full day of poking at it I think I've figured out what is going > on. Recall that the problem query is > > insert into usuarios ( > data, > usuario, > pontos, > wus > ) > select > (select data_serial from data_serial) as data, > ui.usuario_serial as usuario, > sum(pontos) as pontos, > sum(wus) as wus > from usuarios_temp as ut inner join usuarios_indice as ui > on ut.usuario = ui.usuario_nome and ut.n_time = ui.n_time > group by data, ui.usuario_serial > ; > > for which both 8.2 and 8.3 select a plan along the lines of > > Subquery Scan "*SELECT*" (cost=318139.26..342283.02 rows=877955 width=20) > -> HashAggregate (cost=318139.26..331308.58 rows=877955 width=12) > InitPlan > -> Seq Scan on data_serial (cost=0.00..1.01 rows=1 width=4) > -> Merge Join (cost=101944.33..261142.53 rows=5699572 width=12) > Merge Cond: ((ut.n_time = ui.n_time) AND ((ut.usuario)::text = "inner"."?column4?")) > -> Index Scan using usuarios_temp_ndx on usuarios_temp ut (cost=0.00..55038.92 rows=883729 width=26) > -> Sort (cost=101944.33..104139.22 rows=877955 width=22) > Sort Key: ui.n_time, (ui.usuario_nome)::text > -> Seq Scan on usuarios_indice ui (cost=0.00..15273.55 rows=877955 width=22) > > and the target table has non-unique indexes on "data" and "usuario" > (which are both simple integer columns, no surprises there). > > I first tried running this query with "usuarios" initially empty, and > both 8.2 and 8.3 did fine. However, in the real scenario that Clodoaldo > is worried about, there's somewhere north of 135 million entries in > "usuarios" to begin with, and in that case 8.3 falls off a cliff while > 8.2 doesn't --- the INSERT query takes about 400 seconds in 8.2 while > I gave up at 2h20m with 8.3. Why is that? Well, it turns out that 8.3 > is thrashing a lot harder than 8.2 is. Each index on "usuarios" is > about 2.3GB (the same in both versions) and the server has only 2GB RAM, > so it's not real surprising that the working set might be more than RAM; > but why is 8.3 worse than 8.2? > > You can see from the query that it's inserting the same constant "data" > value into every new row, and if I understand the context correctly this > value will be higher than all prior entries in the "usuarios" table. So > all the new entries in the "data" index are at the right-hand edge of > the index, and only a fairly small footprint is being touched at any > instant. strace'ing confirms that neither 8.2 nor 8.3 do much I/O at > all on that index. It's the index on the "usuario" column that is > thrashing. > > The difference comes from the fact that the HashAggregate step --- which > is being done on hashing columns (data, usuario) --- is effectively > reading out in hash-value order for usuario, meaning that that is the > order in which we make index insertions. 8.2 had an extremely chintzy > hash function for integers --- basically just return the negative of the > integer value --- while 8.3 takes it seriously and produces a nicely > randomized hash value. This means that the usuario values are returned > in a relatively well ordered fashion in 8.2 and a nearly totally random > one in 8.3. I captured the output of the SELECT in both 8.2 and 8.3; > attached are plots showing the usuario values against row number. From > this we can see that 8.2 has a working set that is a relatively small > part of the index at any instant, whereas 8.3 has the entire index as > working set ... and it doesn't fit into RAM. Ergo, lots of disk seek > delays. > > I don't think we want to back off the improved hashing functions in 8.3 > --- in most scenarios they should lead to significantly better > performance. But in this particular context they hurt. > > A possibly usable workaround for now is "set enable_hashagg = off" > to force a GroupAggregate plan, which will deliver the values sorted > by (data, usuario) rather than by their hash values. This helps both > versions, bringing the runtime down to something like 250 seconds, > because the index on usuario then has complete locality of access. > Alternatively, doubling the server's RAM would probably make the problem > go away (for awhile, until the index reaches 4GB). > > In the long run, for queries inserting many rows it might be interesting > to accumulate all the entries intended for a btree index and sort them > before inserting. Not sure about possible downsides of that. > > regards, tom lane > Content-Description: usuario82.png [ image/png is not supported, skipping... ] Content-Description: usuario83.png [ image/png is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org/ -- Bruce Momjian <bruce@xxxxxxxxxx> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general