Search Postgresql Archives

huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I know I'm comparing apples and orange but still the difference in
performance was quite astonishing.

I've 2 tables that look like:

create table products(
  id bigint 
  price double precision, /* legacy, don't ask */
  sometextfield1 varchar(128),
  sometextfield2 varchar(128),
  ...
);

one on a MS SQL 2005 and another one on pg 8.3.

MS SQL has full text search on the text fields (I don't know the
details).

pg product table has a tsvector field and a gin index defined on it +
trigger that update the tsvector field when the textfields change.
The trigger is made in a way that it actually update the tsvector
just if the text fields are changed.

The hardware on the 2 machines is a bit different.
MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on RAID
1 hw, 2 Xeon dual core (I can't check details right now)
PG runs on a box that has more than 5 years, 3 SCSI drives on RAID 5
hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, cpu fam
14, model 4)

Both have 4Gb of ram.
shared_buffers is 240Mb.
Both share a similar workload.
Both boxes were in the same "price class" when they were bought.

In both tables I've to update price.
VACUUM FULL was performed just before updating the prices.

MS SQL receives a large sql file that contain all the UPDATE
statements.
PG receive a csv file that is loaded into a table with COPY and then
does the update as
update products set price=p.price from temp_price where id=p.id and
price<>p.price;

MS SQL ingurgitate the whole sql file in around 10sec.
pg takes more than 5 min to just run the single update statement.

I'd like to know if such a large difference can be justified just by
HW difference or by a difference in the process on how data are
loaded [1] or by the difference in performance of the 2 servers on
this kind of workload or by some postgres config before I decide how
to manage my time to redesign the import procedure.

If HW can justify such huge difference I'll devote my time to other
problems.

I'd say that a potential culprit could be the gin index. No matter
if the tsvector is updated or not, if the row is changed I think the
index is going to be updated anyway.
Somehow MS SQL circumvent this problem, possibly by building the
equivalent of a tsvector column in a "hidden" table that
automatically join to the "text" table.
This add a join but reduce the cost of table modification since
simpler (btree) indexes are faster to update.

Still huge updates are rare and that table is mostly read and very
rarely written. During unusually huge updates I may consider to drop
the gin index.

[1] I'd expect that excluding the time it takes to load the csv a
single update should run faster than a huge list of single statement
update

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux