On Tue, Sep 28, 2010 at 1:49 PM, Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> wrote:
> I know I'm comparing apples and oranges, but the difference in
> performance was still quite astonishing.
>
> I have 2 tables that look like:
>
> create table products(
>     id bigint,
>     price double precision, /* legacy, don't ask */
>     sometextfield1 varchar(128),
>     sometextfield2 varchar(128),
>     ...
> );
>
> one on MS SQL 2005 and the other on pg 8.3.
>
> MS SQL has full text search on the text fields (I don't know the
> details).
>
> The pg products table has a tsvector column and a GIN index on it,
> plus a trigger that updates the tsvector when the text fields change.
> The trigger is written so that it only recomputes the tsvector if the
> text fields have actually changed.
>
> The hardware on the 2 machines is a bit different.
> MS SQL 2005 runs on a box that is 2 years old: 2 SATA drives in HW
> RAID 1, 2 dual-core Xeons (I can't check the details right now).
> PG runs on a box that is more than 5 years old: 3 SCSI drives in HW
> RAID 5, 2 single-core Xeons with HT (Intel(R) Xeon(TM) CPU 3.20GHz,
> cpu fam 14, model 4).
>
> Both have 4 GB of RAM.
> shared_buffers is 240 MB.
> Both share a similar workload.
> Both boxes were in the same "price class" when they were bought.
>
> In both tables I have to update price.
> VACUUM FULL was performed just before updating the prices.
>
> MS SQL receives a large SQL file that contains all the UPDATE
> statements.
> PG receives a CSV file that is loaded into a table with COPY, and
> then the update is done as:
>
> update products set price = p.price
> from temp_price p
> where products.id = p.id and products.price <> p.price;
>
> MS SQL ingurgitates the whole SQL file in around 10 seconds.
> pg takes more than 5 minutes just to run the single UPDATE statement.
>
> Before I decide how to spend my time redesigning the import
> procedure, I'd like to know whether such a large difference can be
> explained just by the HW difference, by a difference in how the data
> are loaded [1], by how the 2 servers perform on this kind of
> workload, or by some postgres configuration.
>
> If the HW can justify such a huge difference, I'll devote my time to
> other problems.
>
> I'd say a potential culprit could be the GIN index. Whether or not
> the tsvector is updated, if the row is changed I think the index is
> going to be updated anyway.
> Somehow MS SQL circumvents this problem, possibly by building the
> equivalent of a tsvector column in a "hidden" table that is
> automatically joined to the "text" table.
> That adds a join but reduces the cost of table modification, since
> simpler (btree) indexes are faster to update.
>
> Still, huge updates are rare, and that table is mostly read and very
> rarely written. During unusually huge updates I may consider dropping
> the GIN index.
>
> [1] I'd expect that, excluding the time it takes to load the CSV, a
> single UPDATE should run faster than a huge list of individual UPDATE
> statements.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it

Restarting the computer on which PG is running may help. I have access to a server running PG 8.4 on Ubuntu, and I have noticed that after a day of intense use PG slows down significantly; "free -g" reports almost no free memory available (something seems to leak memory on this Ubuntu box). But when I restart the OS (Ubuntu), PG executes my queries in good time again.
I do not seem to have similar problems on the other servers running Fedora 12 and 13, but it could be that my configuration on the Ubuntu box is at fault; I am still investigating.

Allan.
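
For reference, a minimal sketch of the kind of setup Ivan describes. The column, function, and trigger names below are illustrative assumptions (his actual schema is not shown in the thread): a tsvector column with a GIN index, maintained by a trigger that recomputes the tsvector only when the text fields actually change.

-- Illustrative names; adapt to the real schema.
ALTER TABLE products ADD COLUMN fts tsvector;

CREATE INDEX products_fts_idx ON products USING gin (fts);

CREATE OR REPLACE FUNCTION products_fts_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        IF NEW.sometextfield1 IS NOT DISTINCT FROM OLD.sometextfield1
           AND NEW.sometextfield2 IS NOT DISTINCT FROM OLD.sometextfield2 THEN
            -- Text fields unchanged: keep the existing tsvector, skip the rebuild.
            RETURN NEW;
        END IF;
    END IF;
    NEW.fts := to_tsvector('english',
                   coalesce(NEW.sometextfield1, '') || ' ' ||
                   coalesce(NEW.sometextfield2, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_fts_trg
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE PROCEDURE products_fts_update();

Note that even when the trigger skips the tsvector rebuild, a price-only UPDATE still writes a new row version, so the GIN index may still need maintenance unless the update can be done as a HOT update; that matches Ivan's suspicion about the index being the culprit.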
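
And a sketch of the bulk price update he describes, including the option of dropping the GIN index around an unusually large run, which he considers at the end. The staging table definition and the file path are assumptions for illustration:

-- Stage the new prices (a client-side \copy in psql avoids needing a
-- server-readable path; the path below is just a placeholder).
CREATE TEMP TABLE temp_price (id bigint PRIMARY KEY, price double precision);
COPY temp_price FROM '/path/to/prices.csv' WITH CSV;

-- For an unusually large update it may be cheaper to drop the GIN index
-- and rebuild it once afterwards than to maintain it row by row.
DROP INDEX IF EXISTS products_fts_idx;

-- Touch only rows whose price actually changed.
UPDATE products
   SET price = p.price
  FROM temp_price p
 WHERE products.id = p.id
   AND products.price <> p.price;

CREATE INDEX products_fts_idx ON products USING gin (fts);
ANALYZE products;

Whether dropping the index pays off depends on how many rows the update touches relative to the table size; if only a small fraction of rows change, leaving the index in place is usually fine.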