Okay guys, Thanks for all the great help and advice already! Let me just clear some things, to make my question a little easier to answer :-) Now my site is a search engine for used cars - not just a car shop with a few hundred cars. The update query you look at, is an update that is executed once a day in chunks for all active adverts, so we know they are still for sale (one car can be advertised at several places hence several "adverts"). So it's not a "constant stream" but it has a fairly high volume especially at night time though. A compressed version of my .conf looks like this (note: there is some tweaks at the end of the file) data_directory = '/var/lib/postgresql/9.2/main' hba_file = '/etc/postgresql/9.2/main/pg_hba.conf' ident_file = '/etc/postgresql/9.2/main/pg_ident.conf' external_pid_file = '/var/run/postgresql/9.2-main.pid' listen_addresses = '192.168.0.2, localhost' port = 5432 max_connections = 1000 unix_socket_directory = '/var/run/postgresql' wal_level = hot_standby synchronous_commit = off archive_mode = _onarchive_command_ = 'rsync -a %p postgres@192.168.0.4:/var/lib/postgresql/9.2/wals/%f </dev/null' max_wal_senders = 1 wal_keep_segments = 32 logging_collector = on log_min_messages = debug1 log_min_error_statement = debug1 log_min_duration_statement = 0 log_checkpoints = on log_connections = on log_disconnections = _onlog_line_prefix_ = '%t [%p]: [%l-1] user=%u,db=%d ' log_lock_waits = on log_temp_files = 0 datestyle = 'iso, mdy' lc_messages = 'C' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' default_text_search_config = 'pg_catalog.english' default_statistics_target = 100 maintenance_work_mem = 1GB checkpoint_completion_target = 0.7 effective_cache_size = 22GB work_mem = 160MB wal_buffers = 4MB checkpoint_segments = 16 shared_buffers = 7680MB # All the log stuff is mainly temporary requirement for pgBadger # The database has been tuned with pgtuner You might be familiar with new relic, and I use that for quite a lot of monitoring. So, this is what I see at night time (a lot of I/O). So I went to play around with pgBadger to get some insights at database level. <iframe src=""https://rpm.newrelic.com/public/charts/h2dtedghfsv">https://rpm.newrelic.com/public/charts/h2dtedghfsv" width="500" height="300" scrolling="no" frameborder="no"></iframe> This shows me, that the by far most time-consuming queries are updates (in general). On avg. a query like the one I showed you, take 1,3 sec (but often it takes several minutes - which makes me wonder). So correct me if I'm wrong here: my theory is, that I have too many too slow update queries, that then often end up in a situation, where they "wait" for each other to finish, hence the sometimes VERY long execution times. So my basic idea here is, that if I could reduce the cost of the updates, then I could get a hight throughput overall. Here is a sample of the pgBadger analysis: Queries that took up the most time (N) ^ Rank Total duration Times executed Av. duration (s) Query 1 1d15h28m38.71s 948,711 0.15s COMMIT; 2 1d2h17m55.43s 401,002 0.24s INSERT INTO "car_images" ( "car_id", "created_at", "image", "updated_at" ) VALUES ( '', '', '', '' ) returning "id"; 3 23h18m33.68s 195,093 0.43s SELECT DISTINCT "cars".id FROM "cars" LEFT OUTER JOIN "adverts" ON "adverts"."car_id" = "cars"."id" LEFT OUTERJOIN "sellers" ON "sellers"."id" = "adverts"."seller_id" WHERE "cars"."sales_state" = '' AND "cars"."year" = 0 AND"cars"."engine_size" = 0.0 AND ( ( "cars"."id" IS NOT NULL AND "cars"."brand" = '' AND "cars"."model_name" = ''AND "cars"."fuel" = '' AND "cars"."km" = 0 AND "cars"."price" = 0 AND "sellers"."kind" = '' ) ) LIMIT 0; 4 22h45m26.52s 3,374,133 0.02s SELECT "adverts".* FROM "adverts" WHERE ( source_name = '' AND md5 ( url ) = md5 ( '' ) ) LIMIT 0; 5 10h31m37.18s 29,671 1.28s UPDATE "adverts" SET "last_observed_at" = '', "data_source_id" = '' WHERE "adverts"."id" IN ( ... ) ; 6 7h18m40.65s 396,393 0.07s UPDATE "cars" SET "updated_at" = '' WHERE "cars"."id" = 0; 7 7h6m7.87s 241,294 0.11s UPDATE "cars" SET "images_count" = COALESCE ( "images_count", 0 ) + 0 WHERE "cars"."id" = 0; 8 6h56m11.78s 84,571 0.30s INSERT INTO "failed_adverts" ( "active_record_object_class", "advert_candidate", "created_at", "exception_class","exception_message", "from_rescraper", "last_retried_at", "retry_count", "source_name", "stack_trace","updated_at", "url" ) VALUES ( NULL, '', '', '', '', NULL, NULL, '', '', '', '', '' ) returning "id"; 9 5h47m25.45s 188,402 0.11s INSERT INTO "adverts" ( "availability_state", "car_id", "created_at", "data_source_id", "deactivated_at","first_extraction", "last_observed_at", "price", "seller_id", "source_id", "source_name", "updated_at", "url" )VALUES ( '', '', '', '', NULL, '', '', '', '', '', '', '', '' ) returning "id"; 10 3h4m26.86s 166,235 0.07s UPDATE "adverts" SET "deactivated_at" = '', "availability_state" = '', "updated_at" = '' WHERE "adverts"."id" = 0; (Yes I'm already on the task of improving the selects) |