Re: Disk Performance Problem on Large DB

Kenneth Marshall <ktm@xxxxxxxx> · Fri, 5 Nov 2010 08:17:38 -0500

Correct, with a single drive and no write cache, once you get more
than 1 I/O running simultaneously, i.e. 1 writing the data and 1
writing each index = 5 I/Os at once, you effectively devolve to
your drives random I/O rate which can be an order of magnitude
slower than its sequential I/O rate. You can use bonnie or some
other disk speed test to get those numbers for your system. When
you do the indexes after the load, each step can use sequential
I/O much more of the time which is why it runs so much faster.

Cheers,
Ken

On Fri, Nov 05, 2010 at 02:19:41PM +1030, Samuel Stearns wrote:
> Your biggest benefit was dropping the indexes before the load, most likely.
> 
> -----Original Message-----
> From: Jonathan Hoover [mailto:jhoover@xxxxxxxxxxxxx] 
> Sent: Friday, 5 November 2010 2:16 PM
> To: Samuel Stearns; pgsql-admin@xxxxxxxxxxxxxx
> Cc: Kenneth Marshall
> Subject: RE:  Disk Performance Problem on Large DB
> 
> I am in the middle of re-creating the indexes now, and what is interesting is how atop is not reporting heavy use of the hard drive now. Instead, I see postgres using 80% of the proc (instead of 8% earlier) and drive usage is 20+ MBr/s and 16+ MBw/s now (instead of .1 and 3.0 respectively earlier). Could it really be the PK causing the delay, or is it really the maintenance_work_mem or simply the idea of creating the indexing after? Good info, hopefully I can do some testing over these ideas over the next few days. For now, I'm hoping I can just get things moving enough.
> 
> As I finished this up, I have noticed disk performance is down to 4+ MBw/s and MBr/s, but it is not "red" in atop any longer, and proc usage now seems to be the limiting factor.
> 
> Good stuff...
> Jon
> 
> -----Original Message-----
> From: pgsql-admin-owner@xxxxxxxxxxxxxx [mailto:pgsql-admin-owner@xxxxxxxxxxxxxx] On Behalf Of Jonathan Hoover
> Sent: Thursday, November 04, 2010 10:29 PM
> To: Samuel Stearns; pgsql-admin@xxxxxxxxxxxxxx
> Cc: Kenneth Marshall
> Subject: Re:  Disk Performance Problem on Large DB
> 
> How does TRUNCATE differ from DELETE FROM <table>? Sorry, probably an easy RTFM question, but I'll ask anyhow.
> 
> -----Original Message-----
> From: Samuel Stearns [mailto:SStearns@xxxxxxxxxxxxxxxx] 
> Sent: Thursday, November 04, 2010 10:27 PM
> To: Jonathan Hoover; pgsql-admin@xxxxxxxxxxxxxx
> Cc: Kenneth Marshall
> Subject: RE:  Disk Performance Problem on Large DB
> 
> TRUNCATE removes all data from the table leaving the schema structure in place.
> 
> What helped the most was probably the drop of the indexes.
> 
> -----Original Message-----
> From: pgsql-admin-owner@xxxxxxxxxxxxxx [mailto:pgsql-admin-owner@xxxxxxxxxxxxxx] On Behalf Of Jonathan Hoover
> Sent: Friday, 5 November 2010 1:53 PM
> To: pgsql-admin@xxxxxxxxxxxxxx
> Cc: Kenneth Marshall
> Subject: Re:  Disk Performance Problem on Large DB
> 
> Just FYI, I removed the PK and the indexes for now. Load times for 1M rows is now 7 SECONDS instead of 7 MINUTES (or just never happening). Granted, I made the changes in #1 below, but WOW! So, question: what helped the most: 1) no PK, 2) no indexes, 3) the maintenance_work_mem being uncommented? I will test myself when I have time, but I'd like to know everyone's thoughts.
> 
> Jon
> 
> -----Original Message-----
> From: pgsql-admin-owner@xxxxxxxxxxxxxx [mailto:pgsql-admin-owner@xxxxxxxxxxxxxx] On Behalf Of Jonathan Hoover
> Sent: Thursday, November 04, 2010 10:03 PM
> To: pgsql-admin@xxxxxxxxxxxxxx
> Cc: Kenneth Marshall
> Subject: Re:  Disk Performance Problem on Large DB
> 
> 1. I have now set maintenance_work_mem to 256 MB (which was previously commented by the default config)
> 2. The version is PostgreSQL 8.1.18 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
> 3. What would be the best and cheapest thing I could for IO performance?
> 4. I need to read up on TRUNCATE, which I have not used before. Care to give a quick overview before I RTFM?
> 
> Thanks,
> jon
> 
> -----Original Message-----
> From: Kenneth Marshall [mailto:ktm@xxxxxxxx] 
> Sent: Thursday, November 04, 2010 4:03 PM
> To: Jonathan Hoover
> Cc: pgsql-admin@xxxxxxxxxxxxxx
> Subject: Re:  Disk Performance Problem on Large DB
> 
> On Thu, Nov 04, 2010 at 01:42:49PM -0700, Jonathan  Hoover wrote:
> > Hello,
> > 
> > I have a RHEL 5 box, 4 GB RAM, single hard drive SATA, Intel Core 2 Duo 2.4. A basic workstation.
> > 
> > I have a simple database, with one table for now. It has 4 columns:
> > 
> > anid serial primary key unique,
> > time timestamp,
> > source varchar(5),
> > unitid varchar(15),
> > guid varchar(32)
> > 
> > There is a btree index on each.
> > 
> > I am loading data 1,000,000 (1M) rows at a time using psql and a COPY command. Once I hit 2M rows, my performance just drops out, and the next 1M never finishes. It takes 7 minutes for 1M rows to load. Once 2M are in there, I've waited an hour, and nothing. It doesn't seem to matter which 1M rows I try to load next, none ever finish. Each 1M rows is about 70MB on disk in the raw input file.
> > 
> > I have "atop" installed, and it reports the drives at 100%, which it reports for the first 1M rows too. The MBw/s goes from 20+ on the first 2M rows, down to about 4 MBw/s or less now. The processor usage is at about 2 to 8% at this time (used by postgres).
> > 
> > I have even waited for 1M rows to load, then done a vacuum for no good reason, then even restarted postgresql. I've made sure no disk or proc activity is happening before I start the next 1M rows. None of that seems to matter.
> > 
> > I have a total of about 70M rows to load, but am at a standstill. I've read up on whatever performance docs I can find online, but I am not getting anywhere.
> > 
> > I've increased shared_buffers to 256MB, and I've tried it with fsync commented out as per the default config. I've also tried it with fsync=off. No difference.
> > 
> > Ideas? Thanks in advance,
> > Jon
> 
> The initial 1M load if the table has just been truncated or created
> has no WAL logging. You can boost maintenance_work_mem to increase
> index creation/update performance. You are severely I/O limited and
> would be better off dropping your indexes during the load and re-
> creating them afterwards. If you are starting with an empty table,
> truncate it and then load all the data in a single transaction, all
> 7 COPY commands. Then COMMIT and build the indexes. Your question
> is also missing key information like config details, PostgreSQL version,
> ...
> 
> Cheers,
> Ken
> 
> -- 
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
> 
> -- 
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
> 
> -- 
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
> 

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin