Re: Linux vs FreeBSD

Vick Khera <vivek@xxxxxxxxx> · Tue, 8 Apr 2014 09:44:15 -0400

On Fri, Apr 4, 2014 at 12:03 AM, François Beausoleil
<francois@xxxxxxxxxxx> wrote:
> Our workload is lots of data import, followed by many queries to summarize (daily and weekly reports). Our main table is a wide table that represents Twitter and Facebook interactions. Most of our reports work on a week's worth of data (table is partitioned by week), and the tables are approximately 25 GB plus 5 GB of indices, per week. Of course, while reports are ongoing, we're also importing next week's data.
>
> The host is a dedicated hardware machine at online.fr
>
> : 128 GB RAM, 2 x 3TB disk in RAID 1 configuration.

I use FreeBSD pretty much exclusively. I would recommend using it to
anyone, but as others have said, there is a learning curve. If you are
comfortable on the command line, the curve is not that great. From a
postgres perspective, there is not that much difference once you dig
thru google to find a good set of OS settings (I'm happy to share
mine.)

That all said, I think you should do two things: First, improve your
disk system. Is your RAID soft or hardware? What file system do you
run on it? You want the file system to be the fastest you can have
that has the features you need. Pg is pretty good about crash
recovery, so having a log-based file system is not absolutely
necessary to save you there, but it depends on how much downtime you
can take for fsck to run. Since you're on spinning platters, you want
to peel off your pg_xlog directory to another set of drives in mirror
configuration. I personally like SSDs for this.  You probably also do
not need the lvm layer here either. You imply that you are analyzing
one week while loading the next week. Perhaps set up a second disk
mirror (instead of RAID10 all of the disks) and use a separate
postgres table space to load the data on the "other" mirror from the
one you are currently analyzing, like odd/even week numbers.

The second item is that you said you are using "wide tables". You will
be amazed at how much better you could do by properly normalizing your
data as you import it, rather than what seems like just storing log
events and picking out what you need for your summaries. Postgres is
exceptionally good at optimizing queries with joins and summaries. I
would put forth some effort simplifying how you store the data to
match more closely with the answers you want from it.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general