Re: Backing up postgresql database

On Tue, Mar 3, 2009 at 5:51 PM, Jakov Sosic <jakov.sosic@xxxxxxx> wrote:
> Scott Marlowe wrote:
>
>> That might not be much smaller if you're doing a lot of updates /
>> inserts / deletes.  Also, SQL ain't the same as what was committed. If
>> you insert now() it won't be the same when you run it in two days.
>
> Well, I think I miscalculated a little bit... It's more like 120 GB
> per day, not 5 TB... I'll post the correct results tomorrow.

Oh, that'll make a big, big difference.  That's about 1/40th the
amount of data you thought you had.  At that rate you can keep a
week's data on a 1TB drive.

>> The key here is it seems like you have a very high churn rate, and if
>> your backup solution involves saving all those changes and applying
>> them later, it's gonna incur rather large storage use.
>>
>> I.e. PITR, as you have it implemented, might not be your best solution.
>
>> You might be better off with something like Slony, Bucardo, or one
>> of the other live replication setups that don't have the same
>> issues you're seeing here.
>
> Replication involves another machine, and more load on current one, and
> that's not what I want.

What you're doing right now is replication, so I'd expect it to
involve another machine; if this one catches fire, or the power supply
burns out all the hard drives, you'll need the WAL logs on another
machine to ensure you can bring your db back up.  As long as the
replication is asynchronous and recovers well from falling behind, the
secondary machine need not be a monster.  I would assume your db usage
has some kind of pattern that is lower at some points than at others,
but if it's always inserting 24/7 at high rates, then yeah, you'd need
a secondary machine as fast as your primary to keep up.
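For reference, the shipping side of that is only a couple of lines in
postgresql.conf.  Untested sketch; "standby" and the archive path are
made-up names, adjust for your setup:

    # postgresql.conf on the primary (8.3-style)
    archive_mode = on
    archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

%p is the full path of the segment being archived and %f is just the
file name.  Postgres retries the command until it exits zero, so a
down standby doesn't lose WAL, it just piles up on the primary until
the archiver catches up.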

>> It's probably better to keep pg_dump type backups for those types of
>> things.  PITR is a poor challenger to the easily compressed output of
>> pg_dump for offsite backups, especially over a wire.
>
> But how long would pg_dump run on a DB with this volume of traffic?
> And would it kill the machine in that period? I can try that as well
> tomorrow...

We run pg_dump during lower volume hours, but it generally doesn't
kill the machine.  OTOH, there's only so much IO in a server, and if
you're using nearly all of yours then pg_dump can easily make the
machine bog down.  Always dump to a separate array (and machine) from
the one holding the DB if you can.
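To sketch what I mean, run the dump from the backup box rather than on
the db server itself (host and database names here are placeholders):

    # run on the backup machine, not the db server
    pg_dump -h dbserver -Fc mydb -f /backups/mydb-$(date +%F).dump

The custom format (-Fc) is compressed as it's written, so it travels
well offsite too, and you can cron it for your off hours (just
remember to escape % as \% if you put that date command in a crontab).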

Note you can also use a recovered PITR machine to make a backup.  If
you had two of them alternating you could take nightly backups without
affecting your primary db.  But that depends on how long your rsyncs
take.  Given a fast RAID array on the db server, and gigE, you should
be able to run an off-hours backup in the several-hundred-megs-a-second
range.
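The base backup behind that is the usual start/stop backup dance
around an rsync.  Rough sketch, with made-up paths and host name:

    psql -c "SELECT pg_start_backup('nightly');"
    rsync -a --exclude=pg_xlog /var/lib/pgsql/data/ backuphost:/backups/base/
    psql -c "SELECT pg_stop_backup();"

The cluster stays live the whole time; pg_stop_backup just marks where
in the WAL stream the copy ends, so recovery on the other box knows
where to start replaying.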

>> If I update a single row in a database 1,000,000,000 times, how many
>> WAL files will you need?
>> If you update every row in a 1,000,000,000 row database, how many WAL
>> files will you need?
>
> Obviously the application is not just appending new data, as I thought
> before... Because appends (aka INSERTs) wouldn't cause this volume of
> WALs, would they?

Sure they would, but they would also cause your main on-disk db
storage to grow as well.  In pgsql, an update = delete + insert.
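That's easy to see for yourself; hypothetical table t and column val
here, but any table you can afford to bloat will do:

    psql -d mydb -c "SELECT pg_relation_size('t');"
    psql -d mydb -c "UPDATE t SET val = val;"    # rewrites every row
    psql -d mydb -c "SELECT pg_relation_size('t');"

The table roughly doubles on disk, because every old row version sits
there dead until VACUUM reclaims it, and every new row version went
through the WAL on its way in.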

>> I'm wondering what your objectives are in your backup strategy; it
>> might be you're coming at it one way when we'd all approach it from
>> another way.  Or not.
>
> You are correct on this one. The DB I'm talking about is not
> important enough to need a rigorous backup schedule and recovery
> policies. Rather, I have hardware limitations, and I have to find the
> best solution that doesn't overutilize the hardware while at the same
> time giving backups that are as frequent as possible.

Well, one of the objectives of PITR is to get as close to
up-to-the-second replication as possible with minimal overhead.  After
all, the WAL files have to be created anyway, so we may as well use
them.  Understood on the hardware.  It's one thing to be asked to
stretch the hardware budget, but it can only stretch so far.  Make
sure you know what your objectives are in the project, and see if
they're realistic in light of your hardware restrictions.  You say the
DB isn't important, but how expensive would the data on it be to
replace if you had no backups, or week-old backups, or month-old
backups, etc.?  Data has a value, and if you save $2,000 on a server
and lose $25,000 worth of data, someone somewhere made a bad choice.

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

