Re: Archiving A Database

Do you need to archive it forever?  Do you need live access to it?

If you just need to archive it, read the man page for mysqldump.  You can
dump specific rows to a file, rename that file dump.2002.05.24, compress it
with gzip, and voila, you have all of your data readily available, easily
reinserted into your existing live DB.  After the dump is complete, just
delete those rows out of the DB.
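
To make that concrete, here's a rough sketch in PHP of the dump/compress/
delete cycle.  The table name (access_log), the date column (log_date), the
paths and the credentials are all made up -- adjust to whatever your schema
actually looks like:

    <?php
    // Hypothetical names/paths -- adjust to your setup.
    $day   = '2002-05-24';           // the day being archived
    $table = 'access_log';           // assumed log table
    $dump  = "/backups/dump.$day";   // where the dump lands

    // 1. Dump only that day's rows.  --no-create-info keeps the dump
    //    data-only so it can be reloaded into the live table later, and
    //    --extended-insert gives the (...),(...) format discussed below.
    $cmd = "mysqldump -u dbuser -pdbpass --no-create-info --extended-insert "
         . "--where=" . escapeshellarg("log_date = '$day'")
         . " mydb $table > $dump";
    exec($cmd, $out, $rc);
    if ($rc != 0) die("mysqldump failed\n");

    // 2. Compress the dump (produces dump.2002-05-24.gz).
    exec("gzip " . escapeshellarg($dump), $out, $rc);
    if ($rc != 0) die("gzip failed\n");

    // 3. Only once you've verified the dump (see below), delete the rows.
    $link = mysql_connect('localhost', 'dbuser', 'dbpass');
    mysql_select_db('mydb', $link);
    mysql_query("DELETE FROM $table WHERE log_date = '$day'", $link)
        or die(mysql_error());
    ?>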

Now you have to be sure that the file contains 100% of the data and is
accurate.  You'll have to figure out how to do that yourself!  One way I
did it with WWW log files (for IP->DNS translation) was to make a tmp copy,
run the program on it, check that the output had the same number of lines,
and if it did, delete the tmp copy and replace the original with the new
one.

In this case you could write a script that parses the newly created dump
file in its standard format -- heck, use perl or PHP to count the "),("
separators and see how many you get.  If the MySQL SELECT returns X rows
and "),(" shows up X-1 times (followed by the closing ");"), then you know
you got all the rows.  (OK, it will be more complex than this, but you get
the idea.)
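
For what it's worth, a quick-and-dirty PHP version of that sanity check
might look like the following.  It assumes an --extended-insert style dump
(one big INSERT ... VALUES (...),(...),...;), that none of your column data
itself contains the string "),(", and that PHP has zlib support for reading
the gzipped file -- names and paths are again made up:

    <?php
    // Compare the row count in the DB with the rows in the dump file.
    $day  = '2002-05-24';
    $file = "/backups/dump.$day.gz";

    $link = mysql_connect('localhost', 'dbuser', 'dbpass');
    mysql_select_db('mydb', $link);
    $res = mysql_query("SELECT COUNT(*) FROM access_log "
                     . "WHERE log_date = '$day'", $link);
    $row   = mysql_fetch_row($res);
    $in_db = (int) $row[0];

    // gzfile() reads the gzipped dump as an array of lines.
    // X rows in one extended INSERT means X-1 occurrences of "),(".
    $contents = implode('', gzfile($file));
    $in_dump  = substr_count($contents, '),(') + 1;

    if ($in_db == $in_dump) {
        echo "Dump looks complete: $in_db rows.\n";
    } else {
        echo "Mismatch: $in_db in DB, $in_dump in dump -- don't delete yet!\n";
    }
    ?>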

One record per file could get ugly real quick, both on the filesystem side
and the maintenance side.  Doing it by day will allow you to repopulate an
entire day back into the DB as needed for billing questions or what not.
MySQL will keep the unique ID (assuming it's an auto_incremented number)
unique and only hand out ever-higher IDs; it won't reuse old ones that no
longer exist in the DB at the moment.  Once you are done with that day (or
days) of data, just delete it from the DB, knowing you have a copy of that
day on file.
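
Reloading an archived day later is just a matter of feeding the gzipped
dump back into mysql.  Something along these lines (paths and credentials
are assumptions again):

    <?php
    // Repopulate one archived day for a billing question.
    $day  = '2002-05-24';
    $file = "/backups/dump.$day.gz";

    // gunzip -c leaves the archived copy in place; pipe it into mysql.
    $cmd = "gunzip -c " . escapeshellarg($file)
         . " | mysql -u dbuser -pdbpass mydb";
    system($cmd, $rc);
    echo ($rc == 0) ? "Reloaded $day\n" : "Reload failed\n";
    ?>

Because the archived rows were dumped with their original IDs, and the
auto_increment counter never hands those IDs out again, they drop straight
back into the live table without colliding with anything newer.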

I would also advise you to have 2-4 copies of all your backups on both
physically different computers as well as physically different locations.
I kept a copy of my live DB, backed up every 6 hours, on another computer
at the data center, on my file server at home, and on another machine 7
states over.  Sure, it's a bit of a pain in the ass, but if the data center
goes up in smoke and my file server is hacked on the same day, I still have
a copy 7 states over.  I just used scp (secure-shell copy) and/or rsync +
ssh to keep copies current and up to date on remote machines.
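
If you want to automate that part too, a little PHP cron job that pushes
the backup directory out with rsync over ssh will do the trick.  The
hostnames and paths here are invented, and you'll want ssh keys set up so
it can run unattended:

    <?php
    // Push /backups/ to a couple of remote machines after each dump.
    $backup_dir = '/backups/';
    $remotes = array(
        'backup@fileserver.example.com:/backups/',
        'backup@offsite.example.net:/backups/',
    );
    foreach ($remotes as $dest) {
        $cmd = "rsync -az -e ssh " . escapeshellarg($backup_dir) . " "
             . escapeshellarg($dest);
        system($cmd, $rc);
        if ($rc != 0) echo "rsync to $dest failed\n";
    }
    ?>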

Peter

On Tue, 29 Oct 2002, Josh Johnson wrote:

> Does anyone know of speedy methods of periodically archiving and
> retrieving a database table? I've got a table I'm using to log
> statistical data, and it's getting 10k+ entries per day. Entries are
> only logged in the database if the user is authenticated for access, and
> only once per access session (the user is authenticated, a cookie is
> set, and the entry is logged. From that point the user is authenticated
> by verifying the cookie, which expires when the user's browser closes).
>
> The data is critical (used for billing and such), but the log viewing
> script I've written runs slower and slower with each passing day, and at
> one point the entire table was corrupted due to neglect on our system
> admin's part (I warned them about the default setting of
> mysql_max_connections being too low for a site that relied on the DB so
> heavily and had so much traffic.). The script is a simple front end to
> several MySQL queries that retrieve totals for the past hour, day,
> week, month and grand totals for all clients or a given client. The rest
> of the site has been unaffected by the log issues.
>
> My current approach is to log the data as XML, one record per file, and
> create a log file that is an index of references to each record. I then
> empty previous data in the database. This works, but is very hard/slow to
> retrieve specific groups of information, and dumping the database takes
> a long time as well, which adds the difficulty of ensuring no records
> are lost. I also haven't yet tackled the issue of combining data from
> the XML seamlessly with live data from the database.
>
> I'm opening the floor to other potential solutions, or war stories from
> anyone else who's been in this situation before. If there's another
> approach aside from writing individual flat files, please let me know. I
> haven't tested it yet, but I think maybe logging all entries/data for a
> given log period into one file would speed things up, but I worry about
> the 40 megabyte files that could be potentially created. I'm also open
> to complete alternatives, since the main use for this data is to
> generate total accesses per a given factor (client, ip, referrer, etc).
>
> Any help is greatly appreciated. I can give any information needed, just
> ask!
>
> Thanks,
> -- Josh
>
> P.S. I did suggest that this issue be tackled when I first installed the
> system that creates the log, but management didn't think it was worth
> bothering with at the time.
>
> P.P.S. I did look through the archives before I posted, and didn't find
> anything. Please flame in private if I missed something!
>
>
>
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

---------------------------------------------------------------------------
Peter Beckman            Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com                             http://www.purplecow.com/
---------------------------------------------------------------------------


-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

