Thanks Peter! It turns out the people I work for don't care about this data at all; they were only using it to push clients toward bigger packages and for bragging rights. Grand totals, totals per client, and totals per period are all they need, so the extra data I was logging *just in case* is useless to them, and they pushed for the quickest solution. I ended up implementing a scheme to "condense" the data by a common factor (the client's username) and record totals (see my post "Query Help" for more info). I'm using cron jobs to run SQL against the database periodically to "move" the data around, and I used table locking to try to keep the data sound. I'm still wary of the number of queries and loops (OK, it's only about 5, but still) I'm using to gather the data for display, but it seems to be working OK (any suggestions would be greatly appreciated). I'll definitely keep your suggestions in mind for more serious archiving of data in the future.

Thanks again!
-- Josh

-----Original Message-----
From: Peter Beckman [mailto:beckman@purplecow.com]
Sent: Thursday, November 07, 2002 12:05 AM
To: Josh Johnson
Cc: php-db@lists.php.net
Subject: Re: Archiving A Database

Do you need to archive it forever? Do you need live access to it?

If you just need to archive it, read the man page for mysqldump. You can dump specific rows to a file, rename that file dump.2002.05.24, compress it with gzip, and voila, you have all of your data readily available and easily reinserted into your existing live DB. After the dump is complete, just delete those rows out of the DB.

Now you have to be sure that the file contains 100% of the data and is accurate. You'll have to figure out how to do that yourself! One way I did it with WWW log files was to make a tmp copy, run a program on it, see if the output had the same number of lines, and if it did, delete the tmp copy and replace the original with the new copy (IP->DNS translation). In this case you could write a script that parses the newly created dump file in the standard format and, heck, use Perl or PHP to count how many times "),(" shows up. If the MySQL SELECT gets X rows, and "),(" or ");" occurs X-1 times, then you know you got all the rows. (OK, it will be more complex than this, but you get the idea -- there's a rough sketch at the end of this message.)

One record per file could get ugly real quick, both on the filesystem side and the maintenance side. Doing it by day will allow you to repopulate an entire day back into the DB as needed for billing questions or whatnot. MySQL will keep the unique ID (assuming it is an auto_incremented number) unique and only use ever-higher IDs, never reusing old ones that no longer exist in the DB at the moment. Once you are done with that day (or days) of data, you just delete it from the DB, knowing you have a copy of that day on file.

I would also advise you to keep 2-4 copies of all your backups, on physically different computers in physically different locations. I kept a copy of my live DB, backed up every 6 hours, at the data center on another computer, on my file server at home, and on another one 7 states over. Sure, it's a bit of a pain in the ass, but if the data center goes up in smoke and my file server is hacked on the same day, I still have a copy 7 states over. I just used scp (secure-shell copy) and/or rsync + ssh to keep the copies on the remote machines current.
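Something like this rough PHP sketch ties the dump, the sanity check, and the delete together. It's untested, and the database, table, and column names (stats, access_log, stamp) are only placeholders -- swap in whatever your schema actually uses:

<?php
// Rough sketch only -- untested. "stats", "access_log" and "stamp" are
// placeholder names; adjust them to the real schema.
$day   = '2002-05-24';
$dump  = "/backups/access_log.$day";
$where = "stamp >= '$day 00:00:00' AND stamp <= '$day 23:59:59'";

mysql_connect('localhost', 'dbuser', 'dbpass') or die(mysql_error());
mysql_select_db('stats') or die(mysql_error());

// How many rows should end up in the dump?
$res      = mysql_query("SELECT COUNT(*) FROM access_log WHERE $where");
$expected = (int) mysql_result($res, 0);

// Dump just that day's rows (mysqldump's --where takes a WHERE clause).
exec("mysqldump -u dbuser -pdbpass --where=\"$where\" stats access_log > $dump");

// Crude completeness check: count the record separators in the dump file.
$sql = implode('', file($dump));
$got = substr_count($sql, '),(') + substr_count($sql, ');');

if ($expected > 0 && $got >= $expected) {
    // Looks sane: compress the archive and drop the live rows.
    exec("gzip $dump");
    mysql_query("DELETE FROM access_log WHERE $where");
} else {
    // Something is off -- leave the live rows alone and complain.
    error_log("Archive of $day failed: expected $expected rows, dump shows $got");
}
?>

Run that nightly from cron with the previous day's date plugged in and you end up with one gzipped dump per day, a lean live table, and rows that only get deleted when the dump looks complete.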
Peter

On Tue, 29 Oct 2002, Josh Johnson wrote:

> Does anyone know of speedy methods of periodically archiving and
> retrieving a database table? I've got a table I'm using to log
> statistical data, and it's getting 10k+ entries per day. Entries are
> only logged in the database if the user is authenticated for access,
> and only once per access session (the user is authenticated, a cookie
> is set, and the entry is logged; from that point the user is
> authenticated by verifying the cookie, which expires when the user's
> browser closes).
>
> The data is critical (used for billing and such), but the log viewing
> script I've written runs slower and slower with each passing day, and
> at one point the entire table was corrupted due to neglect on our
> system admin's part (I warned them about the default setting of
> mysql_max_connections being too low for a site that relied on the DB
> so heavily and had so much traffic). The script is a simple front end
> to several MySQL queries that retrieve totals for the past hour, day,
> week, and month, plus grand totals, for all clients or a given client.
> The rest of the site has been unaffected by the log issues.
>
> My current approach is to log the data as XML, one record per file,
> and create a log file that is an index of references to each record. I
> then empty the previous data in the database. This works, but
> retrieving specific groups of information is very hard and slow, and
> dumping the database takes a long time as well, which adds the
> difficulty of ensuring no records are lost. I also haven't yet tackled
> the issue of combining data from the XML seamlessly with live data
> from the database.
>
> I'm opening the floor to other potential solutions, or war stories
> from anyone else who's been in this situation before. If there's
> another approach aside from writing individual flat files, please let
> me know. I haven't tested it yet, but I think maybe logging all
> entries/data for a given log period into one file would speed things
> up, though I worry about the 40 megabyte files that could potentially
> be created. I'm also open to complete alternatives, since the main use
> for this data is to generate total accesses per a given factor
> (client, IP, referrer, etc.).
>
> Any help is greatly appreciated. I can give any information needed,
> just ask!
>
> Thanks,
> -- Josh
>
> P.S. I did suggest that this issue be tackled when I first installed
> the system that creates the log, but management didn't think it was
> worth bothering with at the time.
>
> P.P.S. I did look through the archives before I posted, and didn't
> find anything. Please flame in private if I missed something!

---------------------------------------------------------------------------
Peter Beckman              Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com                             http://www.purplecow.com/
---------------------------------------------------------------------------

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php