Thanks Peter! It turns out the people I work for don't care about this data at all; they were only using it to push clients toward bigger packages and for bragging rights. Grand totals, totals per client, and totals per period are all they need, so the extra data I was logging *just in case* is useless to them, and they pushed for the quickest solution. I ended up implementing a scheme to "condense" the data by a common factor (the client's username) and record totals (see my post "Query Help" for more info). I'm using cron jobs to run SQL against the database periodically to "move" the data around, and I used table locking to try to keep the data sound. I'm still wary of the number of queries and loops (OK, it's only about 5, but still) I'm using to gather the data for display, but it seems to be working OK (any suggestions would be greatly appreciated). I'll definitely keep your suggestions in mind for more serious archiving of data in the future.

Thanks again!
-- Josh

-----Original Message-----
From: Peter Beckman [mailto:beckman@purplecow.com]
Sent: Thursday, November 07, 2002 12:05 AM
To: Josh Johnson
Cc: php-db@lists.php.net
Subject: Re: Archiving A Database

Do you need to archive it forever? Do you need live access to it?

If you just need to archive it, read the man page for mysqldump. You can dump specific rows to a file, rename that file dump.2002.05.24, compress it with gzip, and voila, you have all of your data readily available and easily reinserted into your existing live DB. After the dump is complete, just delete those rows out of the DB.

Now you have to be sure that the file contains 100% of the data and is accurate. You'll have to figure out how to do that yourself! One way I did it with WWW log files was to make a tmp copy, run a program on it, see if the output had the same number of lines, and if it did, delete the tmp copy and replace the original with the new copy (IP->DNS translation). In this case you could write a script that parses the newly created dump file in the standard format and, heck, use Perl or PHP to count how many times "),(" shows up. If the MySQL SELECT gets X rows, and "),(" or ");" occurs X-1 times, then you know you got all the rows. (OK, it will be more complex than this, but you get the idea -- there's a rough sketch at the end of this message.)

One record per file could get ugly real quick, both on the filesystem side and the maintenance side. Doing it by day will allow you to repopulate an entire day back into the DB as needed for billing questions or whatnot. MySQL will keep the unique ID (assuming it is an auto_incremented number) unique and only use ever-higher IDs, never reusing old ones that no longer exist in the DB at the moment. Once you are done with that day (or days) of data, you just delete it from the DB, knowing you have a copy of that day on file.

I would also advise you to keep 2-4 copies of all your backups, on physically different computers in physically different locations. I kept a copy of my live DB, backed up every 6 hours, at the data center on another computer, on my file server at home, and on another one 7 states over. Sure, it's a bit of a pain in the ass, but if the data center goes up in smoke and my file server is hacked on the same day, I still have a copy 7 states over. I just used scp (secure-shell copy) and/or rsync + ssh to keep the copies on the remote machines current.
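Something like this rough PHP sketch ties the dump, the sanity check, and the delete together. It's untested, and the database, table, and column names (stats, access_log, stamp) are only placeholders -- swap in whatever your schema actually uses:

<?php
// Rough sketch only -- untested. "stats", "access_log" and "stamp" are
// placeholder names; adjust them to the real schema.
$day   = '2002-05-24';
$dump  = "/backups/access_log.$day";
$where = "stamp >= '$day 00:00:00' AND stamp <= '$day 23:59:59'";

mysql_connect('localhost', 'dbuser', 'dbpass') or die(mysql_error());
mysql_select_db('stats') or die(mysql_error());

// How many rows should end up in the dump?
$res      = mysql_query("SELECT COUNT(*) FROM access_log WHERE $where");
$expected = (int) mysql_result($res, 0);

// Dump just that day's rows (mysqldump's --where takes a WHERE clause).
exec("mysqldump -u dbuser -pdbpass --where=\"$where\" stats access_log > $dump");

// Crude completeness check: count the record separators in the dump file.
$sql = implode('', file($dump));
$got = substr_count($sql, '),(') + substr_count($sql, ');');

if ($expected > 0 && $got >= $expected) {
    // Looks sane: compress the archive and drop the live rows.
    exec("gzip $dump");
    mysql_query("DELETE FROM access_log WHERE $where");
} else {
    // Something is off -- leave the live rows alone and complain.
    error_log("Archive of $day failed: expected $expected rows, dump shows $got");
}
?>

Run that nightly from cron with the previous day's date plugged in and you end up with one gzipped dump per day, a lean live table, and rows that only get deleted when the dump looks complete.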
Peter

On Tue, 29 Oct 2002, Josh Johnson wrote:

> Does anyone know of speedy methods of periodically archiving and
> retrieving a database table? I've got a table I'm using to log
> statistical data, and it's getting 10k+ entries per day. Entries are
> only logged in the database if the user is authenticated for access,
> and only once per access session (the user is authenticated, a cookie
> is set, and the entry is logged; from that point the user is
> authenticated by verifying the cookie, which expires when the user's
> browser closes).
>
> The data is critical (used for billing and such), but the log viewing
> script I've written runs slower and slower with each passing day, and
> at one point the entire table was corrupted due to neglect on our
> system admin's part (I warned them about the default setting of
> mysql_max_connections being too low for a site that relied on the DB
> so heavily and had so much traffic). The script is a simple front end
> to several MySQL queries that retrieve totals for the past hour, day,
> week, and month, plus grand totals, for all clients or a given client.
> The rest of the site has been unaffected by the log issues.
>
> My current approach is to log the data as XML, one record per file,
> and create a log file that is an index of references to each record. I
> then empty the previous data in the database. This works, but
> retrieving specific groups of information is very hard and slow, and
> dumping the database takes a long time as well, which adds the
> difficulty of ensuring no records are lost. I also haven't yet tackled
> the issue of combining data from the XML seamlessly with live data
> from the database.
>
> I'm opening the floor to other potential solutions, or war stories
> from anyone else who's been in this situation before. If there's
> another approach aside from writing individual flat files, please let
> me know. I haven't tested it yet, but I think maybe logging all
> entries/data for a given log period into one file would speed things
> up, though I worry about the 40 megabyte files that could potentially
> be created. I'm also open to complete alternatives, since the main use
> for this data is to generate total accesses per a given factor
> (client, IP, referrer, etc.).
>
> Any help is greatly appreciated. I can give any information needed,
> just ask!
>
> Thanks,
> -- Josh
>
> P.S. I did suggest that this issue be tackled when I first installed
> the system that creates the log, but management didn't think it was
> worth bothering with at the time.
>
> P.P.S. I did look through the archives before I posted, and didn't
> find anything. Please flame in private if I missed something!

---------------------------------------------------------------------------
Peter Beckman              Systems Engineer, Fairfax Cable Access Corporation
beckman@purplecow.com                             http://www.purplecow.com/
---------------------------------------------------------------------------

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php