Re: Waste of storage space?

Maciek Sokolewicz <tularis@xxxxxxxxx> · Wed, 29 Oct 2008 09:50:04 +0100

Frank Arensmeier wrote:
On 29 Oct 2008, at 00:00, Maciek Sokolewicz wrote:

Frank Arensmeier wrote:
Hi all.
In short, I am working on a system that allows me to keep track of 
changes to a large amount of short texts (a couple of thousand text 
snippets, two or three sentences per text). All text is stored in a 
database. As soon as a user changes some text (insert, delete, 
update), this action is recorded. Look at an article on e.g. 
Wikipedia and click "History". This is more or less what I am trying 
to accomplish.
Right now, my "history" class, which takes care of all changes, is 
working pretty much as I want. The thing is that both the original 
text and the altered text are stored in the database every time the 
text is changed. My concern is that this will eventually become a 
serious problem in terms of storage space and performance. So, I am 
looking for a more efficient way to store all changes.
Ideas I have come up with so far are:
1) Store the "delta" (= the actual change) of a text change. This 
could be done by utilizing the PEAR package TextDiff. My idea was to 
compare the old text with the new text with the help of the TextDiff 
class. I would then grab the array containing the changes from 
TextDiff, serialize it and store this data in the db. The problem is 
that this is anything but efficient when it comes to short texts 
(the serialized array holding the changes was actually larger than 
the two texts combined).
2) Do some kind of compression on the text to be stored. However, it 
seems that the built-in compression functions in PHP5 are more 
efficient when it comes to large texts.
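
A rough sketch of idea 1 with a smaller footprint than a serialized 
diff array: store only the character ranges that changed, plus the 
replacement text, as compact JSON. This is shown in Python with the 
standard difflib module rather than PHP and TextDiff, so treat it as 
an illustration of the approach, not as what the PEAR class does. The 
function names make_delta and apply_delta are made up for this 
example.

```python
import difflib
import json


def make_delta(old: str, new: str) -> str:
    """Describe how to turn `old` into `new` as a compact JSON string."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged runs are implied by the gaps between ops
        # "Replace old[i1:i2] with this text" (empty text means a delete,
        # i1 == i2 means a pure insert).
        ops.append([i1, i2, new[j1:j2]])
    return json.dumps(ops, ensure_ascii=False, separators=(",", ":"))


def apply_delta(old: str, delta: str) -> str:
    """Rebuild the new text from the old text and a stored delta."""
    parts = []
    pos = 0
    for i1, i2, text in json.loads(delta):
        parts.append(old[pos:i1])  # copy the unchanged run before the edit
        parts.append(text)         # then the replacement text
        pos = i2                   # and skip the replaced range
    parts.append(old[pos:])        # unchanged tail
    return "".join(parts)


if __name__ == "__main__":
    old = "The quick brown fox jumps over the lazy dog."
    new = "The quick red fox jumps over the lazy dog!"
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    print(len(old) + len(new), "characters for both texts,",
          len(delta), "for the delta")
```

Only the latest full text plus one delta per change would need to be 
stored. For very short snippets the saving is small, and a heavily 
rewritten text can still produce a delta close to the size of the new 
text, so it is worth measuring on real data first.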
Any other ideas?
thank you.
//frank
ps. I notice that Mediawiki also stores complete articles in the db 
(every time an article is updated, the whole article is stored in the 
database). ds.

Hi Frank,

Why don't you simply make use of a system specifically designed for 
such things, e.g. CVS or SVN (subversion.tigris.org)? You could pretty 
easily tie it in with your application. It's quite compact and 
pretty fast too.

- Tul

Hi Tul.

I think that would be an idea worth investigating a little bit more. 
But what about performance? I am really not that familiar with version 
control systems like CVS etc. Let's say there are 30 different text 
snippets with 10 recorded changes each. And I want to see what changes 
users have made to those snippets. That would be 300 calls to the 
(filesystem based) CVS system.
30 calls actually, not 300
Would that be too much overhead? Besides that, in the database I am 
able to store more information about those recorded changes. E.g. the 
user ID and the time are currently stored as well. Can this be done 
with CVS as well?
You can store a user ID with these pretty easily (easier with SVN than 
with CVS). And the exact date/time is stored automatically as well.
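
To illustrate both points (one call per snippet, with author and 
timestamp recorded per change): `svn log --xml <path>` returns the 
whole history of one file in a single call, one <logentry> per 
revision. Below is a minimal sketch in Python that parses such output. 
The sample data and the name parse_svn_log are invented for this 
example, and the author field is whatever account made the commit, so 
an application-level user ID would have to be mapped onto it or put in 
the log message.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_svn_log(xml_text: str) -> List[Dict[str, str]]:
    """Turn the output of `svn log --xml` into a list of plain dicts."""
    entries = []
    for entry in ET.fromstring(xml_text).iter("logentry"):
        entries.append({
            "revision": entry.get("revision", ""),
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            "message": entry.findtext("msg", default=""),
        })
    return entries


# Made-up sample in the shape `svn log --xml` produces for one file.
SAMPLE = """<log>
<logentry revision="12">
<author>frank</author>
<date>2008-10-29T08:50:04.000000Z</date>
<msg>Fixed typo in snippet 7</msg>
</logentry>
<logentry revision="9">
<author>tul</author>
<date>2008-10-28T23:00:00.000000Z</date>
<msg>Initial text for snippet 7</msg>
</logentry>
</log>"""

if __name__ == "__main__":
    for change in parse_svn_log(SAMPLE):
        print(change["revision"], change["author"], change["date"])
```

In a real setup the XML would come from running the svn client once 
per snippet file, which is where the figure of 30 calls for 30 
snippets comes from.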

/frank
