On Tue, Oct 28, 2008 at 4:24 PM, Frank Arensmeier <frank@xxxxxxxxxxxx> wrote:

> Hi all.
>
> In short, I am working on a system that allows me to keep track of changes to a large number of short texts (a couple of thousand text snippets, two or three sentences per text). All text is stored in a database. As soon as a user changes some text (insert, delete, update), this action is recorded. Look at an article on e.g. Wikipedia and click "History". This is more or less what I am trying to accomplish.
>
> Right now, my "history" class that takes care of all changes is working pretty much as I want. The thing is that both the original text and the altered text are stored in the database every time the text is changed. My concern is that this will eventually evolve into a serious problem regarding the amount of storage and performance. So, I am looking for a more efficient way to store all changes.
>
> Ideas I have come up with so far are:
>
> 1) Store the "delta" (= the actual change) of a text change. This could be done by utilizing the PEAR package TextDiff. My idea was to compare the old text with the new text with the help of the TextDiff class. I would then grab the array containing the changes from TextDiff, serialize it, and store this data in the db. The problem is that this is anything but efficient when it comes to smaller texts (the serialized array holding the changes was actually larger than the two texts combined).
>
> 2) Do some kind of compression on the text to be stored. However, it seems that the built-in compression functions in PHP5 are more efficient when it comes to large texts.
>
> Any other ideas?
>
> Thank you.
> //frank
>
> ps. I notice that MediaWiki also stores complete articles in the db (every time an article is updated, the whole article is stored in the database). ds.
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Save just the new version each time (the previous version is already in the previous row), in a table like:

record_id    //PK
relates_to   //FK
item_text
author_id
timestamp

Much easier to work with.

--

Bastien

Cat, the other other white meat
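A minimal sketch of the one-row-per-version table suggested above, assuming SQLite through Python's standard sqlite3 module (the thread is about PHP, but the schema is what matters here). The table names text_item and text_history and the helpers open_db, save_version, current_text and history are hypothetical. Each edit inserts exactly one row holding only the new text; the current text of a snippet is simply its newest row.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema following the column list in the reply.
SCHEMA = """
CREATE TABLE text_item (             -- hypothetical parent table of snippets
    item_id INTEGER PRIMARY KEY
);
CREATE TABLE text_history (          -- one row per saved version
    record_id  INTEGER PRIMARY KEY,  -- PK
    relates_to INTEGER NOT NULL REFERENCES text_item(item_id),  -- FK
    item_text  TEXT NOT NULL,
    author_id  INTEGER NOT NULL,
    timestamp  TEXT NOT NULL
);
CREATE INDEX idx_history_item ON text_history(relates_to, record_id);
"""


def open_db(path=":memory:"):
    """Open a connection, enable FK checks and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def save_version(conn, item_id, text, author_id):
    """Store only the new version; earlier rows are never rewritten."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO text_item (item_id) VALUES (?)",
                     (item_id,))
        cur = conn.execute(
            "INSERT INTO text_history (relates_to, item_text, author_id, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (item_id, text, author_id, now),
        )
    return cur.lastrowid


def current_text(conn, item_id):
    """Newest version of a snippet, or None if it has no history."""
    row = conn.execute(
        "SELECT item_text FROM text_history WHERE relates_to = ? "
        "ORDER BY record_id DESC LIMIT 1",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


def history(conn, item_id):
    """Oldest-first list of (record_id, author_id, item_text)."""
    return conn.execute(
        "SELECT record_id, author_id, item_text FROM text_history "
        "WHERE relates_to = ? ORDER BY record_id",
        (item_id,),
    ).fetchall()
```

With this layout, a diff for a "History" view can be computed at read time from two adjacent rows instead of being stored.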