Re: [RFH] Exploration of an alternative diff_delta() algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 10 Apr 2006, Peter Eriksen wrote:

> On Sun, Apr 09, 2006 at 01:45:00PM -0400, Nicolas Pitre wrote:
> ...
> > Try this with the README file from the git source tree:
> > 
> > 	sed s/git/GIT/g < ./README > /tmp/README.mod
> > 	test-delta -d ./README /tmp/README.mod /tmp/README.delta
> > 	[BOOM!]
> 
> I found the bug.  The code still has some limitations, but now
> it passes the test suite.  Thanks for your help, Nicolas.

OK here's some more meat for you:

Copy the same README file from the git source tree, then edit the copied 
version so the "Blob Object" section and the "Tree Object" section are 
swapped around like shown in the attached patch.

The best delta that can be achieved is 24 bytes.

With the current code the produced delta is 42 bytes.

With your code the resulting delta is 4978 bytes, about twice as large 
as the attached patch.

One major limitation of your algorithm appears to not have a global view 
of the base buffer before starting to find matches.


Nicolas
--- f1	2006-04-09 13:31:26.000000000 -0400
+++ f2	2006-04-09 23:04:10.000000000 -0400
@@ -87,26 +87,6 @@
 
 The object types in some more detail:
 
-Blob Object
-~~~~~~~~~~~
-A "blob" object is nothing but a binary blob of data, and doesn't
-refer to anything else.  There is no signature or any other
-verification of the data, so while the object is consistent (it 'is'
-indexed by its sha1 hash, so the data itself is certainly correct), it
-has absolutely no other attributes.  No name associations, no
-permissions.  It is purely a blob of data (i.e. normally "file
-contents").
-
-In particular, since the blob is entirely defined by its data, if two
-files in a directory tree (or in multiple different versions of the
-repository) have the same contents, they will share the same blob
-object. The object is totally independent of its location in the
-directory tree, and renaming a file does not change the object that
-file is associated with in any way.
-
-A blob is typically created when gitlink:git-update-index[1]
-is run, and its data can be accessed by gitlink:git-cat-file[1].
-
 Tree Object
 ~~~~~~~~~~~
 The next hierarchical object type is the "tree" object.  A tree object
@@ -147,6 +127,26 @@
 its data can be accessed by gitlink:git-ls-tree[1].
 Two trees can be compared with gitlink:git-diff-tree[1].
 
+Blob Object
+~~~~~~~~~~~
+A "blob" object is nothing but a binary blob of data, and doesn't
+refer to anything else.  There is no signature or any other
+verification of the data, so while the object is consistent (it 'is'
+indexed by its sha1 hash, so the data itself is certainly correct), it
+has absolutely no other attributes.  No name associations, no
+permissions.  It is purely a blob of data (i.e. normally "file
+contents").
+
+In particular, since the blob is entirely defined by its data, if two
+files in a directory tree (or in multiple different versions of the
+repository) have the same contents, they will share the same blob
+object. The object is totally independent of its location in the
+directory tree, and renaming a file does not change the object that
+file is associated with in any way.
+
+A blob is typically created when gitlink:git-update-index[1]
+is run, and its data can be accessed by gitlink:git-cat-file[1].
+
 Commit Object
 ~~~~~~~~~~~~~
 The "commit" object is an object that introduces the notion of

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]