Re: [PATCH 4/5] tree-walk: unroll get_mode since loop boundaries are well-known

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 4, 2011 at 5:29 AM, Erik Faye-Lund <kusmabite@xxxxxxxxx> wrote:
>
> We perfer this style:
>
> if (c < '0' || c > '7')
>        return NULL;
>
> i.e a line-break and a tab between the if-statement and the conditional code.
I am well aware, I just wanted to make the code a bit less lengthy due
to the repetition.

On Mon, Apr 4, 2011 at 11:55 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Dan McGee <dpmcgee@xxxxxxxxx> writes:
>
>> We know our mode entry in our tree objects should be 5 or 6 characters
>> long. This change both enforces this fact...
>
> I find the implementation later shown in the thread is cleaner,
Totally agree; I should have tried to do it this way in the first
place. However, compiling the fixed-length 0 to 5 loop does not
produce fully-unrolled assembly for me with CFLAGS="-march=native
-mtune=native -O2 -pipe -g" on x86_64. I see two copies of the loop
only, and even worse is the (lack of) performance (each is the mode of
3 runs). Compilers are stupid apparently.

Hand-unrolled:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real	0m28.616s
user	0m28.428s
sys	0m0.177s

Static counter loop:
$ time ../git/git-log -- zzzzz_not_exist > /dev/null
real	0m26.393s
user	0m26.185s
sys	0m0.203s

Diff between the two down at the bottom.

> but I'd
> comment on the word "enforces" here.
>
> It is more like "versions of git we know how to read from writes mode bits
> with 5 or 6 characters---if we see something else, either the tree object
> is corrupt, or we are too old to read the new type of entry in the tree
> object".
OK, sounds like we have opposite thinking on this, interesting.

get_mode() is not get_permissions()- it is the same as the stat()
st_mode field, which means it has to (as of now) include at least one
of the S_IFREG, S_IFDIR, or S_IFGITLINK flags in its value.

All of S_ISREG, S_ISDIR, and S_ISGITLINK are used fsck_walk_tree(). So
I'm not seeing a change in the tree storage format happening anytime
soon as there is little way to preserve both forward and backward
compatibility.

Currently git will happily parse both a "1" in the tree stream, or a
"165324132132464677321513252351"- it doesn't care. The problem is in
the former case, a mode of 01 doesn't have any significance later on.
In the latter case, we will be left-shifting our mode to hell and it
becomes worthless.

Do we disagree on clamping the lower limit at 5, the upper limit at 6, or both?

> So returning NULL is fine and it tells the caller that we do not
> understand the tree object. ÂThe caller says "corrupt tree file" when we
> do so here, but this change needs to rephrase it. ÂIf we stopped because
> we saw something other than ' ' after the run of octal digits, then we
> know the tree is corrupt. ÂIf we saw a three octal digits 644 in the mode
> field, terminated with ' ', maybe we are seeing a new kind of tree entry
> generated from later versions of git.
>
> Ideally, I'd rather see error checking done even higher layer than
> decode_tree_entry() for the "we are too old" case, though. ÂWe should
> return mode 0644 in such a case, and let the caller suggest that the
> version of git we are running might be too old, or the tree may have been
> written by a broken system that tried to mimic git in its error message.
>

diff --git a/tree-walk.c b/tree-walk.c
index 63901f8..63ec130 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -7,34 +7,21 @@
 static const char *get_mode(const char *str, unsigned int *modep)
 {
        unsigned char c;
-       unsigned int mode = 0;
+       unsigned int mode = 0, i;

        /*
-        * Unroll what looks like a loop since the bounds are
+        * Allow the compiler to unroll the loop since the bounds are
         * well-known. There should be at least 5 and at most 6
         * characters available in any valid mode, as '40000' is the
         * shortest while '160000' (S_IFGITLINK) is the longest.
         */
-       /* char 1 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 2 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 3 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 4 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
-       /* char 5 */
-       c = *str++;
-       if (c < '0' || c > '7') return NULL;
-       mode = (mode << 3) + (c - '0');
+       /* chars 1-5, optional */
+       for (i = 0; i < 5; i++) {
+               c = *str++;
+               if (c < '0' || c > '7')
+                       return NULL;
+               mode = (mode << 3) + (c - '0');
+       }
        /* char 6, optional */
        if (*str != ' ') {
                c = *str++;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]