[PATCH 04/12] xfsprogs: adjust rather than start over when invalid byte found

Alex Elder <aelder@xxxxxxx> · Thu, 30 Dec 2010 14:40:37 -0600

The last 5 bytes of a filename generated by generate_obfuscated_name()
can be selected such that they (along with all of the preceding
characters in the filename) produce any desired value when hashed.

They are selected based on how their value affects the outcome of
the hash calculation for the obfuscated name.  Each byte is XOR'd
into the hash at a certain position.  The portion of the hash
affected by each of these last five bytes can be seen visually
below:

	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
hash:	| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
       last-4 ->|           |<-- last-2 --->|           |<--- last ---->|
              |<-- last-3 --->|           |<-- last-1 --->|     |<- last-4

(Note byte (last - 4) wraps around.  The previous patch in this
series eliminated the effect of the upper 4 bits of that byte
by forcing them to be all be 0 bits.)

Using the (XOR) difference between the hash computed for the
beginning of the obfuscated name and the hash from the original
value, we can directly determine the required final five bytes to
make the hashes for the two complete filenames match.  The lower
byte (bits 0-7) of that difference is used for the last character in
the obfuscated name, bits 7-14 for the second-to-last, and so on.

It's possible for those portions to result in characters ('\0' and
'/') that are not allowed in valid file names.  If this occurs, the
existing code abandons the current obfuscated file name and starts
again from the beginning.  But there exist cases where this can lead
to a never-ending loop.  Arkadiusz MiÅkiewicz encountered just such
a name, "R\323\257NE".  That filename produces hash value
0x3a4be740, which requires that the obfuscated name uses '/' at
position last-2.  The current algorithm starts over, but since there
are no random characters in this length-5 obfuscated name, no other
possibility will be found and the process repeats forever.


This change modifies the algorithm used so that if a non-allowed
character arises, we flip a bit in that character (which results
in an allowed character), along with another "matching" bit in
another byte such that the resulting hash is unchanged.

The two un-allowed characters in a pathname component are '\0'
(0x00) and '/' (0x2f).  Flipping any one bit in either one of those
characters converts it to an allowed character.  Flipping two
distinct bits also results in an allowed character.  As seen in the
diagram above, the effect of each character on the hash overlaps two
other characters.  We can choose to flip one of the overlapping bits
in a "bad" character, then flip the overlapping bit in the next
byte, and the result will produce the same hash.

So, starting with the first of these last 5 bytes (last-4), if its
"normal" value is one of the invalid ones, we flip its low bit and
arrange to flip the high bit of its successor byte.

The very last byte has a little different treatment.  We can flip
its low bit, but it has no successor byte per se.  Its effect on the
hash does overlap the first byte of the 5-byte series though, so we
can flip the corresponding bit in that (at position 0x10).


There is one more case to consider.  It's possible in that last
case that by flipping a bit in byte (last-4), we have converted
that byte to one that's not allowed.  Fortunately, we have four
bits of overlap here, so we can choose to flip a second bit in
both the last byte and its corresponding bit at (last-4).  Since
flipping two bits also results in an allowed character, this
resolves the issue.


With these changes to the filename generation algorithm, we avoid
any of the cases in which no alternate name can be found without
using an illegal character.  We also avoid all looping due to bad
characters.

Reported-by: Arkadiusz MiÅkiewicz <arekm@xxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

---
 db/metadump.c |   77 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 61 insertions(+), 16 deletions(-)

Index: b/db/metadump.c
===================================================================

--- a/db/metadump.c
+++ b/db/metadump.c
@@ -452,6 +452,8 @@ generate_obfuscated_name(
 	do {
 		dup = 0;
 		for (;;) {
+			uchar_t	high_bit;
+
 		    	/*
 			 * The beginning of the obfuscated name can
 			 * be pretty much anything, so fill it in
@@ -468,25 +470,68 @@ generate_obfuscated_name(
 			 * Compute which five bytes need to be used
 			 * at the end of the name so the hash of the
 			 * obfuscated name is the same as the hash
-			 * of the original.
+			 * of the original.  If any result in an
+			 * invalid character, flip a bit and arrange
+			 * for a corresponding bit in a neighboring
+			 * byte to be flipped as well.  For the last
+			 * byte, the "neighbor" to change is the
+			 * first byte we're computing here.
 			 */
 			newhash = rol32(newhash, 3) ^ hash;
 
-			newp[namelen - 5] = (newhash >> 28) & 0x7f;
-			if (is_invalid_char(newp[namelen - 5]))
-				continue;
-			newp[namelen - 4] = (newhash >> 21) & 0x7f;
-			if (is_invalid_char(newp[namelen - 4]))
-				continue;
-			newp[namelen - 3] = (newhash >> 14) & 0x7f;
-			if (is_invalid_char(newp[namelen - 3]))
-				continue;
-			newp[namelen - 2] = (newhash >> 7) & 0x7f;
-			if (is_invalid_char(newp[namelen - 2]))
-				continue;
-			newp[namelen - 1] = (newhash >> 0) & 0x7f;
-			if (is_invalid_char(newp[namelen - 1]))
-				continue;
+			high_bit = 0;
+
+			newp[namelen - 5] = ((newhash >> 28) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 5])) {
+				newp[namelen - 5] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 4] = ((newhash >> 21) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 4])) {
+				newp[namelen - 4] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 3] = ((newhash >> 14) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 3])) {
+				newp[namelen - 3] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 2] = ((newhash >> 7) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 2])) {
+				newp[namelen - 2] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 1] = ((newhash >> 0) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 1])) {
+				newp[namelen - 1] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			/*
+			 * If we flipped a bit on the last byte, we
+			 * need to fix up the first one we computed.
+			 * That could make that first character
+			 * invalid, in which case we flip one more
+			 * bit in both bytes.
+			 */
+			if (high_bit) {
+			    	newp[namelen - 5] ^= 0x10;
+			    	if (is_invalid_char(newp[namelen - 5])) {
+					newp[namelen - 1] ^= 2;
+					newp[namelen - 5] ^= 0x20;
+					ASSERT(!is_invalid_char(newp[namelen - 1]));
+					ASSERT(!is_invalid_char(newp[namelen - 5]));
+				}
+			}
 			break;
 		}
 


_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs