The last 5 bytes of a filename generated by generate_obfuscated_name() can be selected such that they (along with all of the preceding characters in the filename) produce any desired value when hashed. They are selected based on how their value affects the outcome of the hash calculation for the obfuscated name. Each byte is XOR'd into the hash at a certain position. The portion of the hash affected by each of these last five bytes can be seen visually below: +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+ hash: | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+ last-4 ->| |<-- last-2 --->| |<--- last ---->| |<-- last-3 --->| |<-- last-1 --->| |<- last-4 (Note byte (last - 4) wraps around. The previous patch in this series eliminated the effect of the upper 4 bits of that byte by forcing them to be all be 0 bits.) Using the (XOR) difference between the hash computed for the beginning of the obfuscated name and the hash from the original value, we can directly determine the required final five bytes to make the hashes for the two complete filenames match. The lower byte (bits 0-7) of that difference is used for the last character in the obfuscated name, bits 7-14 for the second-to-last, and so on. It's possible for those portions to result in characters ('\0' and '/') that are not allowed in valid file names. If this occurs, the existing code abandons the current obfuscated file name and starts again from the beginning. But there exist cases where this can lead to a never-ending loop. Arkadiusz MiÅkiewicz encountered just such a name, "R\323\257NE". That filename produces hash value 0x3a4be740, which requires that the obfuscated name uses '/' at position last-2. The current algorithm starts over, but since there are no random characters in this length-5 obfuscated name, no other possibility will be found and the process repeats forever. This change modifies the algorithm used so that if a non-allowed character arises, we flip a bit in that character (which results in an allowed character), along with another "matching" bit in another byte such that the resulting hash is unchanged. The two un-allowed characters in a pathname component are '\0' (0x00) and '/' (0x2f). Flipping any one bit in either one of those characters converts it to an allowed character. Flipping two distinct bits also results in an allowed character. As seen in the diagram above, the effect of each character on the hash overlaps two other characters. We can choose to flip one of the overlapping bits in a "bad" character, then flip the overlapping bit in the next byte, and the result will produce the same hash. So, starting with the first of these last 5 bytes (last-4), if its "normal" value is one of the invalid ones, we flip its low bit and arrange to flip the high bit of its successor byte. The very last byte has a little different treatment. We can flip its low bit, but it has no successor byte per se. Its effect on the hash does overlap the first byte of the 5-byte series though, so we can flip the corresponding bit in that (at position 0x10). There is one more case to consider. It's possible in that last case that by flipping a bit in byte (last-4), we have converted that byte to one that's not allowed. Fortunately, we have four bits of overlap here, so we can choose to flip a second bit in both the last byte and its corresponding bit at (last-4). Since flipping two bits also results in an allowed character, this resolves the issue. With these changes to the filename generation algorithm, we avoid any of the cases in which no alternate name can be found without using an illegal character. We also avoid all looping due to bad characters. Reported-by: Arkadiusz MiÅkiewicz <arekm@xxxxxxxx> Signed-off-by: Alex Elder <aelder@xxxxxxx> --- db/metadump.c | 77 +++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 61 insertions(+), 16 deletions(-) Index: b/db/metadump.c =================================================================== --- a/db/metadump.c +++ b/db/metadump.c @@ -452,6 +452,8 @@ generate_obfuscated_name( do { dup = 0; for (;;) { + uchar_t high_bit; + /* * The beginning of the obfuscated name can * be pretty much anything, so fill it in @@ -468,25 +470,68 @@ generate_obfuscated_name( * Compute which five bytes need to be used * at the end of the name so the hash of the * obfuscated name is the same as the hash - * of the original. + * of the original. If any result in an + * invalid character, flip a bit and arrange + * for a corresponding bit in a neighboring + * byte to be flipped as well. For the last + * byte, the "neighbor" to change is the + * first byte we're computing here. */ newhash = rol32(newhash, 3) ^ hash; - newp[namelen - 5] = (newhash >> 28) & 0x7f; - if (is_invalid_char(newp[namelen - 5])) - continue; - newp[namelen - 4] = (newhash >> 21) & 0x7f; - if (is_invalid_char(newp[namelen - 4])) - continue; - newp[namelen - 3] = (newhash >> 14) & 0x7f; - if (is_invalid_char(newp[namelen - 3])) - continue; - newp[namelen - 2] = (newhash >> 7) & 0x7f; - if (is_invalid_char(newp[namelen - 2])) - continue; - newp[namelen - 1] = (newhash >> 0) & 0x7f; - if (is_invalid_char(newp[namelen - 1])) - continue; + high_bit = 0; + + newp[namelen - 5] = ((newhash >> 28) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 5])) { + newp[namelen - 5] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 4] = ((newhash >> 21) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 4])) { + newp[namelen - 4] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 3] = ((newhash >> 14) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 3])) { + newp[namelen - 3] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 2] = ((newhash >> 7) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 2])) { + newp[namelen - 2] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 1] = ((newhash >> 0) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 1])) { + newp[namelen - 1] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + /* + * If we flipped a bit on the last byte, we + * need to fix up the first one we computed. + * That could make that first character + * invalid, in which case we flip one more + * bit in both bytes. + */ + if (high_bit) { + newp[namelen - 5] ^= 0x10; + if (is_invalid_char(newp[namelen - 5])) { + newp[namelen - 1] ^= 2; + newp[namelen - 5] ^= 0x20; + ASSERT(!is_invalid_char(newp[namelen - 1])); + ASSERT(!is_invalid_char(newp[namelen - 5])); + } + } break; } _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs