The last 5 bytes of a name generated by generate_obfuscated_name() can be selected such that they (along with all of the preceding characters in the name) produce any desired value when hashed. They are selected based on how their value affects the outcome of the hash calculation for the obfuscated name. Each byte is XOR'd into the hash at a certain position. The portion of the hash affected by each of these last five bytes can be seen visually below (where "last-3" means the byte 3 positions before the last byte in the name): +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+ hash: | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+ last-4 ->| |<-- last-2 --->| |<--- last ---->| |<-- last-3 --->| |<-- last-1 --->| |<- last-4 (Note that byte (last-4) wraps around. The previous patch in this series eliminated the effect of the upper 4 bits of that byte by forcing them to be all be 0 bits.) Using the (XOR) difference between the hash computed for the beginning of the obfuscated name and the hash from the original name, we directly determine the required final five bytes to make the hashes for the two complete names match. The lower byte (bits 0-7) of that difference is used for the last character in the obfuscated name, bits 7-14 for the second-to-last, and so on. So we start with the difference between the hash from the complete original name and the hash (so far) for a random string constituting the first part of the obfuscated name. We extract five sets of 8 bits from the result at the positions indicated above, and those 8-bit values will become the final five bytes of the obfuscated name. By assuming (or forcing) the top bit of each of these extracted values to be 0 (by masking off the top bit), we can ignore the overlapping portions when determining the bytes to use. It's possible for this process to produce characters ('\0' and '/') that are not allowed in valid names. If this occurs, the existing code abandons the current obfuscated name and starts again from the beginning. But there exist cases where this can lead to a never-ending loop. Arkadiusz MiÅkiewicz encountered just such a name, "R\323\257NE". That name produces hash value 0x3a4be740, which requires that the obfuscated name uses '/' at position last-2. The current algorithm starts over, but since there are no random characters in this length-5 obfuscated name, no other possibility will be found, and the process repeats forever. This change modifies the algorithm used so that if a unallowed character arises, we flip a bit in that character, along with another "matching" bit in another (overlapping) character such that the resulting hash is unchanged. The two unallowed characters in a name are '\0' (0x00) and '/' (0x2f), and flipping any one bit in either of those characters results in an allowed character. So, starting with the first of these last 5 bytes (last-4), if its "normal" value is one of the unallowed characters, we flip its low bit and arrange to flip the high bit of its successor byte. The remaining bytes are treated similarly. The very last byte has a little different treatment. We can flip its low bit, but it has no successor byte per se. Its effect on the hash does, however overlap the upper four bits from byte (last-4). We can therefore flip the corresponding bit in that (at position 0x10). There is one more case to consider. It's possible in that last case that by flipping a bit in byte (last-4), we have converted that byte to one that's not allowed. It turns out this won't ever happen, because we know that byte was initially assigned a value with its upper four bits clear. Flipping the bit at 0x10 cannot therefore produce either 0x00 or 0x2f, so we don't need to treat this case. With these changes to the name generation algorithm, we avoid any of the cases in which no alternate name can be found without using an illegal character. We also avoid all looping due to bad characters. Reported-by: Arkadiusz MiÅkiewicz <arekm@xxxxxxxx> Signed-off-by: Alex Elder <aelder@xxxxxxx> The only notable update since the last version posted is that the second bit-flip in the last byte is no longer done (because I realized it was not necessary). --- db/metadump.c | 75 +++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 59 insertions(+), 16 deletions(-) Index: b/db/metadump.c =================================================================== --- a/db/metadump.c +++ b/db/metadump.c @@ -474,6 +474,8 @@ generate_obfuscated_name( do { dup = 0; for (;;) { + uchar_t high_bit; + /* * The beginning of the obfuscated name can * be pretty much anything, so fill it in @@ -490,25 +492,66 @@ generate_obfuscated_name( * Compute which five bytes need to be used * at the end of the name so the hash of the * obfuscated name is the same as the hash - * of the original. + * of the original. If any result in an + * invalid character, flip a bit and arrange + * for a corresponding bit in a neighboring + * byte to be flipped as well. For the last + * byte, the "neighbor" to change is the + * first byte we're computing here. */ newhash = rol32(newhash, 3) ^ hash; - newp[namelen - 5] = (newhash >> 28) & 0x7f; - if (is_invalid_char(newp[namelen - 5])) - continue; - newp[namelen - 4] = (newhash >> 21) & 0x7f; - if (is_invalid_char(newp[namelen - 4])) - continue; - newp[namelen - 3] = (newhash >> 14) & 0x7f; - if (is_invalid_char(newp[namelen - 3])) - continue; - newp[namelen - 2] = (newhash >> 7) & 0x7f; - if (is_invalid_char(newp[namelen - 2])) - continue; - newp[namelen - 1] = (newhash >> 0) & 0x7f; - if (is_invalid_char(newp[namelen - 1])) - continue; + high_bit = 0; + + newp[namelen - 5] = ((newhash >> 28) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 5])) { + newp[namelen - 5] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 4] = ((newhash >> 21) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 4])) { + newp[namelen - 4] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 3] = ((newhash >> 14) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 3])) { + newp[namelen - 3] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 2] = ((newhash >> 7) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 2])) { + newp[namelen - 2] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + newp[namelen - 1] = ((newhash >> 0) & 0x7f) ^ high_bit; + if (is_invalid_char(newp[namelen - 1])) { + newp[namelen - 1] ^= 1; + high_bit = 0x80; + } else + high_bit = 0; + + /* + * If we flipped a bit on the last byte, we + * need to fix up the first one we computed. + * + * That first byte had 0's in its upper four + * bits (it's the result of shifting a + * 32-bit unsigned value Right by 28 bits), + * so we don't need to worry about it + * becoming invalid as a result. + */ + if (high_bit) { + newp[namelen - 5] ^= 0x10; + ASSERT(!is_invalid_char(newp[namelen - 5])); + } break; } _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs