Hi, second patch proposal for isofs because Adverg Ebashinskii wrote: > If you could share your patch here to understand the problem better I would > gladly dig into it. ======================================================================== 0000-cover-letter.patch >From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001 From: Thomas Schmitt <scdbackup@xxxxxxx> Date: Tue, 22 Sep 2020 12:35:52 +0200 Subject: [PATCH 0/1] isofs: truncate oversized Rock Ridge names to 255 bytes Currently Rock Ridge names of length >= 254 are coarsely truncated by discarding the whole NM entry where the overflow happened. This yields name lengths of much less than the permissible 255 bytes. There is no reason to see why to exclude length 254 and 255 and especially to truncate by possibly a hundred or more bytes than necessary. So i propose to raise the length of permissible names to 255 and to let truncation yield exactly a string length of 255 bytes. Truncation shall take care to invalidate UTF-8 debris at the end of the resulting string (sorry ISO-8859). --------------------------------------------------------------------------- Tests made: Create an ISO 9660 image with file names of length 255, using file /bin/true as input for both files: victim1=12345678901234567890123456789012345678901234567890 victim1="$victim1"12345678901234567890123456789012345678901234567890 victim1="$victim1"12345678901234567890123456789012345678901234567890 victim1="$victim1"12345678901234567890123456789012345678901234567890 victim1="$victim1"12345678901234567890123456789012345678901234567890 victim1="$victim1"1234E victim2=äääääääääääääääääääääääää victim2="$victim2"äääääääääääääääääääääääää victim2="$victim2"äääääääääääääääääääääääää victim2="$victim2"äääääääääääääääääääääääää victim2="$victim2"äääääääääääääääääääääääää victim2="$victim2"xää xorriso -outdev /tmp/test_rr_name.iso \ -blank as_needed \ -map /bin/true /"$victim1" \ -map /bin/true /"$victim2" Currently the names get truncated to byte lengths 93 and 95: mount /tmp/test_rr_name.iso /mnt/iso /bin/ls /mnt/iso yields in xterm with bash 12345678901234567890...60.more.bytes...1234567890123 'ääääääääääääääääääääääääääääääääääääääääääääää'$'\303' Note the leading blank with the plain ASCII name and the shell characters with the name that has 2-byte UTF-8 characters. But /bin/ls /mnt/iso | cat yields 12345678901234567890...60.more.bytes...1234567890123 ääääääääääääääääääääääääääääääääääääääääääääää� The extra characters in xterm seem to be triggered by the presence of the half UTF-8 'ä' at the end. Its byte 0xc3 is there, byte 0xa4 is missing. (xterm and /bin/ls are from Debian 10.) If i make the UTF-8 name shorter to avoid truncation or if i move the 'x' to the start to cause truncation between complete UTF-8 'ä', the extra characters do not show up in ls to xterm. After my change in fs/isofs i get from /bin/ls /mnt/iso 1234567890...230.more.bytes...12345678901234E ääääääääää...210.more.bytes...ääääääääääxää Both strings have 255 bytes. xorriso cannot be talked into writing longer Rock Ridge names. So i rather set the new macro RR_NAME_LEN in rock.h to 33 to force truncation. The result with /bin/ls -1 /mnt/iso is: 123456789012345678901234567890123 ääääääääääääääää_ Note the half 'ä' at the end being mapped to '_'. So all characters are valid UTF-8 and no oddities of ls or xterm are to see. --------------------------------------------------------------------------- Remaining checkpatch.pl warning: scripts/checkpatch.pl complains about the string 'ääääääääääääääääääääääääääääääääääääääääääääää'$'\303' in this text by: WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line) Maybe it should talk about "bytes" rather than "chars" or learn about multi-byte characters in UTF-8. I think it is beneficial if i show the whole mangled name, rather than describing it by some ASCII-only text. --------------------------------------------------------------------------- Have a nice day :) Thomas Thomas Schmitt (1): isofs: truncate oversized Rock Ridge names to 255 bytes fs/isofs/rock.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++--- fs/isofs/rock.h | 2 ++ 2 files changed, 71 insertions(+), 4 deletions(-) -- 2.20.1 ======================================================================== 0001-isofs-truncate-oversized-Rock-Ridge-names-to-255-byt.patch >From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001 From: Thomas Schmitt <scdbackup@xxxxxxx> Date: Tue, 22 Sep 2020 12:34:50 +0200 Subject: [PATCH 1/1] isofs: truncate oversized Rock Ridge names to 255 bytes Enlarge the limit for name bytes from 253 to 255. Do not discard all bytes of the NM field where the overflow occurs, but rather append them to the accumulated name before truncating it to exactly 255 bytes. Map trailing incomplete UTF-8 bytes to '_'. Signed-off-by: Thomas Schmitt <scdbackup@xxxxxxx> --- fs/isofs/rock.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++--- fs/isofs/rock.h | 2 ++ 2 files changed, 71 insertions(+), 4 deletions(-) diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c index 94ef92fe806c..e1db8776b67e 100644 --- a/fs/isofs/rock.c +++ b/fs/isofs/rock.c @@ -192,6 +192,63 @@ static int rock_check_overflow(struct rock_state *rs, int sig) return 0; } +/* + * Find backward from idx the start byte of a possible UTF-8 character. + * https://en.wikipedia.org/wiki/UTF-8#Description + * Return -1 if no incomplete UTF-8 sequence is found at the name end. + */ +static int rock_find_utf8_start(char *name, int idx) +{ + unsigned char *uname, uch; + int i; + + uname = (unsigned char *)name; + /* Up to 4-byte codes */ + for (i = 0; i < 4 && idx - i >= 0; i++) { + uch = uname[idx - i]; + if ((uch & 0xe0) == 0xc0) { + /* UTF-8 start byte for 2-byte codes */ + if (i >= 1) + return -1; /* tail is complete */ + else + return (idx - i); /* tail not complete */ + } else if ((uch & 0xf0) == 0xe0) { + /* UTF-8 start byte for 3-byte codes */ + if (i >= 2) + return -1; + else + return (idx - i); + } else if ((uch & 0xf8) == 0xf0) { + /* UTF-8 start byte for 4-byte codes */ + if (i >= 3) + return -1; + else + return (idx - i); + } else if ((uch & 0xc0) != 0x80) { + /* not an UTF-8 tail byte, so no UTF-8 */ + return -1; + } + } + /* That would be an oversized UTF-8 code or no UTF-8 at all */ + return -1; +} + +/* + * Replace trailing incomplete UTF-8 sequence by '_' characters. + */ +static void rock_erase_incomplete_utf8(char *name, int len) +{ + int i; + + if (len <= 0) + return; + i = rock_find_utf8_start(name, len - 1); + if (i < 0) + return; + for (; i < len; i++) + name[i] = '_'; +} + /* * return length of name field; 0: not found, -1: to be ignored */ @@ -271,16 +328,24 @@ int get_rock_ridge_filename(struct iso_directory_record *de, break; } len = rr->len - 5; - if (retnamlen + len >= 254) { - truncate = 1; - break; - } p = memchr(rr->u.NM.name, '\0', len); if (unlikely(p)) len = p - rr->u.NM.name; + if (retnamlen + len > RR_NAME_LEN) { + truncate = 1; + /* Take as many characters as possible */ + len = RR_NAME_LEN - retnamlen; + if (len <= 0) { + rock_erase_incomplete_utf8(retname, + retnamlen); + break; + } + } memcpy(retname + retnamlen, rr->u.NM.name, len); retnamlen += len; retname[retnamlen] = '\0'; + if (truncate == 1) + rock_erase_incomplete_utf8(retname, retnamlen); break; case SIG('R', 'E'): kfree(rs.buffer); diff --git a/fs/isofs/rock.h b/fs/isofs/rock.h index 1558cf22ef8a..0938fc11ced4 100644 --- a/fs/isofs/rock.h +++ b/fs/isofs/rock.h @@ -121,3 +121,5 @@ struct rock_ridge { #define RR_PL 32 /* Parent link */ #define RR_RE 64 /* Relocation directory */ #define RR_TF 128 /* Timestamps */ + +#define RR_NAME_LEN 255 /* Maximum length in bytes of a file name */ -- 2.20.1 _______________________________________________ Kernelnewbies mailing list Kernelnewbies@xxxxxxxxxxxxxxxxx https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies