Hi,

I've attached a simple patch for a bug I found in more, which was also filed in the Debian BTS.

Note that the patch is needed to support multibyte UTF-8 characters, which overflow the existing small buffer. It does not fix an additional issue: the output is corrupted when the buffer overflows. That needs fixing separately, and I have no patch for it.

An alternative approach, which would avoid *all* overflow, would be to dynamically allocate a buffer of at least 4× the number of columns (since a UTF-8 character is at most 4 bytes). Personally I'd go with a minimum of 6×, since that's the real limit of the encoding. It's only restricted to 4 bytes by the standard, and that might be increased in the future.

Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
--- Begin Message ---
- To: Debian Bug Tracking System <submit@xxxxxxxxxxxxxxx>
- Subject: more: Limited line buffer length results in corrupted UTF-8 text
- From: Roger Leigh <rleigh@xxxxxxxxxx>
- Date: Tue, 27 Oct 2009 11:25:21 +0000
- Delivered-to: rleigh@xxxxxxxxxxxxx
Package: util-linux
Version: 2.16.1-4
Severity: important
Tags: patch

Attached is a file which may be used to demonstrate the problem.

With a terminal of the standard 80-column width, more displays the text correctly. The longest line (line 11: 91 characters, 91×3 = 273 bytes) is correctly folded over two lines:

─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
                                                                                   col 91 ↑
⇒
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼────────────
───────────
                                                                        col 80 ↑

Now resize the terminal to more than 85 columns, and one sees this:

─────────────────┼──────────┼──────────┼─────────────┼─────────────┼─────────────────
��─────
                                                                             col 85 ↑

A newline is inserted after 85 characters, and the first byte of the following 3-byte UTF-8 sequence is lost (replaced by \n?). This corrupts the output, because the two bytes that follow are no longer valid UTF-8.

Why is this happening? I believe it's partly down to

#define LINSIZ 256

in text-utils/more.c. All the UTF-8 characters here are 3-byte sequences, and 256/3 is 85 remainder 1. The buffer therefore holds 85 complete characters (255 bytes) plus only the first byte of the 86th. But there is a bug somewhere else in the code as well, since it is not only flushing the buffer but also corrupting it.

Partial solution: 256 bytes for the line buffer is far too small. I'd suggest that on a modern UTF-8 system 1024 bytes would be a more sensible default, since this would allow at least 256 columns of 4-byte UTF-8 sequences. 4096 bytes would be even safer. Because this is a single static buffer, the extra overhead is minimal. I've built with the following patch, and it does prevent the corruption.

That still leaves the corruption that occurs when the buffer does overflow, which needs addressing separately. The larger buffer just hides that bug rather than fixing it. On overflow, more should probably only flush up to the end of the last valid UTF-8 sequence.
diff -urN util-linux-2.16.1.orig/text-utils/more.c util-linux-2.16.1/text-utils/more.c
--- util-linux-2.16.1.orig/text-utils/more.c	2009-07-04 00:20:07.000000000 +0100
+++ util-linux-2.16.1/text-utils/more.c	2009-10-27 11:11:32.046127972 +0000
@@ -107,7 +107,7 @@ FILE *checkf (char *, int *);
 
 
 #define TBUFSIZ 1024
-#define LINSIZ 256
+#define LINSIZ 4096
 #define ctrl(letter) (letter & 077)
 #define RUBOUT '\177'
 #define ESC '\033'


Regards,
Roger

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (550, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages util-linux depends on:
ii  dpkg          1.15.4.1           Debian package management system
ii  initscripts   2.87dsf-8          scripts for initializing and shutt
ii  install-info  4.13a.dfsg.1-5     Manage installed documentation in
ii  libblkid1     2.16.1-4           block device id library
ii  libc6         2.10.1-2           GNU C Library: Shared libraries
ii  libncurses5   5.7+20090803-2     shared libraries for terminal hand
ii  libselinux1   2.0.88-1           SELinux runtime shared libraries
ii  libslang2     2.2.1-1            The S-Lang programming library - r
ii  libuuid1      2.16.1-4           Universally Unique ID library
ii  lsb-base      3.2-23             Linux Standard Base 3.2 init scrip
ii  tzdata        2009o-2            time zone and daylight-saving time
ii  zlib1g        1:1.2.3.3.dfsg-15  compression library - runtime

util-linux recommends no packages.

Versions of packages util-linux suggests:
ii  console-tools       1:0.2.3dbs-66  Linux console and font utilities
ii  dosfstools          3.0.6-1        utilities for making and checking
ii  util-linux-locales  2.16.1-4       Locales files for util-linux

-- no debconf information

psql (8.5devel, server 8.4.1)
WARNING: psql version 8.5, server version 8.4.
         Some psql features might not work.
Type "help" for help.

rleigh=# \pset pager off
Pager usage is off.
rleigh=# \l
                                     List of databases
      Name       │  Owner   │ Encoding │  Collation  │    Ctype    │   Access privileges
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
 merkelpb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 postgres        │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 projectb        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh          │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 rleigh-amarok   │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 sbuild-packages │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 scratch         │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 scratch2        │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 template0       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
 template1       │ postgres │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres          ↵
                 │          │          │             │             │ postgres=CTc/postgres
 test            │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test2           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test3           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test4           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 test5           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 testp           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
 vtest           │ rleigh   │ UTF8     │ en_GB.UTF-8 │ en_GB.UTF-8 │
(17 rows)
--- End Message ---