Re: [PATCH] fix UTF-8 issues in read() builtin

On Wed, Sep 08, 2010 at 01:26:15AM +0400, Alexey Zinovyev wrote:
> Hello, I think there is a bug in read() builtin.

> $ cat test
> echo 'ρ'|while read i; do echo $i; done
> $ dash test

> $ bash test
> ρ

> Same with some japanese symbols.
> Looks like dash strips 0x81 byte. 

0x81 == CTLESC, the escape character in dash's internal representation.
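
To make the failure concrete, here is a minimal sketch of what an
escape-removal pass does to such input. It is not dash's actual
rmescapes(); strip_ctlesc() is a hypothetical, simplified stand-in, and
the only value taken from this thread is CTLESC == 0x81.

```c
#include <assert.h>
#include <stddef.h>

/* Value as stated in this thread: dash's internal escape byte. */
#define CTLESC 0x81

/*
 * Hypothetical, simplified stand-in for an escape-removal pass:
 * drop each CTLESC byte and keep the byte that follows it verbatim.
 * Works in place on a NUL-terminated buffer and returns the new length.
 */
size_t strip_ctlesc(unsigned char *s)
{
	unsigned char *in = s, *out = s;

	assert(s != NULL);

	while (*in) {
		if (*in == CTLESC) {
			in++;		/* skip the escape byte itself */
			if (!*in)
				break;	/* trailing CTLESC: nothing left to keep */
		}
		*out++ = *in++;
	}
	*out = '\0';
	return (size_t)(out - s);
}
```

The UTF-8 encoding of 'ρ' (U+03C1) is the two bytes 0xCF 0x81. Passing
those through a pass like this leaves a lone 0xCF, which is not valid
UTF-8 on its own, so the character is lost.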

> diff --git a/src/miscbltin.c b/src/miscbltin.c
> index 5ab1648..f8c5655 100644
> --- a/src/miscbltin.c
> +++ b/src/miscbltin.c
> @@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
>  			 * will not modify the length of the string */
>  			offset = sl->text - s;
>  			remainder = backup + offset;
> -			rmescapes(remainder);
>  			setvar(*ap, remainder, 0);
>  
>  			return;

This patch is not correct, as it will leave 0x81 bytes in the result for
backslash escapes. That is probably a bit worse than ignoring the
backslashes entirely, which is what the current code does. The current
code attempts to "escape" the next character by placing a CTLESC before
it, but CTLESC does not and should not escape IFS characters for
ifsbreakup(); the recordregion() mechanism should be used for that.

(For the intermediate representation generated by parser.c, CTLESC does
escape IFS characters. This is not ideal as it prevents IFS splitting
with CTL* bytes in word in ${var+-word}.)

The patch I posted separately fixes the handling of 0x81 and various
other issues with read (by using separate code instead of trying to use
expand.c). Backslash escaping works too, although I have just found some
bugs in corner cases.

-- 
Jilles Tjoelker