On 28 March 2017 at 13:16, Karel Zak <kzak@xxxxxxxxxx> wrote:
> see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
>
> any idea what is the right col(1) behavior for escape sequences?
>
> The current code reads two first bytes from the sequence and the rest
> is interpreted as standard chars (because complex sequences like
> ^[..m are completely unknown for the code), for example input:
>
> ^[[1mtomcat-el^[(B^[[m
>
> produces:
>
> 1mtomcat-elBm
>
> It seems incorrect. I think for "col -p" all the sequence should be
> filtered out, it means:
>
> tomcat-el
>
> and the default behavior (without -p) should be output all escape
> sequences but do not increment internal width counters.
>
> Objections?

This is what the Open Group[1] has to say about col(1) input handling.

    On input, the only control characters accepted are space,
    backspace, tab, carriage-return and newline characters, SI, SO, VT,
    reverse line-feed, forward half-line-feed and reverse
    half-line-feed. The VT character is an alternative form of full
    reverse line-feed, included for compatibility with some earlier
    programs of this type. The only other characters to be copied to
    the output are those that are printable.

The last sentence is pretty clear that any other control characters
must be removed. I am not sure whether the definition was meant to
cover control sequences as well, but that feels like its spirit.

Maybe a silly question, but how do we choose which control sequences
are recognised? Perhaps ECMA-48, VT100, and Unicode.

[1] http://pubs.opengroup.org/onlinepubs/7908799/xcu/col.html

-- 
Sami Kerola
http://www.iki.fi/kerolasa/
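
To make the options discussed above concrete, here is a rough Python sketch, not col(1) code. It assumes an ECMA-48 style grammar (ESC, optional intermediate bytes, one final byte, with CSI sequences handled separately), which is only one possible answer to the question of which sequences to recognise. The names ESC_SEQ, two_byte_strip, strip_sequences and visible_width are made up for illustration. col's own ESC-7, ESC-8 and ESC-9 line-feed codes would have to be handled before this pattern is applied; the sketch does not do that.

```python
import re

# Hypothetical pattern, assuming ECMA-48 style sequences; not taken from col(1).
ESC_SEQ = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"  # CSI: ESC [ parameters, intermediates, final byte
    r"|\x1b[ -/]*[0-~]"         # other: ESC intermediates, final byte (e.g. ESC ( B)
)


def two_byte_strip(text):
    """Mimic the reported behaviour: drop ESC plus only the next character."""
    return re.sub(r"\x1b.", "", text)


def strip_sequences(text):
    """Filter out every sequence matched by ESC_SEQ."""
    return ESC_SEQ.sub("", text)


def visible_width(text):
    """Count columns while ignoring escape sequences (text passes through elsewhere)."""
    return len(strip_sequences(text))


if __name__ == "__main__":
    sample = "\x1b[1mtomcat-el\x1b(B\x1b[m"  # the input from the bug report
    print(two_byte_strip(sample))   # 1mtomcat-elBm
    print(strip_sequences(sample))  # tomcat-el
    print(visible_width(sample))    # 9
```

With something like this, the pass-through mode would write the input unchanged and use visible_width() for the column counters, while the filtering mode would write the result of strip_sequences().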