Hello, dash currently has a problem with output from "-x" option being non-deterministic when subshells are used (e.g. whenever pipelines are). This was reported a long time ago as a Debian bug[1] but AFAICS has never been posted here, so at the very least I'd like to do it and, ideally, also fix it. But first let me describe the bug in more details: When you run the following command: $ dash -xc 'z=`echo | cut -f1 | sort`' the following output would be expected: + echo + cut -f1 + sort + z= And, indeed, this is what you usually get. However the output is non-deterministic and you can also get weirder things, with the output from different subshells intermixed together. A simple way to see it is to run the following loop: $ while true; do dash -xc 'z=`echo | cut -f1 | sort`' 2>&1 | grep cut; done | fgrep -vx '+ cut -f1' which is not supposed to output anything, but in practice it does quickly give some output, e.g. here is what I see when running it with the latest dash version from git (48875c1201930d1e71d358eb1cf3eacc166795be): cut -f1 + + cut -f1 + cut+ -f1 + cut+ -f1 + cut+ -f1 cut -f1 + + echocut cut -f1 cut -f1 + + cutsort + cut -f1+ In practice, this is very annoying as it makes it difficult to compare the logs of the same script run using "-x" as for any non-trivial amount of output there will always be spurious differences between the output produced by the different runs. As the comment from Tomi Ollila in the Debian bug report says, the problem is due to using several different write() calls for output. Looking at the sources, this happens inside "if (xflag)" block in evalcommand() function starting at the line 854 in eval.c, where outstr(), eprintlist() twice, and outcslow() are used for the single line of output. To fix the problem we need to use only a single call instead and I see 2 possibilities here: first the obvious one which would be to output everything in a temporary memory buffer and output the entire buffer at once to preverrout at the end. This approach has some problems, however: 1. It would consume some extra memory and while this should be negligible in practice, it's arguably still better to avoid it. 2. Support for memory output is disabled using "#ifdef notyet" in the current sources and enabling it would enable quite a few other things, so I'm not sure if it's really a good idea. 3. Without enabling memory output, the code would need to be changed relatively extensively and I'd need to either refactor mkinit.c to extract "struct text" and the related addstr()/addchar() functions from it and make it possible to reuse it from eval.c, or reimplement something quite close to it, which is not appealing. The other (and the only other, AFAICS) possibility would be to use writev() to output everything at once instead. This avoids (1) (but, again, I don't think it's really worth avoiding it that much), but would also require making rather extensive changes to the code. I'm also not sure if writev() is supported on all platforms targeted by dash, but hopefully it should be, in 2019. I'd appreciate your advice about what kind of approach would be the most appropriate and most likely to be accepted. Personally, I think of enabling support for memory-backed output structs (only, i.e. without touching all the rest of the code guarded by "#ifdef notyet") and using it here, but if this is undesirable, for whatever reason, I'd go with writev() because as long as we need to change the code structure anyhow, it's not really more complicated than using a temporary buffer. Thanks in advance! VZ [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567648
Attachment:
pgpjk1LBVGIlR.pgp
Description: PGP signature