Re: Cleaning up elkscmd and adding help text - Questions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Wow i really belive that writing this took some time... Thank you, this gave me a lot of hints, do's and don'ts.

I read The Art of Unix Programming, but you showed me that i forgot a lot ;)

I think I'm going to rewrite my patches with all that things in mind, will send them soon.

I hope my submissions will be ready very soon.

Nils





On 03/27/2015 02:19 PM, Jody Bruchon wrote:
On 3/27/2015 4:19 AM, Nils Stec wrote:
I wrote a patch for cat which shows the user some usage-information.
Jody Bruchon used write() for this purpose, other tools are using
fprintf(stderr, ....).
I don't know which i should prefer. In my opinion i would rather use
fprintf.
STDERR should be the output for operations, right?

Some of the programs already used write() and write() avoids dragging the code in for the printf family of functions (all ELKS programs are statically linked), so if there is NO usage of the printf family of functions in the program, it may make sense to use write() to minimize total program size. That being said, even a trivial usage message should mention the program being run in the usage, and it's not a good idea to assume that i.e. 'ls' is ALWAYS run as 'ls' since someone could symlink or alias it in the future. Because of this,

fprintf(stderr, "usage: %s args\n", argv[0]);

is a thousand times better than:

write(STDERR_FILENO, "usage: ", 7);
write(STDERR_FILENO, argv[0], strlen(argv[0]));
write(STDERR_FILENO, "\n", 1);

This results in three calls to write() and a call to strlen() when we could have just used one fprintf() to do all the work. In the future, I think many calls to write() will be eliminated, since even one call to a printf family function brings in the core code for all printf functions anyway.

Is this the right way to write code for this project? It's the first
time I'm doing this in here and i don't wan't to do things wrong...

The cat patch should use fprintf(stderr, ...) since I've already brought in the printf function family with my error_read: code and write() is bad for the reasons outlined above.

A few other comments:

+#define NOT_OVERWRITING        0xABCD
...snip...
+            return NOT_OVERWRITING;

If the user asks you to not overwrite (clobber), you should silently not clobber files and not consider it an error. If any program succeeds in doing what the user asked for, it should return success (0) with no messages. If you haven't read Eric S. Raymond's rules of the Unix philosophy, take a look at them; I find them to be very helpful guidelines when making program design decisions.

+    if (stat(srcname, &statbuf1) < 0) {    // src nonexistent

You can't write comments using // unless you're using a C99 compiler. bcc is a K&R C compiler that is not even guaranteed to be C89 compliant, so no // comments. Use /* comment */ instead. Source comments are nice to have as long as they're not excessive or explaining something that is very obvious.

+static int option_no_overwrite;
+static int option_verbose;
+static int option_interactive;

Depending on how many options you add, you may want to use one global static int called 'flags' or similar, and use bit flags instead of a bunch of ints. From the program 'fdupes' in one of my GitHub repos, snipped up for brevity:

#define ISFLAG(a,b) ((a & b) == b)
#define SETFLAG(a,b) (a |= b)
#define F_RECURSE 0x0001
#define F_HIDEPROGRESS 0x0002
etc. etc.
...
switch (opt) {
case 'q':
  SETFLAG(flags, F_HIDEPROGRESS);
  break;
...
if (ISFLAG(flags, F_RECURSEAFTER)) <code here>...
if (!ISFLAG(flags, F_HIDEPROGRESS)) <code here>...

I don't necessarily recommend using this code, it's just an example and it's written to be usable with more than one variable that holds a set of flags. I'd rewrite it like this for ELKS programs to make it cleaner:

static int flags;
#define ISFLAG(a) ((flags & a) == a)
#define SETFLAG(a) (flags |= a)
#define F_QUIET 0x0001
etc.
...
if (strcmp(argv[i], "-q")) SETFLAG(F_QUIET);
if (!ISFLAG(F_QUIET)) fprintf(stderr, "copied %d files\n", count);

---
+    if(option_verbose) {
+        printf("copied '%s' -> '%s'\n", srcname, destname);
+    }

Firstly, the output from GNU 'cp' looks like this:

$ cp -v foo/* bar/
'foo/1' -> 'bar/1'
'foo/2' -> 'bar/2'
'foo/3' -> 'bar/3'

I'd also prefer one-liners such as this with no multi-line 'else' clause to be collapsed together (it makes the code easier to read and avoids 'brace hell'). I also prefer that if, for, and while statements have a space between the word and the (condition) part for the sake of code readability. Write it like this instead:

if (option_verbose) printf("'%s' -> '%s'\n", srcname, destname);

Next:

+    // set defaults for options //
+    option_no_overwrite = 0;
+    option_verbose = 0;
+    option_interactive = 0;

The comment is unnecessary since it's obvious that "options are being initialized to zero" by the code itself. This is self-commenting code. I'd also prefer to see the variable names shortened a bit ('o_' or 'opt_' instead of 'option_' for example) and as mentioned previously, it's probably better to use the bit flags method mentioned previously if you're only using the variables as a yes/no flag.

+    arg_counter = 1;    // start at argv[1]
+    options_counter = 0;
+    while(arg_counter < argc) {
+        if(!strcmp(argv[arg_counter], "-n")) {
+            options_counter++;
+            option_no_overwrite = 1;
+        }
+        if(!strcmp(argv[arg_counter], "-v")) {
+            options_counter++;
+            option_verbose = 1;
+        }
+        if(!strcmp(argv[arg_counter], "-i")) {
+            options_counter++;
+            option_interactive = 1;
+        }
+        if(!strcmp(argv[arg_counter], "-h")) goto usage;
+        arg_counter++;
+    }

-    if ((argc > 3) && !dirflag) {
-        fprintf(stderr, "%s: not a directory\n", lastarg);
+ if((argc > (3+options_counter)) && !dirflag) { // can't copy more than one src-files into one dst-file

Ouch. Rather than try to explain the problems in detail, let me rewrite it.

    arg_cnt = 1;    // start at argv[1]
    opt_cnt = 0;
    while(arg_cnt < argc) {
        if (!strcmp(argv[arg_cnt], "-n")) {
            opt_noclobber = 1;
        } else if (!strcmp(argv[arg_cnt], "-v")) {
            opt_verbose = 1;
        } else if (!strcmp(argv[arg_cnt], "-i")) {
            opt_interactive = 1;
        } else if (!strcmp(argv[arg_cnt], "-h")) goto usage;
        else break;
        opt_cnt++;
        arg_cnt++;
    }

    /* If multiple files specified, last one must be a directory */
    if ((argc > (opt_cnt + 3)) && !dirflag) {

This rewrite combines all the option count increments into one instance which reduces code size and stops parsing options at the first non-option string. The comparison on the last line is more readable and it is clearer that it is "comparing total argument count to the number of options specified plus 3." The 'if-elseif-else' tree stops processing once a valid option is hit and "falls through" once a non-option is hit. It could (and probably should) be improved by handling the combined option case such as 'cp -iv src dest' but this is sufficient for now as long as the usage text implies that option flags must be specified separately.

+    copy_count = 0;
+    file_count = 0;
+    while (argc-- > (2+options_counter)) {
+        srcname = argv[options_counter+1+file_count];

I've spent a lot of time cleaning these compressed mathematical expressions out of the elkscmd code; let's not introduce more of them (also shorten variable names while retaining their meaning, this is a good example for why to use shorter names):

+    copy_cnt = 0;
+    file_cnt = 0;
+    while (argc-- > (opt_cnt + 2)) {
+        srcname = argv[opt_cnt + 1 + file_cnt];

Next,

-        if (dirflag) destname = buildname(destname, srcname);
+        if(dirflag) destname = buildname(destname, srcname);

Don't do this, if no other reason than the fact that 'if' is not a function call. I understand that it comes down to individual coding style and the compiler doesn't care if you write

if((something+(q/2))==argv[x+2+y*z[aaa]])usage();

but it is hard to read and readability is very important. I believe that 'if', 'while', and 'for' shouldn't look like function calls in code since they are actually C control statements. Using a space makes those control statements immediately obvious, whereas if() and while() and for() look like function calls when skimming the code.

-        if (!copyfile(*++argv, destname, 0)) goto error_copy;
+        retval = copyfile(srcname, destname, 0);
+        if((!retval) && (!(retval == NOT_OVERWRITING))) goto error_copy;

Remember the rule about silent success mentioned earlier? This change can be eliminated because of it. Even if it wasn't, though, this code could be simplified and made more readable in one shot:

+        retval = copyfile(srcname, destname, 0);
+        if (retval != 1) goto error_copy;

I'd actually recommend reversing the return values in copyfile(), using nonzero as an error return instead (most C library calls seem to use this convention and consistency with that is rarely a bad thing) so you could just go:

+        if (retval) goto error_copy;

Next,

+    if(option_verbose) printf("copy-count = %d\n", copy_count);

This shouldn't be done. Here's GNU coreutils 'cp' in action:

file_utils $ cp -v foo/* bar/
'foo/COPYING' -> 'bar/COPYING'
'foo/COPYING.sash' -> 'bar/COPYING.sash'
'foo/Makefile' -> 'bar/Makefile'
'foo/README' -> 'bar/README'
'foo/WARRANTY' -> 'bar/WARRANTY'
...snip...

If someone is interested in knowing a "copy-count" for some reason, they'll almost certainly do this instead:

file_utils $ cp -v foo/* bar/ | wc -l
46

Your helpful message will throw this off and make it not work as expected. A shell script author will have to introduce an incompatible kludge to their script to make up for 'cp' misbehaving, i.e.

CNT=$(cp -v $A/* $B/ | wc -l)
CNT=$((CNT - 1))  # This shouldn't be necessary...

Another problem is that you've declared a static global 'copy_count' variable and you're introducing a local 'copy_count' variable as well. It needs to be called something else for the sake of avoiding programmer confusion. However, assuming you eliminate the "copy-count" message I just talked about along with its local variable counterpart, the global 'static unsigned int copy_count' is set (copy_count++) but not used throughout the entire program and can be discarded.

If you ever want an "official" reference for how a particular elkscmd program "should" behave, take a look at the POSIX standards:

http://pubs.opengroup.org/onlinepubs/9699919799/

Whew, that took me a long time to write. I hope it's helpful to you and everyone else on the list. Thanks for your impressive work so far! I'm looking forward to your finalized [PATCH] submissions. :-)

-Jody
--
To unsubscribe from this list: send the line "unsubscribe linux-8086" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-8086" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Kernel]     [Linux ia64]     [DCCP]     [Linux for ARM]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux