On Sat, Oct 17, 2015 at 4:18 PM, Torsten Bögershausen <tboegi@xxxxxx> wrote: > Make it possible to show the line endings of files. > Files which are staged and/or files in the working tree: > > git ls-files --eol-staged > git ls-files --eol-worktree > > Both will show an output like this: > > empty empty_file > bin binary_file_or_with_cr_handled_as_binary > txt-crlf text_file_with_crlf > txt-mix text_file_with_crlf_and_lf > txt-lf text_file_with_lf > txt text_file_with_no_eol_at_all > > Implementation details: > Make struct text_stat, is_binary() and gather_stats() from convert.c > public, add a new function get_convert_stats_ascii() and use it > in and use them in ls-files. s/and use it in and use them in/and use them in/ > git ls-files --eol-staged will give a line like this: "... line like this:" what? > Signed-off-by: Torsten Bögershausen <tboegi@xxxxxx> > --- > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -68,6 +70,11 @@ static void show_dir_entry(const char *tag, struct dir_entry *ent) > return; > > fputs(tag, stdout); > + if (show_eol_wt) { > + printf("%s ", get_convert_stats_ascii(ent->name, > + GET_CONVERT_STATS_ASCII_WT, 0)); Whitespace-damaged patch? > + } Style: unnecessary braces > + > write_name(ent->name); > } > > @@ -170,6 +177,14 @@ static void show_ce_entry(const char *tag, const struct cache_entry *ce) > find_unique_abbrev(ce->sha1,abbrev), > ce_stage(ce)); > } > + if (show_eol_staged) { > + printf("%s ", > + get_convert_stats_ascii(ce->name, GET_CONVERT_STATS_ASCII_BLOB, 0)); > + } > + if (show_eol_wt) { > + printf("%s ", get_convert_stats_ascii(ce->name,GET_CONVERT_STATS_ASCII_WT, > + ce->ce_stat_data.sd_size)); > + } Style: unnecessary braces (both cases) > write_name(ce->name); > if (debug_mode) { > const struct stat_data *sd = &ce->ce_stat_data; > @@ -206,6 +221,10 @@ static void show_ru_info(void) > printf("%s%06o %s %d\t", tag_resolve_undo, ui->mode[i], > find_unique_abbrev(ui->sha1[i], abbrev), > i + 1); > + if (show_eol_wt) { > + printf("%s ", > + get_convert_stats_ascii(path, GET_CONVERT_STATS_ASCII_WT, 0)); > + } Style: unnecessary braces > write_name(path); > } > } > @@ -433,6 +452,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > OPT_BIT(0, "directory", &dir.flags, > N_("show 'other' directories' names only"), > DIR_SHOW_OTHER_DIRECTORIES), > + OPT_BOOL(0, "eol-staged", &show_eol_staged, N_("show line endings of the staged file")), > + OPT_BOOL(0, "eol-worktree", &show_eol_wt, N_("show line endings of the file in work tree")), > OPT_NEGBIT(0, "empty-directory", &dir.flags, > N_("don't show empty directories"), > DIR_HIDE_EMPTY_DIRECTORIES), > --- a/convert.c > +++ b/convert.c > @@ -95,6 +87,45 @@ static int is_binary(unsigned long size, struct text_stat *stats) > return 0; > } > > + Style: unnecessary blank line > +const char *gather_stats_ascii(const char *data, unsigned long size) Is this name too generic? This implementation is specialized for summarizing EOL state, but it is conceivable that there may be other meaningful textual representations of struct text_stat, as well, so perhaps this name ought to better reflect the EOL-centric textual representation of this implementation. > +{ > + struct text_stat stats; > + if (!data || !size) > + return("empty "); Would it make sense to distinguish between an empty file/blob and one which was not found? if (!data) return "missing"; if (!size) return "empy"; > + gather_stats(data, size, &stats); > + if (is_binary(size, &stats)) > + return("bin "); Style: unnecessary parentheses on 'return' > + else if (stats.cr != stats.crlf) Style (nit): unnecessary 'else' > + return("bin "); > + else if (stats.crlf && (stats.crlf == stats.lf)) > + return("txt-crlf"); > + else if (stats.crlf && stats.lf) > + return("txt-mix "); > + else if (stats.lf) > + return("txt-lf "); > + else > + return("txt "); > +} Is it a good idea for this otherwise general purpose function to be conflating presentation with EOL classification? I would have expected this function to return the raw classification string without trailing whitespace ("empty", "bin", "txt-mix", etc.), and leave it up to the caller to pad the string if alignment is desired. > +const char *get_convert_stats_ascii(const char *path, int flags, size_t hint) Is 'flags' an appropriate name? Presently, it's more of a "mode" than a flag. If it is a flag, then you typically would use an unsigned type. If you can't foresee any additional "flags", and this really is a "mode", then it might be clearer to split this into two functions -- one for blobs and one for files -- and callers can invoke whichever is appropriate. > +{ > + const char *ret = ""; Unnecessary initialization since all branches assign 'ret' unconditionally. > + if (flags & GET_CONVERT_STATS_ASCII_BLOB) { > + unsigned long sz; > + void *data = read_blob_data_from_cache(path, &sz); > + ret = gather_stats_ascii(data, sz); gather_stats_ascii() deals gracefully with data== NULL, so it can safely be outside the 'if (data)' conditional. Okay. > + if (data) > + free(data); POSIX says that free(NULL) is safe, so the conditional is unnecessary. > + } else if (flags & GET_CONVERT_STATS_ASCII_WT){ Style: space before { > + struct strbuf sb = STRBUF_INIT; > + strbuf_read_file(&sb, path, hint); If gather_stats_ascii() should distinguish error and empty-file cases, then checking the return value of strbuf_read_file() would be appropriate. > + ret = gather_stats_ascii(sb.buf, sb.len); > + strbuf_release(&sb); > + } > + return ret; > +} > + > --- a/convert.h > +++ b/convert.h > @@ -31,6 +31,20 @@ enum eol { > +struct text_stat { > + /* NUL, CR, LF and CRLF counts */ > + unsigned nul, cr, lf, crlf; > + > + /* These are just approximations! */ > + unsigned printable, nonprintable; > +}; > +void gather_stats(const char *buf, unsigned long size, struct text_stat *stats); > +int is_binary(unsigned long size, struct text_stat *stats); > +const char *gather_stats_ascii(const char *buf, unsigned long size); > +#define GET_CONVERT_STATS_ASCII_BLOB (1<<0) > +#define GET_CONVERT_STATS_ASCII_WT (1<<1) > +const char *get_convert_stats_ascii(const char *path, int isBlob, size_t hint); s/isBlob/flags/ or s/isBlob/mode/ or something. Having read the source code, I know what 'hint' is for, but its purpose is not obvious when just reading this prototype. Also, do you expect 'hint' to prove worthwhile or might it be a case of premature optimization and interface complexity? (Genuine question.) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html