A plea for more tooling usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Preamble: you might want to read the decent formatted version of this mail at:
> https://gist.github.com/daniel-j-h/06f16015c89bec4fbb42


Motivation
----------

I tried to build Ceph with the Clang (3.4) compiler. The only issue
preventing the build to finish was a "VLA of non-POD type" usage which
I fixed here:

https://github.com/ceph/ceph/pull/1768


While the build was resuming I was wondering why no one stumpled upon this
issue in the first place. Has nobody ever tried to compile Ceph with Clang?

So I did what I'm always doing to quick-check a project's code quality:
I set the compiler to Clang. I set flags to -Weverything.

The idea is to gradually disable warnings one by one that are not interesting.

Instead of doing this by hand I used the following flags that I
accumulated over the last weeks or month to disable a handful of not so
important warnings:

> -Weverything -Wno-padded -Wno-c++11-long-long -Wno-variadic-macros
> -Wno-c++11-extensions -Wno-global-constructors -Wno-exit-time-destructors
> -Wno-sign-conversion -Wno-weak-vtables -Wno-disabled-macro-expansion
> -Wno-c99-extensions -Wno-shadow -Wno-documentation-unknown-command -Wno-undef
> -Wno-documentation -Wno-duplicate-enum -Wno-switch-enum -Wno-unused-parameter
> -Wno-packed

Let's build Ceph and have a look at what Clang's -Weverything catches:

    ./autogen.sh
    ./configure --without-libxfs
    make -j 1 2> diagnostics.txt

Many warnings are duplicates because of header includes and the like.
Let's remove duplicates and only keep unique diagnostics:

    $ egrep "\[-.*\]" diagnostics.txt | sort | uniq > uniq.txt
    $ wc -l uniq.txt 
    12286 uniq.txt

Wow. 12286 unique diagnostics. Interested in what showed up?


How bad is it
-------------

I ended up going through the diagnostic categories, because I had a few hours
to spare. The methodology I used was the following:

Check for diagnostic categories, prioritize them, and after that remove all
the occurences in the original diagnostics file by:

    $ sed -i '/Wflag/d' uniq.txt

In the following I'll give you a quick summary of the most important
diagnostics. After this there are still some diagnostics left.

    $ wc -l uniq.txt 
    3362 uniq.txt

This can be explained by diagnostics that are not important at all, such as
unused exception parameter in a catch clause.

But there are also conversions between types that have to be studied one by
one to check if a loss in precision or comparison of types with different
signedness could lead to dangerous behavior.


Note: Clang's diagnostics can be customized, too:

http://clang.llvm.org/docs/UsersManual.html#formatting-of-diagnostics


So let's look through the categories.


Undefined Behavior
------------------

> warning: 'va_start' has undefined behavior with reference types [-Wvarargs]

    $ grep Wvarargs uniq.txt | wc -l
    1

Here's the paragraph from the C++11 Standard, 18.10 §3:
> If the parameter parmN is declared with a function, array, or reference type, or with a type that is not compatible with the type that results when passing an argument for which there is no parameter, the behavior is undefined.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf


Conditional Uninitialized
-------------------------

> warning: variable '%' may be uninitialized when used here [-Wconditional-uninitialized]

    $ grep Wconditional-uninitialized uniq.txt | wc -l
    4


Unreachable Code
----------------

> warning: will never be executed [-Wunreachable-code]

    $ grep Wunreachable-code uniq.txt | wc -l
    155


Extensions to the C++ language
------------------------------

There's this vast amount of code that is definitely not conforming to
standard C++. Some extensions are used, such as Variable Length Arrays (VLAs,
which were the reason for my issue in the first place), some GCC specificas.

You probably have to decide if you want to go this route or if you want to do
it the portable and standard C++ way.

VLAs are a good example where the standard C++ way of doing it is in no way
harder to accomplish. See the following explanation:

http://clang.llvm.org/compatibility.html#vla


> warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]

    $ grep Wnested-anon-types uniq.txt  | wc -l
    28

> warning: zero size arrays are an extension [-Wzero-length-array]

    $ grep Wzero-length-array uniq.txt | wc -l
    8134

> warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]

    $ grep Wgnu-anonymous-struct uniq.txt  | wc -l
    2

> warning: '%' may not be nested in a struct due to flexible array member [-Wflexible-array-extensions]

    $ grep Wflexible-array-extensions uniq.txt | wc -l
    2

> warning: variable length array used [-Wvla]
> warning: variable length arrays are a C99 feature [-Wvla-extension]

    $ grep Wvla uniq.txt | wc -l
    178

> warning: cast between pointer-to-function and pointer-to-object is an extension [-Wpedantic]
> warning: use of non-standard escape character '\%' [-Wpedantic]

    $ grep Wpedantic uniq.txt | wc -l
    15

> warning: use of GNU old-style field designator extension [-Wgnu-designator]

    $ grep Wgnu-designator uniq.txt | wc -l
    43

> warning: anonymous unions are a C11 extension [-Wc11-extensions]

    $ grep Wc11-extensions uniq.txt | wc -l
    3


Miscellaneous:
--------------

> warning: '%' hides overloaded virtual function [-Woverloaded-virtual]

    $ grep Woverloaded-virtual uniq.txt | wc -l
    3

> warning: using namespace directive in global context in header [-Wheader-hygiene]

    $ grep Wheader-hygiene uniq.txt | wc -l
    87

> warning: extra ';' after member function definition [-Wextra-semi]

    $ grep Wextra-semi uniq.txt | wc -l
    135

> warning: struct '%' was previously declared as a class [-Wmismatched-tags]

    $ grep Wmismatched-tags uniq.txt | wc -l
    93

> warning: missing field '%' initializer [-Wmissing-field-initializers]

    $ grep Wmissing-field-initializer uniq.txt | wc -l
    3

> warning: private field '%' is not used [-Wunused-private-field]

    $ grep Wunused-private-field uniq.txt | wc -l
    11

> warning: function '%' could be declared with attribute 'noreturn' [-Wmissing-noreturn]

    $ grep Wmissing-noreturn uniq.txt | wc -l
    23

> warning: token pasting of ',' and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments]

    $ grep Wgnu-zero-variadic-macro-arguments uniq.txt | wc -l
    4



Style
-----

While fixing my initial VLA issue I skimmed through the relevant code
in src/rgw/rgw_main.cc.

What I noticed is that already in this file there's no consistent style.
There's a "using namespace std;" at the top, then in the file there's an
alternate usage between vector<string> and std::vector<std::string>.

It also seems to me that the memory management is often done by hand, e.g.
in line 377 "string *objs = new string[num_objs];" and later the "delete []"
instead of just using a std::vector<std::string>.


Summary
-------

Using Clang-3.4's -Weverything reveals critical issues.

What I hope for Ceph's general code quality is to improve by using the tools
at hand. This means especially:

Do the same check with an up to date Clang (that's one of the reasons I'm not
pointing out all the issues or the concrete source code positions), look
through them in detail, such as conversions and left-out diagnostics, and fix
as many as critical issues as possible.

You probably could even use Clang's -fsanitize flag to check for issues at runtime.
http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

Also keep in mind that this check was only done on the userspace code in the
ceph/ceph repository.


Thanks,
Daniel J. Hofmann
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux