Re: False positive and false negative from -Wformat-truncation

Jeff Law <law@xxxxxxxxxx> · Mon, 16 Oct 2017 13:13:57 -0600

On 10/16/2017 11:45 AM, Martin Sebor wrote:
> On 10/16/2017 06:28 AM, Mason wrote:
>> Hello,
>>
>> I do note that the documentation for -Wformat-truncation states:
>> "While enabling optimization will in most cases improve the accuracy
>> of the warning, it may also result in false positives."
>>
>>
>> However, consider the following test cases:
> 
> The implementation of the warning uses the get_range_info() function
> to determine the range of sprintf arguments and its ability to avoid
> false positives and negatives is subject to the accuracy of the range
> information made available by this API.  (See the output of
> -fdump-tree-printf-return-value for the ranges the warning pass
> sees, and the bugs in Bugzilla tracking these issues.)  This is
> a general problem that affects all clients of the get_range_info()
> function.
> 
> It is possible to do better, for instance by avoiding the API and
> determining the range of each argument by traversing the control
> flow graph.  Each affected warning (and optimization) pass could
> do this itself but since Andrew MacLeod and Aldy Hernandez are
> working on a general API to do just that it seems like making use
> of their feature when it's available and avoiding the duplication
> of effort and code is a better way to go.
> 
> As an aside, another option to improve the accuracy would be to
> integrate all these passes into VRP where the range info is fully
> accurate.  But form a design standpoint this doesn't seem like as
> good an approach to me because it would increase coupling and
> bloat the VRP pass with even more logic.  In addition, warnings
> like -Wformat-truncation would also benefit from better string
> length information and so the same arguments could be made both
> for and against integrating them into the tree-ssa-strlen pass
> (though I suspect the string length integration will at some
> point be necessary, unless some general purpose
> get_string_length_range()-like API is developed first).
> 
>> $ cat testcase.c
>> #include <stdio.h>
>> typedef unsigned long long u64;
>> void foo(u64 arg, char *buf)
>> {
>>     u64 u = arg / 1000000000;
>>     if (u >= 1000) return;
>>     snprintf(buf, 16, "%llu.XXXZZZ", u);
>> }
>>
>> $ gcc-7 -O3 -Wall -S testcase.c
>> testcase.c: In function 'foo':
>> testcase.c:7:20: warning: '.XXXZZZ' directive output may be truncated
>> writing 7 bytes into a region of size between 5 and 15
>> [-Wformat-truncation=]
>>   snprintf(buf, 16, "%llu.XXXZZZ", u);
>>                     ^~~~~~~~~~~~~
>> In file included from /usr/include/stdio.h:937:0,
>>                  from testcase.c:1:
>> /usr/include/x86_64-linux-gnu/bits/stdio2.h:64:10: note:
>> '__builtin___snprintf_chk' output between 9 and 19 bytes into a
>> destination of size 16
>>    return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
>>           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>         __bos (__s), __fmt, __va_arg_pack ());
>>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
>> Because we only call snprintf when u < 1000, it is obvious to a human
>> reader that the output cannot overflow the buffer.
> 
> Right.  Unfortunately, it's not what the warning pass sees.
> The same limitation, incidentally, also triggers false positives
> in other warnings.  For instance, with -Walloca-larger-than=999
> in this test case:
> 
>   void bar (void*);
> 
>   void foo(u64 arg)
>   {
>     u64 u = arg / 1000000000;
>     if (u >= 1000) return;
>     bar (__builtin_alloca (u));
>   }
> 
> we get:
> 
>   warning: argument to ‘alloca’ may be too large [-Walloca-larger-than=]
>      bar (__builtin_alloca (u));
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~
>   note: limit is 1000 bytes, but argument may be as large as 18446744073
> 
>> Strangely, the following test case does not warn.
>>
>> $ cat testcase2.c
>> #include <stdio.h>
>> typedef unsigned long long u64;
>> void foo(u64 arg, char *buf)
>> {
>>     snprintf(buf, 16, "%llu.XXXZZZ", arg);
>> }
>>
>> Here, a warning would be quite useful... But none is given.
> 
> The difference between these two cases is that in the first one
> the argument is known to be bounded by a range (unfortunately,
> the range exposed by get_range_info() here appears to be less
> than accurate), while in the second one the argument has no
> range associated with it.  The heuristic the warning uses is
> that a variable constrained to any range triggers the checking
> (on the basis that the programmer tries to reduce its range but
> didn't do good enough of a job), while one that's not does not
> (the constraint may be in a different function in another
> translation unit).
So following-up.

If I convert the sprintf bits to a domwalk and embed an evrp style class
to compute the range data.  I get this at the key point:

(gdb) p info.callstmt
$18 = (gimple *) 0x7ffff01ca498
(gdb) p debug_gimple_stmt ($18)
# .MEM_6 = VDEF <.MEM_4(D)>
snprintf (buf_5(D), 16, "%llu.XXXZZZ", u_3);
$19 = void
(gdb) p dump_all_value_ranges (stderr)
arg_2(D): [0, 999999999999]  EQUIVALENCES: { arg_2(D) } (1 elements)
u_3: [0, 999]

Note that this range data is the form used within tree-vrp and is not
attached to the SSA_NAME.  So it's not visible to get_range_info.  But
we can clearly see that it's possible to get good tight range info for u
at the call to sprintf.

Jeff