On 10/12/2017 06:36 AM, Sitsofe Wheeler wrote: > On 12 October 2017 at 07:06, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote: >> On 12 October 2017 at 06:28, Andrzej Jakowski >> <andrzej.jakowski@xxxxxxxxx> wrote: >>> On Wed, Oct 11, 2017 at 4:36 PM, Vincent Fu <vincentfu@xxxxxxxxx> wrote: >>>> Andrzej, does this actually resolve the problem you observed? unsigned long >>>> nr in the parameter list of calc_clat_percentiles() and >>>> show_clat_percentiles() also looks like it will overflow when you have more >>>> than 2^32 IOs. >>> Yes you are right. That original patch is not sufficient. Will get it >>> fixed and post new version soon. >> >> I've got a feeling there are few other 32 bit platform integer >> overflows lurking around. I'm trying a clang sanitizer run... > > The following results are from a 32 bit Linux build without your patch: > gettime.c:313:28: runtime error: unsigned integer overflow: 3600 * > 2600730 cannot be represented in type 'unsigned long' > gettime.c:353:53: runtime error: unsigned integer overflow: > 8796093022208 * 12901928 cannot be represented in type 'unsigned long > long' > overflow: (g=0): rw=read, bs=(R) 32B-32B, (W) 32B-32B, (T) 32B-32B, > ioengine=null, iodepth=32 > ... > fio-3.1-50-g2b6f > Starting 4 threads > io_u.c:1272:28: runtime error: unsigned integer overflow: 0 - 1 cannot > be represented in type 'unsigned int' > gettime.c:197:35: runtime error: unsigned integer overflow: > 1429766471724 * 12901928 cannot be represented in type 'unsigned long > long' > backend.c:345:7: runtime error: unsigned integer overflow: 4294967295 > + 1 cannot be represented in type 'unsigned int' > io_u.c:989:23: runtime error: unsigned integer overflow: 4294967288 + > 8 cannot be represented in type 'unsigned int' > io_u.c:1071:24: runtime error: unsigned integer overflow: 4294967295 + > 1 cannot be represented in type 'uint32_t' (aka 'unsigned int') > Jobs: 4 (f=4): [R(4)][26.9%][r=102MiB/s,w=0KiB/s][r=3358k,w=0 IOPS][eta 43m:52s] > > ^C > fio: terminating on signal 2 > ioengines.c:270:22: runtime error: unsigned integer overflow: 0 - 7 > cannot be represented in type 'unsigned int' > stat.c:1452:18: runtime error: unsigned integer overflow: 12246275151 > * 12220497954 cannot be represented in type 'unsigned long long' > stat.c:1529:20: runtime error: unsigned integer overflow: 3656340552 + > 3630563360 cannot be represented in type 'unsigned int' > stat.c:1537:22: runtime error: unsigned integer overflow: 3002030688 + > 2989903763 cannot be represented in type 'unsigned int' > stat.c:1531:23: runtime error: unsigned integer overflow: 3058346638 + > 1520207785 cannot be represented in type 'unsigned int' > stat.c:1533:25: runtime error: unsigned integer overflow: 3058346638 + > 1520207785 cannot be represented in type 'unsigned int' > > (I've manually removed most of the warnings that look like unsigned > wraparound was expected/desired such as when hashing is being done) Yikes > The first gettime.c problem looks like it's easily fixed with "static > unsigned long long cycles_per_msec". > The next gettime.c:353:53 looks more problematic. That value is > gigantic and would need a 128 bit variable to fit. I don't know if > this is spurious because I'm running a VM and manually set > --clocksource=cpu but it would be nice to know for sure. > io_u.c:1272 looks like the standard decrementing counter whose value > is taken before being negated so it's a harmless warning. > backend.c:345:7 I think ts_cache_nr is expected to just keep wrapping > around so this is harmless? > io_u.c:989:23 io_u_map is a uint32_t which means we can only store > 2^32 latency entries before we wraparound - this sounds problematic > but I don't know how to handle it (periodically scale everything?). > io_u.c:1071:24 see above but for io_u_lat_u . > stat.c:1452:18 - too many samples means (no pun intended) we have an overflow. > The rest of the stat.c errors appear to be down to to trying to > combine samples when both values are already close to their 32 bit > value maximums and the result can't be kept in 32 bits. > > Any thoughts? Just go through them one-by-one. I think some of these were likely introduced with the nsec conversion, which I guess isn't too surprising... In any case, just fix them up individually, that's much better than trying to do one big patch that fixes everything. Or create a mail for each particular case that you don't see an easy fix for, then we can go over them. The easy case are just making counters or values 64-bit, regardless of platform. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html