On 2020/1/1 8:39 PM, Qian Cai wrote:
On Jan 1, 2020, at 4:32 AM, Wen Yang <wenyang@xxxxxxxxxxxxxxxxx> wrote:
The variables 'min', 'max' and 'bw' are unsigned long, but
do_div() takes a 32-bit divisor and truncates it, so a divisor
can test non-zero and still be truncated to zero for the division.
Fix this issue by using div64_ul() instead.
How did you find the issue? If it was caught by the compiler, can you paste the original warnings? Also, can you figure out which commit introduced the issue in the first place, so it can be backported to stable if needed?
Thanks for your comments.
There are no compilation warnings here.
We found this issue by following these steps:
We were first inspired by commit b0ab99e7736a ("sched: Fix possible
divide by zero in avg_atom () calculation"). Combining that with the
mm code we had recently analyzed, we found this suspicious place.
We also confirmed it by disassembling the code:
201         if (min) {
202                 min *= this_bw;
203                 do_div(min, tot_bw);
204         }

/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 201
0xffffffff811c37da <__wb_calc_thresh+234>:  xor    %r10d,%r10d
0xffffffff811c37dd <__wb_calc_thresh+237>:  test   %rax,%rax
0xffffffff811c37e0 <__wb_calc_thresh+240>:  je     0xffffffff811c3800 <__wb_calc_thresh+272>
/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 202
0xffffffff811c37e2 <__wb_calc_thresh+242>:  imul   %r8,%rax
/usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 203
0xffffffff811c37e6 <__wb_calc_thresh+246>:  mov    %r9d,%r10d    ---> truncates the divisor (tot_bw) to 32 bits here
0xffffffff811c37e9 <__wb_calc_thresh+249>:  xor    %edx,%edx
0xffffffff811c37eb <__wb_calc_thresh+251>:  div    %r10
0xffffffff811c37ee <__wb_calc_thresh+254>:  imul   %rbx,%rax
0xffffffff811c37f2 <__wb_calc_thresh+258>:  shr    $0x2,%rax
0xffffffff811c37f6 <__wb_calc_thresh+262>:  mul    %rcx
0xffffffff811c37f9 <__wb_calc_thresh+265>:  shr    $0x2,%rdx
0xffffffff811c37fd <__wb_calc_thresh+269>:  mov    %rdx,%r10
This issue was introduced by commit 693108a8a667 ("writeback: make
bdi->min/max_ratio handling cgroup writeback aware").
Finally, we plan to summarize the cases above and write a general
Coccinelle rule to check for similar problems.
The two variables 'numerator' and 'denominator' are declared as
long, but they should actually be unsigned long, according to the
implementation of fprop_fraction_percpu().