On Wed, Apr 17, 2024 at 4:10 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 18-01-24 10:19:53, Zach O'Keefe wrote:
> > (struct dirty_throttle_control *)->thresh is an unsigned long, but is
> > passed as the u32 divisor argument to div_u64(). On architectures where
> > unsigned long is 64 bits, the argument will be implicitly truncated.
> >
> > Use div64_u64() instead of div_u64() so that the value used in the "is
> > this a safe division" check is the same as the divisor.
> >
> > Also, remove the redundant cast of the numerator to u64, as that should
> > happen implicitly.
> >
> > This would be difficult to exploit in the memcg domain, given the
> > ratio-based arithmetic domain_dirty_limits() uses, but is much easier in
> > the global writeback domain with a BDI_CAP_STRICTLIMIT backing device,
> > using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)
> >
> > Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
> > Cc: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
>
> I've come across this change today and it is broken in several ways:

Thanks for picking up on this, Jan.

> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index cd4e4ae77c40a..02147b61712bc 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
> >  	 */
> >  	dtc->wb_thresh = __wb_calc_thresh(dtc);
> >  	dtc->wb_bg_thresh = dtc->thresh ?
> > -		div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
> > +		div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
>
> Firstly, the removed (u64) cast from the multiplication will introduce a
> multiplication overflow on 32-bit archs if wb_thresh * bg_thresh >= 1<<32
> (which is actually common - the default settings with 4GB of RAM will
> trigger this). [..]

True, and embarrassing given I was looking at this code with a 32-bit
focus. Well spotted.

> [..] Secondly, the div64_u64() is unnecessarily expensive on
> 32-bit archs. We have div64_ul() in case we want to be safe & cheap.

That was a last-minute change versus just casting the initial
"dtc->thresh ?" check. It did look expensive, but I figured its existence
implied it should be used. I must have missed div64_ul().

> Thirdly, if thresholds are larger than 1<<32 pages, then dirty balancing is
> going to blow up in many other spectacular ways - consider only the
> multiplication on this line - it will not necessarily fit into u64 anymore.
> The whole dirty limiting code is interspersed with assumptions that limits
> are actually within u32 and we do our calculations in unsigned longs to
> avoid worrying about overflows (with occasional typing to u64 to make it
> more interesting because people expected those entities to overflow 32 bits
> even on 32-bit archs). Which is lame, I agree, but so far people don't seem
> to be setting limits to 16TB or more. And I'm not really worried about
> security here since this is a global-root-only tunable and root has much
> better ways to DoS the system.
>
> So overall I'm all for cleaning up this code but in a sensible way please.
> E.g. for these overflow issues at least do it one function at a time so
> that we can sensibly review it.
>
> Andrew, can you please revert this patch until we have a better fix? So far
> it does more harm than good... Thanks!

Shall we just roll forward with a suitable fix?
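For anyone skimming the thread, here's a minimal userspace sketch of the
truncation that started all of this: mock_div_u64() is a made-up stand-in
that only copies div_u64()'s (u64 dividend, u32 divisor) prototype, and
the names and values are purely illustrative, not the kernel helpers.

/*
 * Userspace illustration only: unlike the real helper, mock_div_u64()
 * bails out on a zero divisor so the demo stays well-defined.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t mock_div_u64(uint64_t dividend, uint32_t divisor)
{
	/* the divisor has already lost its upper 32 bits at this point */
	printf("divisor seen by the helper: %u\n", divisor);
	return divisor ? dividend / divisor : 0;
}

int main(void)
{
	uint64_t wb_thresh = 1000, bg_thresh = 500;
	uint64_t thresh = 1ULL << 32;	/* dtc->thresh == 1<<32 pages */

	/*
	 * Mirrors the caller's pattern: the 64-bit "thresh ?" check
	 * passes, yet the helper is handed (u32)thresh == 0.
	 */
	uint64_t res = thresh ? mock_div_u64(wb_thresh * bg_thresh, thresh) : 0;

	printf("wb_bg_thresh: %llu\n", (unsigned long long)res);
	return 0;
}

Built with a plain cc, this prints a divisor of 0, which is exactly the
divide-by-zero hazard that the (u32) cast on the predicate in the diff
below is meant to re-check for.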
I think all the original code actually "needed" was to cast the ternary
predicate, like:

---8<---
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fba324e1a010..ca1bfc0c9bdd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1637,8 +1637,8 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
 	 * at some rate <= (write_bw / 2) for bringing down wb_dirty.
 	 */
 	dtc->wb_thresh = __wb_calc_thresh(dtc);
-	dtc->wb_bg_thresh = dtc->thresh ?
-		div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+	dtc->wb_bg_thresh = (u32)dtc->thresh ?
+		div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
 
 	/*
 	 * In order to avoid the stacked BDI deadlock we need
---8<---

Thanks, and apologies for the inconvenience,
Zach

> 								Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR