On Thu 02-10-14 15:52:14, Johannes Weiner wrote: > On Thu, Oct 02, 2014 at 11:01:35AM -0400, Johannes Weiner wrote: > > On Tue, Sep 30, 2014 at 01:06:22PM +0200, Michal Hocko wrote: > > > > +/** > > > > + * page_counter_limit - limit the number of pages allowed > > > > + * @counter: counter > > > > + * @limit: limit to set > > > > + * > > > > + * Returns 0 on success, -EBUSY if the current number of pages on the > > > > + * counter already exceeds the specified limit. > > > > + * > > > > + * The caller must serialize invocations on the same counter. > > > > + */ > > > > +int page_counter_limit(struct page_counter *counter, unsigned long limit) > > > > +{ > > > > + for (;;) { > > > > + unsigned long old; > > > > + long count; > > > > + > > > > + count = atomic_long_read(&counter->count); > > > > + > > > > + old = xchg(&counter->limit, limit); > > > > + > > > > + if (atomic_long_read(&counter->count) != count) { > > > > + counter->limit = old; > > > > + continue; > > > > + } > > > > + > > > > + if (count > limit) { > > > > + counter->limit = old; > > > > + return -EBUSY; > > > > + } > > > > > > Ordering doesn't make much sense to me here. Say you really want to set > > > limit < count. You are effectively pushing all concurrent charges to > > > the reclaim even though you would revert your change and return with > > > EBUSY later on. > > > > > > Wouldn't (count > limit) check make more sense right after the first > > > atomic_long_read? > > > Also the second count check should be sufficient to check > count and > > > retry only when the count has increased. > > > Finally continuous flow of charges can keep this loop running for quite > > > some time and trigger lockup detector. cond_resched before continue > > > would handle that. Something like the following: > > > > > > for (;;) { > > > unsigned long old; > > > long count; > > > > > > count = atomic_long_read(&counter->count); > > > if (count > limit) > > > return -EBUSY; > > > > > > old = xchg(&counter->limit, limit); > > > > > > /* Recheck for concurrent charges */ > > > if (atomic_long_read(&counter->count) > count) { > > > counter->limit = old; > > > cond_resched(); > > > continue; > > > } > > > > > > return 0; > > > } > > > > This is susceptible to spurious -EBUSY during races with speculative > > charges and uncharges. My code avoids that by retrying until we set > > the limit without any concurrent counter operations first, before > > moving on to implementing policy and rollback. > > > > Some reclaim activity caused by a limit that the user is trying to set > > anyway should be okay. I'd rather have a reliable syscall. > > > > But the cond_resched() is a good idea, I'll add that, thanks. > > Thinking more about it, my code doesn't really avoid that if the > speculative charges persist over the two reads, it just widens the > window a bit. But your suggestion seems indeed more readable, > although I'd invert the second branch. > > How about this delta on top? > > diff --git a/mm/page_counter.c b/mm/page_counter.c > index 4bdab1c7a057..7eb17135d4a4 100644 > --- a/mm/page_counter.c > +++ b/mm/page_counter.c > @@ -19,8 +19,8 @@ int page_counter_cancel(struct page_counter *counter, unsigned long nr_pages) > > new = atomic_long_sub_return(nr_pages, &counter->count); > > - if (WARN_ON_ONCE(new < 0)) > - atomic_long_add(nr_pages, &counter->count); > + /* More uncharges than charges? */ > + WARN_ON_ONCE(new < 0); > > return new > 0; > } > @@ -146,29 +146,29 @@ int page_counter_limit(struct page_counter *counter, unsigned long limit) > unsigned long old; > long count; > > - count = atomic_long_read(&counter->count); > /* > + * Update the limit while making sure that it's not > + * below the (concurrently changing) counter value. > + * > * The xchg implies two full memory barriers before > * and after, so the read-swap-read is ordered and > * ensures coherency with page_counter_try_charge(): > * that function modifies the count before checking > * the limit, so if it sees the old limit, we see the > - * modified counter and retry. This guarantees we > - * never successfully set a limit below the counter. > + * modified counter and retry. > */ > - old = xchg(&counter->limit, limit); > - > - if (atomic_long_read(&counter->count) != count) { > - counter->limit = old; > - continue; > - } > + count = atomic_long_read(&counter->count); > > - if (count > limit) { > - counter->limit = old; > + if (count > limit) > return -EBUSY; > - } > > - return 0; > + old = xchg(&counter->limit, limit); > + > + if (atomic_long_read(&counter->count) <= count) > + return 0; > + > + counter->limit = old; > + cond_resched(); > } > } Looks good to me. Thanks! -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html