On Mon, Sep 13, 2010 at 12:37:44AM +0900, Minchan Kim wrote:
> > > > > > <SNIP>
> > > > > >
> > > > > > + * in sleeping but cond_resched() is called in case the current process has
> > > > > > + * consumed its CPU quota.
> > > > > > + */
> > > > > > +long wait_iff_congested(struct zone *zone, int sync, long timeout)
> > > > > > +{
> > > > > > +	long ret;
> > > > > > +	unsigned long start = jiffies;
> > > > > > +	DEFINE_WAIT(wait);
> > > > > > +	wait_queue_head_t *wqh = &congestion_wqh[sync];
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * If there is no congestion, check the amount of writeback. If there
> > > > > > +	 * is no significant writeback and no congestion, just cond_resched
> > > > > > +	 */
> > > > > > +	if (atomic_read(&nr_bdi_congested[sync]) == 0) {
> > > > > > +		unsigned long inactive, writeback;
> > > > > > +
> > > > > > +		inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
> > > > > > +			zone_page_state(zone, NR_INACTIVE_ANON);
> > > > > > +		writeback = zone_page_state(zone, NR_WRITEBACK);
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * If less than half the inactive list is being written back,
> > > > > > +		 * reclaim might as well continue
> > > > > > +		 */
> > > > > > +		if (writeback < inactive / 2) {
> > > > >
> > > > > I am not sure this is best.
> > > > >
> > > >
> > > > I'm not saying it is. The objective is to identify a situation where
> > > > sleeping until the next write or congestion clears is pointless. We have
> > > > already identified that we are not congested so the question is "are we
> > > > writing a lot at the moment?". The assumption is that if there is a lot
> > > > of writing going on, we might as well sleep until one completes rather
> > > > than reclaiming more.
> > > >
> > > > This is the first effort at identifying pointless sleeps. Better ones
> > > > might be identified in the future but that shouldn't stop us making a
> > > > semi-sensible decision now.
> > >
> > > nr_bdi_congested is no problem since we have used it for a long time.
> > > But you added a new rule about writeback.
> > >
> >
> > Yes, I'm trying to add a new rule about throttling in the page allocator
> > and from vmscan. As you can see from the results in the leader, we are
> > currently sleeping more than we need to.
>
> I can see the results about avoiding congestion_wait but can't find any
> about the (writeback < inactive / 2) heuristic.
>

See the leader and each of the report sections entitled "FTrace Reclaim
Statistics: congestion_wait". They provide a measure of how sleep times are
affected.

"congest waited" are waits due to calling congestion_wait(). "conditional
waited" are those related to wait_iff_congested(). As you will see from the
reports, sleep times are reduced overall while callers of wait_iff_congested()
still go to sleep. The reports entitled "FTrace Reclaim Statistics: vmscan"
show how reclaim is behaving, and indicators so far are that reclaim is not
hurt by introducing wait_iff_congested().

> > >
> > > The reason I pointed it out is that you added a new rule, and I hope to
> > > let others know about this change since they may have good ideas or opinions.
> > > I think that's one of the roles of a reviewer.
> > >
> >
> > Of course.
> >
> > > > >
> > > > > 1. Without considering the various speed classes of storage, can we fix it at half of the inactive list?
> > > >
> > > > We don't really have a good means of identifying speed classes of
> > > > storage. Worse, we are deciding on a zone basis here, not a BDI
> > > > basis. The pages being written back in the zone could be backed by
> > > > anything so we cannot make decisions based on BDI speed.
> > >
> > > True. That's why I have the question below.
> > > As you said, we don't have enough information in vmscan.
> > > So I am not sure how effective such a semi-sensible decision is.
> > >
> >
> > What metrics would you apply in addition to the ones I used in the
> > leader mail?
>
> The effectiveness of the (writeback < inactive / 2) heuristic.
>

Define effectiveness. In the reports I gave, I reported on the sleep times
and whether the full timeout was slept or not. Sleep times are reduced while
not negatively impacting reclaim.

> > >
> > > I think the best approach is to throttle well in page-writeback.
> >
> > I do not think there is a problem as such in page writeback throttling.
> > The problem is that we are going to sleep without any congestion or without
> > writes in progress. We sleep for a full timeout in this case for no reason
> > and this is what I'm trying to avoid.
>
> Yes. I agree.
> My only concern is the heuristic accuracy I mentioned.
> In your previous version, you didn't add the heuristic.

In the previous version, I also changed all callers to congestion_wait(). V1
simply was not that great a patch, and Johannes pointed out that I wasn't
measuring the scanning/reclaim ratios to see how reclaim was impacted. The
reports now include this data and things are looking better.

> But suddenly you added it in this version.
> So I think you must have some reason for adding it in this version.
> Please write down the cause and data if you have them.
>

The leader has a large amount of data on how this and the other patches
affected results for a good variety of workloads.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab