On Wednesday 09 May 2012, Dave Chinner wrote:
> > In low end flash devices, some requests might take too long than normal
> > due to background device maintenance (i.e flash erase / reclaim procedure)
> > kicking in in the context of an ongoing write, stalling them by several
> > orders of magnitude.
>
> And thereby stalling what might be writes critical to operation.
> Indeed, how does this affect the system when it starts swapping
> heavily? If you keep stalling writes, the system won't be able to
> swap and free memory...

The point here is that reads have a consistent latency, e.g. 500
microseconds for a small access, while writes have a latency that can
easily become 1000x the read latency (e.g. 500 ms of blocking the
device) depending on the state of the device. Most of the time, writes
are fast as well, but sometimes (when garbage collection happens in the
device), they are extremely slow and block everything else. This is the
only time we ever want to interrupt a write: keeping the system running
interactively while eventually getting to do the writeback. There is a
small penalty for interrupting the garbage collection, but the device
should be able to pick up its work at the point where we interrupt it,
so we can still make forward progress.

> > > This really seems like functionality that belongs in an IO
> > > scheduler so that write starvation can be avoided, not in high-level
> > > data read paths where we have no clue about anything else going on
> > > in the IO subsystem....
> >
> > Indeed, the feature is built mostly in the low level device driver and
> > minor changes in the elevator. Changes above the block layer are only
> > about setting attributes and transparent to their operation.
>
> The problem is that the attribute you are setting covers every
> single data read that is done by all users. If that's what you want
> to have happen, then why do you even need a new flag at this layer?
> Just treat every non-REQ_META read request as a demand paged IO and
> you've got exactly the same behaviour without needing to tag at the
> higher layer....

My feeling is that we should just treat every (REQ_SYNC | REQ_READ)
request the same and let them interrupt long-running writes, independent
of whether it's REQ_META or demand paging.

	Arnd
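
For illustration only, here is a minimal userspace sketch of the policy
suggested above: any synchronous read may interrupt a write, but only
once that write has been running long enough to look like it is stuck
behind garbage collection. Everything in it is an assumption made for
the example: the SK_* flags are stand-ins and not the kernel's request
flag definitions, struct sk_request and both functions are hypothetical,
and the 5 ms threshold is an arbitrary value between the 500 us and
500 ms figures mentioned earlier, not a number proposed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for request flags; NOT the kernel's definitions. */
enum {
	SK_REQ_WRITE = 1u << 0,	/* set for writes, clear for reads */
	SK_REQ_SYNC  = 1u << 1,	/* a caller is blocked waiting for completion */
	SK_REQ_META  = 1u << 2,	/* metadata access; deliberately ignored below */
};

/* Assumed cut-off: a write in flight for at least this long counts as "slow". */
#define SK_SLOW_WRITE_US 5000u

/* Hypothetical request descriptor carrying only what the policy needs. */
struct sk_request {
	uint32_t flags;
};

/* True for any synchronous read, whether or not SK_REQ_META is set. */
static bool sk_is_sync_read(const struct sk_request *rq)
{
	return (rq->flags & SK_REQ_SYNC) && !(rq->flags & SK_REQ_WRITE);
}

/*
 * Decide whether 'incoming' should interrupt 'in_flight'.
 * Only a long-running write is ever interrupted, and only by a sync read.
 */
static bool sk_should_preempt(const struct sk_request *incoming,
			      const struct sk_request *in_flight,
			      uint64_t in_flight_elapsed_us)
{
	if (!in_flight || !(in_flight->flags & SK_REQ_WRITE))
		return false;	/* nothing to interrupt, or it is a read */
	if (!sk_is_sync_read(incoming))
		return false;	/* writes and async reads wait their turn */
	return in_flight_elapsed_us >= SK_SLOW_WRITE_US;
}
```

In this sketch a fast write is never interrupted, so the common case
pays nothing, and SK_REQ_META plays no part in the decision. It does not
address the starvation concern raised above; a real implementation would
presumably also need to bound how often a given write can be
interrupted.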