Hello,

On Wed, Apr 11, 2012 at 09:22:31PM +0200, Jan Kara wrote:
> > So all the metadata IO will happen through journaling thread and that
> > will be in root group which should remain unthrottled. So any journal
> > IO going to disk should remain unthrottled.
>
> Yes, that is true at least for ext3/ext4 or btrfs. In principle we don't
> have to have the journal thread (as is the case of reiserfs where random
> writer may end up doing commit) but let's not complicate things
> unnecessarily.

Why can't journal entries keep track of the originator so that bios
can be attributed to the originator while committing?  That shouldn't
be too difficult to implement, no?

> > Now, IIRC, fsync problem with throttling was that we had opened a
> > transaction but could not write it back to disk because we had to
> > wait for all the cached data to go to disk (which is throttled). So
> > my question is, can't we first wait for all the data to be flushed
> > to disk and then open a transaction for metadata. metadata will be
> > unthrottled so filesystem will not have to do any tricks like bdi is
> > congested or not.
>
> Actually that's what's happening. We first do filemap_write_and_wait()
> which syncs all the data and then we go and force transaction commit to
> make sure all metadata got to stable storage. The problem is that writeout
> of data may need to allocate new blocks and that starts a transaction and
> while the transaction is started we may need to do some reads (e.g. of
> bitmaps etc.) which may be throttled and at that moment the whole
> filesystem is blocked. I don't remember the stack traces you showed me so
> I'm not sure if this is what you observed but it's certainly one possible
> scenario. The reason why fsync triggers problems is simply that it's the
> only place where process normally does significant amount of writing. In
> most cases flusher thread / journal thread do it so this effect is not
> visible.
> And to precede your question, it would be rather hard to avoid IO
> while the transaction is started due to locking.

Probably we should mark all IOs issued inside a transaction as META (or
whatever else tells blkcg to avoid throttling them).  We're gonna need
overcharging for metadata writes anyway, so I don't think this will
make too much of a difference.

Thanks.

-- 
tejun
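
To make the META marking and overcharging idea above concrete, here is a
minimal user-space sketch. It is a toy model and not kernel code: every
name in it (toy_cgroup, toy_bio, TOY_REQ_META, toy_submit) is hypothetical
and does not correspond to any blkcg or jbd2 interface. The assumption is
that IO issued while a transaction is open gets tagged as metadata, is
never made to wait, and is still charged to the originating group, so that
group's budget may go negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: "this IO was issued inside a transaction". */
enum { TOY_REQ_META = 1u << 0 };

/* Hypothetical per-group throttling state. */
struct toy_cgroup {
	const char *name;
	long budget;		/* bytes still allowed in this window */
};

/* Hypothetical stand-in for a bio: who issued it, how big, what flags. */
struct toy_bio {
	struct toy_cgroup *cg;	/* originator the IO is attributed to */
	unsigned int flags;
	long bytes;
};

/*
 * Returns true if the bio may be dispatched now, false if it would have
 * to wait for budget.  IO issued inside a transaction is tagged META and
 * always dispatched; its cost is still charged to the originator, which
 * can push the budget below zero (overcharging).
 */
bool toy_submit(struct toy_bio *bio, bool in_transaction)
{
	assert(bio != NULL && bio->cg != NULL);

	if (in_transaction)
		bio->flags |= TOY_REQ_META;

	if (bio->flags & TOY_REQ_META) {
		bio->cg->budget -= bio->bytes;	/* overcharge, never block */
		return true;
	}

	if (bio->cg->budget < bio->bytes)
		return false;			/* ordinary IO is throttled */

	bio->cg->budget -= bio->bytes;
	return true;
}
```

In this model a group that has been overcharged for metadata pays the debt
back later: its ordinary data IO stays throttled until the budget is
replenished, while nothing issued under an open transaction ever stalls
the filesystem.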