On Wed, Feb 25, 2015 at 11:31:17PM +0900, Tetsuo Handa wrote: > Dave Chinner wrote: > > This exact discussion is already underway. > > > > My initial proposal: > > > > http://oss.sgi.com/archives/xfs/2015-02/msg00314.html > > > > Why mempools don't work but transaction based reservations will: > > > > http://oss.sgi.com/archives/xfs/2015-02/msg00339.html > > > > Reservation needs to be an accounting mechanisms, not preallocation: > > > > http://oss.sgi.com/archives/xfs/2015-02/msg00456.html > > http://oss.sgi.com/archives/xfs/2015-02/msg00457.html > > http://oss.sgi.com/archives/xfs/2015-02/msg00458.html > > > > And that's where the discussion currently sits. > > I got two problems (one is stall at io_schedule() This is a typical "blame the messenger" bug report. XFS is stuck in inode reclaim waiting for log IO completion to occur, along with all the other processes iin xfs_log_force also stuck waiting for the same Io completion. You need to find where that IO completion that everything is waiting on has got stuck or show that it's not a lost IO and actually an XFS problem. e.g has the IO stack got stuck on a mempool somewhere? > , the other is kernel panic > due to xfs's assertion failure) using Linux 3.19. > http://I-love.SAKURA.ne.jp/tmp/crash-20150225-2.log.xz ) > ---------- > [ 189.586204] Out of memory: Kill process 3701 (a.out) score 834 or sacrifice child > [ 189.586205] Killed process 3701 (a.out) total-vm:2167392kB, anon-rss:1465820kB, file-rss:4kB > [ 189.586210] Kill process 3702 (a.out) sharing same memory > [ 189.586211] Kill process 3714 (a.out) sharing same memory > [ 189.586212] Kill process 3748 (a.out) sharing same memory > [ 189.586213] Kill process 3755 (a.out) sharing same memory > [ 189.593470] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(mp), file: fs/xfs/xfs_inode.c, line: 1701 Which is a failure of xfs_trans_reserve(), and through the calling context and parameters can only be from xfs_log_reserve(). That's got a pretty clear cause: tic = xlog_ticket_alloc(log, unit_bytes, cnt, client, permanent, KM_SLEEP | KM_MAYFAIL); if (!tic) return -ENOMEM; And the reason for the ASSERT is pretty clear: we put it there because we need to know - as developers - what failures (if any) ever come through that path. This is called from evict(): > [ 189.593565] Call Trace: > [ 189.593568] [<ffffffff812ab2d7>] xfs_inactive_truncate+0x67/0x150 > [ 189.593569] [<ffffffff812acb98>] xfs_inactive+0x1c8/0x1f0 > [ 189.593570] [<ffffffff812b3216>] xfs_fs_evict_inode+0x86/0xd0 > [ 189.593572] [<ffffffff811da0f8>] evict+0xb8/0x190 > [ 189.593574] [<ffffffff811daa15>] iput+0xf5/0x180 And as such there is no mechanism for actually reporting the error to userspace and in failing here we are about to leak an inode. When an XFS developer is testing new code, having a failure like that get trapped is immensely useful. However, on production systems, we can just keep going because it's not a fatal error and, even more importantly, the leaked inode will get cleaned up by log recovery next time the filesystem is mounted. IOWs, when you run CONFIG_XFS_DEBUG=y, you'll often get failures that are valuable to XFS developers but have no runtime effect on production systems. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>