On Thu, Mar 31, 2016 at 4:20 AM, WuJaven <javen.wu@xxxxxxxxxx> wrote:
> Hi Team,
>
> I have a question about the MDS journal trim policy.
> I found that the MDS journal is flushed only in the tick thread context, and only when the number of segments exceeds the predefined value (default 30).
> If the journal is not over the predefined value, the segments are not flushed to the OMAP even though the system is not busy.
> My question is: what’s the intention behind this design?
> I can understand that journal replay is able to restore metadata into memory, so there is no negative impact even though the journal is committed to the backend OMAP.
> Are there any other concerns?
> What if we flush the journal segments in every tick period as long as the journal is not empty?

In general, we don't want to flush sooner because it would generate a higher number of metadata IOs overall. Because many workloads repeatedly modify the same pieces of metadata, it's beneficial to coalesce these updates and only do the final IO when the dirty inode/dentry/dirfrag "falls off" the end of the journal. Note that the dirty_inodes etc. members on LogSegment are `elist`s, so when we dirty an inode on a newer LogSegment, it is implicitly disassociated from the old LogSegment.

However, I have also thought that it would be useful to flush the journal on idle systems. In particular, since our new fsck/recovery functionality benefits from having a fully written-back metadata store, it would be nice to do this opportunistically if we could invent a suitable heuristic for detecting idleness.

John

>
> Thanks
> Javen
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
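
To make the coalescing argument in the reply above concrete, here is a minimal Python sketch. It is a toy model and not Ceph code: `ToyJournal`, `mark_dirty`, `trim`, `drain` and `simulate` are hypothetical names, and a plain set plus an owner map stands in for the elist behaviour (an inode sits on at most one segment's dirty list at a time). It assumes one metadata IO per dirty inode found on an expiring segment, which is a simplification.

```python
from collections import deque


class LogSegment:
    """Toy stand-in for a journal segment: records which inodes it holds dirty."""

    def __init__(self, seq):
        self.seq = seq
        self.dirty_inodes = set()


class ToyJournal:
    """Hypothetical model of segment trimming; not the real MDS implementation."""

    def __init__(self, max_segments):
        self.max_segments = max_segments
        self.segments = deque()
        self.owner = {}      # inode -> segment currently holding it dirty
        self.writebacks = 0  # metadata IOs issued to the backing store
        self.next_seq = 0
        self.start_segment()

    def start_segment(self):
        self.segments.append(LogSegment(self.next_seq))
        self.next_seq += 1

    def mark_dirty(self, inode):
        # Dirtying an inode on the newest segment removes it from whichever
        # older segment held it, mimicking the elist behaviour described above.
        old = self.owner.get(inode)
        if old is not None:
            old.dirty_inodes.discard(inode)
        newest = self.segments[-1]
        newest.dirty_inodes.add(inode)
        self.owner[inode] = newest

    def expire_oldest(self):
        # Whatever is still dirty on the oldest segment must be written back.
        seg = self.segments.popleft()
        for inode in seg.dirty_inodes:
            self.writebacks += 1
            del self.owner[inode]

    def trim(self):
        # Expire segments only while the journal is over its limit.
        while len(self.segments) > self.max_segments:
            self.expire_oldest()

    def drain(self):
        # Expire everything, e.g. a final flush at the end of the run.
        while self.segments:
            self.expire_oldest()


def simulate(max_segments, updates):
    """Modify the same inode once per segment and count the resulting IOs."""
    journal = ToyJournal(max_segments)
    for _ in range(updates):
        journal.mark_dirty("inode-1")
        journal.start_segment()
        journal.trim()
    journal.drain()
    return journal.writebacks


if __name__ == "__main__":
    print("lazy trim (30 segments):", simulate(30, 100))
    print("eager trim (1 segment): ", simulate(1, 100))
```

In this model, 100 updates to one inode cost a single writeback with a 30-segment limit, because the inode keeps moving to the newest segment and every expiring segment is already empty. With a limit of one segment, each update is written back before the next one arrives, so the same workload costs 100 writebacks.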