Dear John,

Thanks for your reply.

Fei Xia

> On Nov 19, 2015, at 18:07, John Spray <jspray@xxxxxxxxxx> wrote:
>
> On Thu, Nov 19, 2015 at 9:43 AM, xiafei <xiafei2011@xxxxxxxxx> wrote:
>> Hi, all:
>> I have two questions about MDLog:
>>
>> 1. The max number of logsegments per MDLog (mds_log_max_segments) is configured to be 30 in the config_opts.h file.
>> However, the MDLog doesn’t check the number of logsegments when it starts a new segment.
>> The configuration is only used when the number of segments in an MDLog is larger than 2*mds_log_max_segments.
>> The MDS notifies the monitor, while the monitor does nothing.
>> My question is: Is the logsegments size limited to a max size? If so, what’s the size?
>
> mds_log_max_segments is used in MDLog::trim (where it is aliased to
> the local max_segments variable). The MDS will trim some segments if
> there are currently more than mds_log_max_segments: this is the
> typical way to limit how long the journal is. It's not enforced
> rigidly: if you set max segments to 2, and do lots of metadata IO,
> you'll see it bounce between 2 and 3 most of the time.
>
> You have already noticed that this setting is also used in Beacon.cc
> to generate a health warning if the journal has grown to 2x the size
> limit: this is to alert the user if the MDS is failing to trim its
> journal (can be caused by a certain class of bugs or potentially just
> by a pathologically slow OSD cluster)
>
>> 2. The MDLog prezeros two periods ahead of the write_pos of Journaler.
>> The comment of the _issue_prezero function is “we need to zero at least two periods, minimum, to ensure that we have a full empty object/period in front of us”.
>> Does it mean that the OSD will preallocate objects for the Journaler?
>> The function is actually implemented by Objecter::remove. However, Objecter::remove only removes an object through FileStore/NewStore.
>> It seems that the OSD doesn’t preallocate objects. If so, then what’s the purpose of prezero?
>> Or, do I misunderstand anything?
>
> Journaler uses the Filer abstraction, and when going through Filer
> there is no distinction between zeros in an object and the object
> being missing. Either way when you read that range you get zeros.
>
> Prezeroing is a bit subtle. It is necessary because the journal
> writes don't necessarily persist in a monotonic forward order. In a
> crash, we might sometimes leave a gap at the front of the journal,
> then some data. We'll reprobe (Filer::probe) to the start of the gap,
> leaving data after the gap as junk (this is OK because journal data
> isn't considered safe until everything up to its position is safe
> (i.e. Journaler::safe_pos advances)). After that recovery, we need
> to do prezeroing because otherwise, if we crashed again, on the
> subsequent recovery we might confuse the junk with valid data.
>
> John

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
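
For anyone reading this thread later, the trimming behaviour John describes for question 1 can be pictured with a small toy model. This is only a sketch, not the MDS code: `trim`, `health_warning`, `simulate` and the `can_expire` callback are hypothetical names made up for illustration, and the "larger than 2x" comparison is taken from the wording of the original question.

```python
from collections import deque


def trim(segments, max_segments, can_expire):
    """Drop the oldest segments while there are more than max_segments.

    can_expire is a hypothetical callback standing in for "this segment's
    metadata has been flushed, so it is safe to drop". Returns the number
    of segments trimmed.
    """
    trimmed = 0
    while len(segments) > max_segments and can_expire(segments[0]):
        segments.popleft()
        trimmed += 1
    return trimmed


def health_warning(segments, max_segments):
    """Warn once the journal holds more than 2x the configured limit."""
    return len(segments) > 2 * max_segments


def simulate(n_new_segments, max_segments, can_expire):
    """Start n_new_segments segments, trimming after each one."""
    segments = deque()
    observed = []  # segment count right after each new segment starts
    for seq in range(n_new_segments):
        segments.append(seq)  # starting a segment does not check the limit
        observed.append(len(segments))
        trim(segments, max_segments, can_expire)
    return segments, observed


# Healthy case: trimming keeps up, so the count bounces between 2 and 3.
healthy, counts = simulate(6, 2, lambda seg: True)

# Stuck case: nothing can expire, so the journal grows past 2x the limit.
stuck, _ = simulate(6, 2, lambda seg: False)
```

In the healthy run the count never exceeds `max_segments + 1`, which matches the "bounce between 2 and 3" remark. In the stuck run `health_warning(stuck, 2)` becomes true, which is the situation the Beacon warning is meant to flag.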
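
The prezeroing argument for question 2 may also be easier to follow with a toy model. The sketch below is an assumption-heavy simplification: the journal is a flat list of entries where 0 means "reads as zeros" (missing object or zeroed range, which look the same through Filer), and `probe`, `prezero` and `recover_then_crash_again` are hypothetical helpers, not the Journaler or Filer API. The real code works in object periods rather than single entries.

```python
def probe(journal, start):
    """Return the first position at or after start that reads as zero."""
    pos = start
    while pos < len(journal) and journal[pos] != 0:
        pos += 1
    return pos


def prezero(journal, write_pos, ahead):
    """Zero the region in front of write_pos so stale data cannot survive."""
    for pos in range(write_pos, min(write_pos + ahead, len(journal))):
        journal[pos] = 0


def recover_then_crash_again(do_prezero):
    """Recover from one crash, write one entry, crash, and probe again.

    Returns the journal end that the second recovery would find.
    """
    # State after the first crash: entries 0..4 persisted, the write at
    # index 5 was lost (gap), but later writes at 6 and 7 did persist.
    journal = [1, 2, 3, 4, 5, 0, 7, 8, 0, 0]

    write_pos = probe(journal, 0)  # start of the gap; 6 and 7 are now junk
    if do_prezero:
        prezero(journal, write_pos, ahead=4)

    journal[write_pos] = 60  # one new entry persists before the second crash

    return probe(journal, 0)


end_with_prezero = recover_then_crash_again(True)
end_without_prezero = recover_then_crash_again(False)
```

With prezeroing, the second recovery stops right after the one new entry. Without it, the probe runs on through the stale entries at positions 6 and 7 and treats them as valid journal data, which is the confusion John warns about.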