Re: Questions about MDLog size and prezero operation

Dear John,
	Thanks for your reply.

Fei Xia



> On 19 Nov 2015, at 18:07, John Spray <jspray@xxxxxxxxxx> wrote:
> 
> On Thu, Nov 19, 2015 at 9:43 AM, xiafei <xiafei2011@xxxxxxxxx> wrote:
>> Hi, all:
>>        I have two questions about MDLog:
>> 
>> 1. The maximum number of log segments per MDLog (mds_log_max_segments) is configured to be 30 in the config_opts.h file.
>> However, the MDLog doesn’t check the number of log segments when it starts a new segment.
>> The setting is only used when the number of segments in an MDLog is larger than 2*mds_log_max_segments:
>> the MDS notifies the monitor, but the monitor does nothing.
>> My question is: is the number of log segments limited to a maximum? If so, what is the limit?
> 
> mds_log_max_segments is used in MDLog::trim (where it is aliased to
> the local max_segments variable).  The MDS will trim some segments if
> there are currently more than mds_log_max_segments: this is the
> typical way to limit how long the journal is.  It's not enforced
> rigidly: if you set max segments to 2, and do lots of metadata IO,
> you'll see it bounce between 2 and 3 most of the time.
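A minimal sketch of that bouncing behaviour, assuming one new segment is opened per tick of heavy metadata IO and one trim pass runs after each tick. `simulate` is a hypothetical helper, not Ceph code; the model ignores event-count limits and the asynchronous expiry that real segments go through before they can be trimmed.

```python
from collections import deque


def simulate(max_segments: int, ticks: int) -> list:
    """Toy model (not Ceph code) of journal segment trimming.

    Each tick, metadata IO opens one new segment, then a periodic trim pass
    (standing in for MDLog::trim) drops the oldest segments while there are
    more than max_segments.  Returns the segment count observed after each
    step.
    """
    segments = deque()
    next_seq = 0
    observed = []
    for _ in range(ticks):
        # Heavy metadata IO: a new segment starts before the next trim runs.
        segments.append(next_seq)
        next_seq += 1
        observed.append(len(segments))
        # Periodic trim: drop the oldest segments until within the limit.
        while len(segments) > max_segments:
            segments.popleft()
        observed.append(len(segments))
    return observed


print(simulate(2, 5))  # [1, 1, 2, 2, 3, 2, 3, 2, 3, 2]
```

Because trimming only happens periodically, the count is a soft limit: it can exceed max_segments by the number of segments opened between two trim passes.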
> 
> You have already noticed that this setting is also used in Beacon.cc
> to generate a health warning if the journal has grown to 2x the size
> limit: this is to alert the user if the MDS is failing to trim its
> journal (which can be caused by a certain class of bugs, or potentially
> just by a pathologically slow OSD cluster).
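The 2x threshold can be sketched as a simple predicate. This is a hypothetical model, not the Beacon.cc code: the strict "greater than" comparison and the message wording are assumptions made for illustration.

```python
from typing import Optional


def trim_health_warning(num_segments: int, max_segments: int) -> Optional[str]:
    """Hypothetical model of the 'journal not trimming' health check.

    Warns only once the journal has grown past twice the configured segment
    limit; the message wording here is illustrative.
    """
    if num_segments > 2 * max_segments:
        return f"Behind on trimming ({num_segments}/{max_segments})"
    return None


# With the default mds_log_max_segments of 30 mentioned above:
print(trim_health_warning(45, 30))  # None: above 30, but not above 60
print(trim_health_warning(61, 30))  # Behind on trimming (61/30)
```

The gap between 1x (where trimming starts) and 2x (where the warning fires) leaves room for the normal overshoot, so the warning only appears when trimming is persistently falling behind.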
> 
>> 2. The MDLog prezeros two periods ahead of the Journaler's write_pos.
>> The comment on the _issue_prezero function says “we need to zero at least two periods, minimum, to ensure that we have a full empty object/period in front of us”.
>> Does this mean that the OSD preallocates objects for the Journaler?
>> The function is actually implemented by Objecter::remove. However, Objecter::remove only removes an object through FileStore/NewStore.
>> It seems that the OSD doesn’t preallocate objects. If so, what is the purpose of prezeroing? Or am I misunderstanding something?
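The "two periods ahead" rule can be sketched as offset arithmetic. This is a minimal sketch, assuming a period is object_size × stripe_count bytes and that zeroing is issued one period at a time; `prezero_target`, `prezero_ranges` and the `num_periods` parameter are hypothetical names, not the Journaler API.

```python
def prezero_target(write_pos: int, period: int, num_periods: int = 2) -> int:
    """Hypothetical: exclusive end offset up to which the journal is zeroed.

    Goes at least num_periods (floored at 2, per the quoted comment) whole
    periods past write_pos, rounded up to a period boundary, so at least one
    completely empty period lies beyond the one containing write_pos.
    """
    num_periods = max(2, num_periods)
    target = write_pos + num_periods * period
    return -(-target // period) * period  # ceiling to a multiple of period


def prezero_ranges(prezero_pos: int, write_pos: int, period: int,
                   num_periods: int = 2) -> list:
    """Split [prezero_pos, target) into (offset, length) chunks that never
    cross a period boundary, one zero/remove request per chunk."""
    target = prezero_target(write_pos, period, num_periods)
    ranges = []
    pos = prezero_pos
    while pos < target:
        end = min(target, (pos // period + 1) * period)
        ranges.append((pos, end - pos))
        pos = end
    return ranges


# Tiny period of 10 bytes for readability; write_pos sits mid-period.
print(prezero_target(15, 10))      # 40
print(prezero_ranges(12, 15, 10))  # [(12, 8), (20, 10), (30, 10)]
```

Zeroing whole periods lets each full-object chunk become a plain object removal, which reads back as zeros.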
> 
> Journaler uses the Filer abstraction, and when going through Filer
> there is no distinction between zeros in an object and the object
> being missing.  Either way when you read that range you get zeros.
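A toy model of why removing an object is as good as zeroing it at the Filer level. `ToyObjectStore` and the tiny `OBJECT_SIZE` are hypothetical, not the Filer or Objecter API; the model assumes a single stripe, so byte offsets map directly onto consecutive objects.

```python
OBJECT_SIZE = 4  # hypothetical, tiny object size so the example stays readable


class ToyObjectStore:
    """Toy model (not the Filer/Objecter API) of a byte range laid out over
    fixed-size objects with a single stripe.  Reading a missing object, or
    bytes beyond the end of an object, yields zeros."""

    def __init__(self) -> None:
        self.objects = {}  # object number -> bytearray contents

    def write(self, offset: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            objno, off = divmod(offset + i, OBJECT_SIZE)
            obj = self.objects.setdefault(objno, bytearray())
            if len(obj) <= off:
                obj.extend(bytes(off + 1 - len(obj)))  # fill a sparse hole with zeros
            obj[off] = byte

    def remove(self, objno: int) -> None:
        self.objects.pop(objno, None)  # removing a missing object is a no-op

    def zero(self, objno: int) -> None:
        if objno in self.objects:
            self.objects[objno] = bytearray(len(self.objects[objno]))

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for pos in range(offset, offset + length):
            objno, off = divmod(pos, OBJECT_SIZE)
            obj = self.objects.get(objno)
            out.append(obj[off] if obj is not None and off < len(obj) else 0)
        return bytes(out)


removed, zeroed = ToyObjectStore(), ToyObjectStore()
for store in (removed, zeroed):
    store.write(0, b"abcdefgh")  # fills objects 0 and 1
removed.remove(1)
zeroed.zero(1)
print(removed.read(0, 8) == zeroed.read(0, 8))  # True
```

Since a reader cannot tell the two apart, implementing prezero with object removal needs no preallocation on the OSD side.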
> 
> Prezeroing is a bit subtle.  It is necessary because the journal
> writes don't necessarily persist in a monotonic forward order.  In a
> crash, we might sometimes leave a gap at the front of the journal,
> then some data.  We'll reprobe (Filer::probe) to the start of the gap,
> leaving data after the gap as junk (this is OK because journal data
> isn't considered safe until everything up to its position is safe
> (i.e. Journaler::safe_pos advances)).  After that recovery, we need
> to do prezeroing because otherwise, if we crashed again, on the
> subsequent recovery we might confuse the junk with valid data.
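The double-crash scenario can be reproduced with a hypothetical model. Here the journal is a list of slots, `None` stands for zeros or a missing object, and recovery simply accepts everything before the first gap. Real recovery probes object sizes and decodes framed entries; `replay` and `restart_and_write` are illustrative names only.

```python
from typing import List, Optional

Slot = Optional[str]  # None stands for zeros / a missing object


def replay(journal: List[Slot]) -> List[str]:
    """Entries a recovery pass would accept: everything before the first gap."""
    entries = []
    for slot in journal:
        if slot is None:
            break
        entries.append(slot)
    return entries


def restart_and_write(journal: List[Slot], new_entries: List[str], prezero: int) -> None:
    """Recover, zero `prezero` slots ahead of write_pos, then append entries.

    Each write is assumed to persist before the next crash.
    """
    write_pos = len(replay(journal))  # anything after the gap is junk
    for i in range(write_pos, min(write_pos + prezero, len(journal))):
        journal[i] = None  # prezero: erase whatever lies ahead of write_pos
    for entry in new_entries:
        journal[write_pos] = entry
        write_pos += 1


# First crash: slot 3 never persisted, but slots 4 and 5 did (out-of-order writes).
after_first_crash = ["a0", "a1", "a2", None, "a4", "a5", None, None]

without = list(after_first_crash)
restart_and_write(without, ["b3"], prezero=0)
print(replay(without))  # ['a0', 'a1', 'a2', 'b3', 'a4', 'a5']: junk replayed as valid

with_prezero = list(after_first_crash)
restart_and_write(with_prezero, ["b3"], prezero=4)
print(replay(with_prezero))  # ['a0', 'a1', 'a2', 'b3']
```

Once the new entry fills the gap, the stale a4/a5 become contiguous with valid data, and only zeroing ahead of write_pos stops the second recovery from replaying them.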
> 
> John


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



