Re: Questions about MDLog size and prezero operation

On Thu, Nov 19, 2015 at 9:43 AM, xiafei <xiafei2011@xxxxxxxxx> wrote:
> Hi, all:
>         I have two questions about MDLog:
>
> 1. The max number of logsegments per MDLog (mds_log_max_segments) is configured to be 30 in the config_opts.h file.
> However, the MDLog doesn’t check the number of logsegments when it starts a new segment.
> The setting is only used when the number of segments in an MDLog is larger than 2*mds_log_max_segments.
> The MDS then notifies the monitor, but the monitor does nothing.
> My question is: is the number of logsegments limited to a maximum? If so, what is the limit?

mds_log_max_segments is used in MDLog::trim (where it is aliased to
the local max_segments variable).  The MDS will trim some segments if
there are currently more than mds_log_max_segments of them: this is the
typical way to limit how long the journal is.  It's not enforced
rigidly: if you set max segments to 2, and do lots of metadata IO,
you'll see it bounce between 2 and 3 most of the time.
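A toy simulation of that "trim when over the limit" behaviour, as a minimal sketch assuming only what is described above: starting a segment never checks the limit, and a separate periodic trim pass drops the oldest segments while the count exceeds max_segments. The function name and event strings are hypothetical, and the real MDLog::trim is more involved (for example, a segment can only be trimmed once it has expired), which this sketch ignores.

```python
from collections import deque
from typing import Iterable, List


def simulate_trim(max_segments: int, events: Iterable[str]) -> List[int]:
    """Return the segment count observed after each event.

    events: "new" starts a log segment; "trim" runs a periodic trim pass.
    (Hypothetical model, not the real MDLog code.)
    """
    segments = deque()
    next_seq = 0
    history = []
    for ev in events:
        if ev == "new":
            # Starting a segment does not check the limit at all.
            segments.append(next_seq)
            next_seq += 1
        elif ev == "trim":
            # Drop the oldest segments while we are over the limit.
            while len(segments) > max_segments:
                segments.popleft()
        else:
            raise ValueError(f"unknown event: {ev!r}")
        history.append(len(segments))
    return history
```

With max_segments=2 and a new segment started between every trim pass, the count settles into alternating between 2 and 3, which is the "bounce" described above.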

You have already noticed that this setting is also used in Beacon.cc
to generate a health warning if the journal has grown to 2x the size
limit: this is to alert the user if the MDS is failing to trim its
journal (this can be caused by a certain class of bugs, or potentially
just by a pathologically slow OSD cluster).
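The health check itself amounts to a single comparison. A minimal sketch, assuming the strict "larger than 2*mds_log_max_segments" condition from the question; the function name is hypothetical and this is not the actual Beacon.cc code.

```python
def trim_health_warning(num_segments: int, max_segments: int) -> bool:
    """True if the journal has grown past twice the trim target.

    Hypothetical helper mirroring the 2x check described above.
    """
    return num_segments > 2 * max_segments
```

With the configured value of 30, a journal of 60 segments would not raise the warning, while 61 segments would.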

> 2. The MDLog prezeros two periods ahead of the write_pos of Journaler.
> The comment of _issue_prezero function is “we need to zero at least two periods, minimum, to ensure that we have a full empty object/period in front of us”.
> Does it mean that the OSD will preallocate objects for the Journaler?
> The function is actually implemented by Objecter::remove. However, Objecter::remove only removes an object through FileStore/NewStore.
> It seems that the OSD doesn’t preallocate objects. If so, then what’s the purpose of prezero? Or, do I misunderstand anything?

Journaler uses the Filer abstraction, and when going through Filer
there is no distinction between zeros in an object and the object
being missing.  Either way when you read that range you get zeros.
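That equivalence is why deleting objects works as a way to zero them. A toy model of a byte range striped over fixed-size objects, as a minimal sketch: the class and its methods are hypothetical and are not the real Filer or Objecter API; it only shows that a reader cannot tell a removed object from one full of zeros.

```python
class SparseObjectStore:
    """Toy model of a byte range striped over fixed-size objects.

    Objects that do not exist read back as zeros, so removing an object
    and overwriting it with zeros look the same to a reader.
    (Hypothetical illustration, not the real Filer/Objecter API.)
    """

    def __init__(self, object_size: int):
        self.object_size = object_size
        self.objects = {}  # object number -> bytearray of object_size bytes

    def write(self, offset: int, data: bytes) -> None:
        for i, b in enumerate(data):
            objno, off = divmod(offset + i, self.object_size)
            obj = self.objects.setdefault(objno, bytearray(self.object_size))
            obj[off] = b

    def remove(self, objno: int) -> None:
        # Deleting the object "zeroes" its whole range in one operation.
        self.objects.pop(objno, None)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for pos in range(offset, offset + length):
            objno, off = divmod(pos, self.object_size)
            obj = self.objects.get(objno)
            out.append(obj[off] if obj is not None else 0)
        return bytes(out)
```

For example, after `write(0, b"abcdefgh")` with 4-byte objects and `remove(1)`, reading bytes 0-7 gives `b"abcd"` followed by four zero bytes, exactly as if the second object had been overwritten with zeros.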

Prezeroing is a bit subtle.  It is necessary because the journal
writes don't necessarily persist in a monotonic forward order.  In a
crash, we might sometimes leave a gap at the front of the journal,
then some data.  We'll reprobe (Filer::probe) to the start of the gap,
leaving the data after the gap as junk.  This is OK because journal
data isn't considered safe until everything up to its position is
safe (i.e. until Journaler::safe_pos has advanced past it).  After
that recovery, we need to prezero, because new writes will start
filling the gap: if we crashed again, the subsequent recovery could
find no gap at all and confuse the old junk with valid data.
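The double-crash scenario can be reproduced with a toy journal. A minimal sketch under simplifying assumptions: the journal is a list of fixed-size slots where `None` stands for zeroed or missing data, "probing" means finding the first empty slot, and prezeroing simply clears everything after the recovered write position (the real Journaler instead keeps a bounded window of a couple of periods ahead of write_pos zeroed). All names here are hypothetical.

```python
from typing import List, Optional

Journal = List[Optional[str]]  # None models a zeroed or absent slot


def probe_end(journal: Journal) -> int:
    """Recovery probe: the journal ends at the first empty slot."""
    for pos, entry in enumerate(journal):
        if entry is None:
            return pos
    return len(journal)


def replay(journal: Journal) -> List[str]:
    """Entries a recovering reader would treat as valid."""
    return list(journal[:probe_end(journal)])


def prezero(journal: Journal, start: int) -> None:
    """Zero every slot from start onward (simplified window)."""
    for pos in range(start, len(journal)):
        journal[pos] = None


def run(with_prezero: bool) -> List[str]:
    journal: Journal = [None] * 10
    # First crash: a0..a7 were submitted, but a5 never persisted.
    for i in range(8):
        if i != 5:
            journal[i] = f"a{i}"
    write_pos = probe_end(journal)  # 5; a6 and a7 are now junk
    if with_prezero:
        prezero(journal, write_pos)
    # Second crash: only b5 persists, filling the old gap.
    journal[write_pos] = "b5"
    return replay(journal)
```

Without prezeroing, `run(False)` replays the stale junk `a6` and `a7` as if they followed `b5`; with prezeroing, `run(True)` stops cleanly after `b5`.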

John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



