Re: Few questions about (attempting to use) write journal + call traces

Song Liu <liu.song.a23@xxxxxxxxx> · Tue, 28 May 2019 09:31:45 -0700

On Mon, May 27, 2019 at 2:46 AM Michal Soltys <soltys@xxxxxxxx> wrote:
>
> On 5/24/19 7:51 PM, Song Liu wrote:
> > On Fri, May 24, 2019 at 3:51 AM Michal Soltys <soltys@xxxxxxxx> wrote:
> >>
> >> On 5/23/19 8:09 PM, Song Liu wrote:
> >>>>>
> >>>>
> >>>> Actually, this seems to be unrelated to the underlying devices - the culprit seems to be attempting to write to an array after adding a journal, without stopping and reassembling it first. Details below.
> >>>
> >>> Thanks for these experiments. Your analysis makes perfect sense.
> >>>
> >>> Do you think you can continue the experiments with the write journal before
> >>> this issue gets fixed?
> >>>
> >>> I am asking because this is not at the top of my list at this time. If this
> >>> is not blocking other important tests, I would prefer to fix it at a later time.
> >>>
> >>> Thanks,
> >>> Song
> >>>
> >>
> >> Yeah, it's fine. I can help with testing (whenever you sit down to this
> >> issue) as well.
> >>
> >> Question though - other than trying to add a journal to an existing live raid
> >> - is this feature overall safe to use (or are there any other known
> >> issues one should be aware of beforehand)?
> >>
> > We (Facebook) have done some tests with it. However, we didn't put
> > it into production. The reason behind this decision was not reliability, but
> > performance concerns and high-level direction. I think Red Hat is
> > evaluating it.
> >
>
> Well, I will probably give it a shot. My use case is that a bunch of
> sync-happy VMs on top of lvm+raid seem to be crushing performance
> (unless there are other reasons), even with very small disk usage.
>
> Out of curiosity - is the journal in writeback mode controllable in some
> way (e.g. how often it flushes to the raid disks, and whether that is
> space-based, time-based, or both)?

It is a combination of both time and space:

/*
 * log->max_free_space is min(1/4 disk size, 10G reclaimable space).
 *
 * In write through mode, the reclaim runs every log->max_free_space.
 * This can prevent the recovery scans for too long
 */
#define RECLAIM_MAX_FREE_SPACE (10 * 1024 * 1024 * 2) /* sector */
#define RECLAIM_MAX_FREE_SPACE_SHIFT (2)

/* wake up reclaim thread periodically */
#define R5C_RECLAIM_WAKEUP_INTERVAL (30 * HZ)
/* start flush with these full stripes */
#define R5C_FULL_STRIPE_FLUSH_BATCH(conf) (conf->max_nr_stripes / 4)
/* reclaim stripes in groups */
#define R5C_RECLAIM_STRIPE_GROUP (NR_STRIPE_HASH_LOCKS * 2)

However, we didn't expose knobs to tune these on a live system.
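
To make the space-based limit concrete, here is a rough userspace sketch of the
arithmetic described in the comment above (min of 1/4 of the journal device and
10G). It is only an illustration, not the kernel code: the helper names
journal_max_free_space() and sectors_to_bytes() are made up for this example,
and 512-byte sectors are assumed.

```c
#include <stdint.h>

/* Constants copied from the excerpt above. */
#define RECLAIM_MAX_FREE_SPACE (10 * 1024 * 1024 * 2) /* sectors */
#define RECLAIM_MAX_FREE_SPACE_SHIFT (2)

/* Assumption for this sketch: one sector is 512 bytes. */
#define SECTOR_SIZE_BYTES 512ULL

/*
 * Hypothetical helper: space-based reclaim limit in sectors, i.e.
 * min(journal size / 4, RECLAIM_MAX_FREE_SPACE).
 */
static uint64_t journal_max_free_space(uint64_t journal_sectors)
{
	uint64_t limit = journal_sectors >> RECLAIM_MAX_FREE_SPACE_SHIFT;

	if (limit > RECLAIM_MAX_FREE_SPACE)
		limit = RECLAIM_MAX_FREE_SPACE;
	return limit;
}

/* Hypothetical helper: convert a sector count to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_SIZE_BYTES;
}
```

Under these assumptions an 8 GiB journal device gives a limit of 2 GiB, while
anything of 40 GiB or more is capped at 10 GiB. On the time side,
R5C_RECLAIM_WAKEUP_INTERVAL is 30 * HZ jiffies, which is 30 seconds whatever
HZ is configured to.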

Thanks,
Song
>
>
>
>
> > + Xiao, who might be working on this.
> >
> > Thanks,
> > Song
> >
>