Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 17 Sep 2015 23:04:03 -0700

On Thu, Sep 17, 2015 at 10:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> Ok, makes sense - the plug is not being flushed as we switch away,
> but Chris' patch makes it do that.

Yup.

And I actually think Chris' patch is better than the one I sent out
(but maybe the scheduler people should take a look at the behavior of
cond_resched()), I just wanted you to test that to verify the
behavior.

The fact that Chris' patch ends up lowering the context switches
(because it does the unplugging directly) is also an argument for his
approach.

I just wanted to understand the oddity with kblockd_workqueue. And I
think that's solved.

> Context switches go back to the 4-4500/sec range. Otherwise
> behaviour and performance is indistinguishable from Chris' patch.

.. this was exactly what I wanted to hear. So it sounds like we have
no odd unexplained behavior left in this area.

Which is not to say that there wouldn't be room for improvement, but
it just makes me much happier about the state of these patches to feel
like we understand what was going on.

> PS: just hit another "did this just get broken in 4.3-rc1" issue - I
> can't run blktrace while there's a IO load because:
>
> $ sudo blktrace -d /dev/vdc
> BLKTRACESETUP(2) /dev/vdc failed: 5/Input/output error
> Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
> ....
>
> [  641.424618] blktrace: page allocation failure: order:5, mode:0x2040d0
> [  641.438933]  [<ffffffff811c1569>] kmem_cache_alloc_trace+0x129/0x400
> [  641.440240]  [<ffffffff811424f8>] relay_open+0x68/0x2c0
> [  641.441299]  [<ffffffff8115deb1>] do_blk_trace_setup+0x191/0x2d0
>
> gdb) l *(relay_open+0x68)
> 0xffffffff811424f8 is in relay_open (kernel/relay.c:582).
> 577                     return NULL;
> 578             if (subbuf_size > UINT_MAX / n_subbufs)
> 579                     return NULL;
> 580
> 581             chan = kzalloc(sizeof(struct rchan), GFP_KERNEL);
> 582             if (!chan)
> 583                     return NULL;
> 584
> 585             chan->version = RELAYFS_CHANNEL_VERSION;
> 586             chan->n_subbufs = n_subbufs;
>
> and struct rchan has a member struct rchan_buf *buf[NR_CPUS];
> and CONFIG_NR_CPUS=8192, hence the attempt at an order 5 allocation
> that fails here....

Hm. Have you always had MAX_SMP (and the NR_CPU==8192 that it causes)?
>From a quick check, none of this code seems to be new.

That said, having that

        struct rchan_buf *buf[NR_CPUS];

in "struct rchan" really is something we should fix. We really should
strive to not allocate things by CONFIG_NR_CPU's, but by the actual
real CPU count.

This looks to be mostly Jens' code, and much of it harkens back to 2006. Jens?

                    Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html