On Thu, Sep 17, 2015 at 10:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> Ok, makes sense - the plug is not being flushed as we switch away,
> but Chris' patch makes it do that.

Yup.

And I actually think Chris' patch is better than the one I sent out
(but maybe the scheduler people should take a look at the behavior of
cond_resched()), I just wanted you to test that to verify the
behavior.

The fact that Chris' patch ends up lowering the context switches
(because it does the unplugging directly) is also an argument for his
approach.

I just wanted to understand the oddity with kblockd_workqueue. And I
think that's solved.

> Context switches go back to the 4-4500/sec range. Otherwise
> behaviour and performance is indistinguishable from Chris' patch.

.. this was exactly what I wanted to hear. So it sounds like we have
no odd unexplained behavior left in this area.

Which is not to say that there wouldn't be room for improvement, but
it just makes me much happier about the state of these patches to feel
like we understand what was going on.

> PS: just hit another "did this just get broken in 4.3-rc1" issue - I
> can't run blktrace while there's a IO load because:
>
> $ sudo blktrace -d /dev/vdc
> BLKTRACESETUP(2) /dev/vdc failed: 5/Input/output error
> Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
> ....
>
> [ 641.424618] blktrace: page allocation failure: order:5, mode:0x2040d0
> [ 641.438933] [<ffffffff811c1569>] kmem_cache_alloc_trace+0x129/0x400
> [ 641.440240] [<ffffffff811424f8>] relay_open+0x68/0x2c0
> [ 641.441299] [<ffffffff8115deb1>] do_blk_trace_setup+0x191/0x2d0
>
> gdb) l *(relay_open+0x68)
> 0xffffffff811424f8 is in relay_open (kernel/relay.c:582).
> 577                     return NULL;
> 578             if (subbuf_size > UINT_MAX / n_subbufs)
> 579                     return NULL;
> 580
> 581             chan = kzalloc(sizeof(struct rchan), GFP_KERNEL);
> 582             if (!chan)
> 583                     return NULL;
> 584
> 585             chan->version = RELAYFS_CHANNEL_VERSION;
> 586             chan->n_subbufs = n_subbufs;
>
> and struct rchan has a member struct rchan_buf *buf[NR_CPUS];
> and CONFIG_NR_CPUS=8192, hence the attempt at an order 5 allocation
> that fails here....

Hm. Have you always had MAX_SMP (and the NR_CPUS==8192 that it causes)?
From a quick check, none of this code seems to be new.

That said, having that

        struct rchan_buf *buf[NR_CPUS];

in "struct rchan" really is something we should fix. We really should
strive to not allocate things by CONFIG_NR_CPUS, but by the actual
real CPU count.

This looks to be mostly Jens' code, and much of it harkens back to
2006. Jens?

                 Linus
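
To make the arithmetic behind the order-5 failure concrete, here is a
rough userspace sketch, not kernel code. It assumes 4 KiB pages and
8-byte pointers (x86-64); the helper names order_for() and chan_bytes()
and the 128-byte figure for the remaining members of struct rchan are
hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096UL /* assumption: 4 KiB pages */
#define PTR_BYTES  8UL    /* assumption: 64-bit pointers */

/*
 * Smallest order such that (PAGE_BYTES << order) >= bytes.
 * This mirrors the idea behind the kernel's get_order(), but it is
 * a standalone reimplementation for illustration.
 */
static inline unsigned int order_for(size_t bytes)
{
	unsigned int order = 0;
	size_t span = PAGE_BYTES;

	assert(bytes > 0);
	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}

/*
 * Size of a channel-like struct: some fixed members (other_bytes,
 * a made-up figure) plus one buffer pointer per CPU slot.
 */
static inline size_t chan_bytes(size_t other_bytes, size_t nr_slots)
{
	return other_bytes + nr_slots * PTR_BYTES;
}
```

With these assumptions the pointer array alone is 8192 * 8 = 64 KiB,
which is exactly order 4, so any additional member pushes the struct
over that boundary: order_for(chan_bytes(128, 8192)) evaluates to 5,
consistent with the "order:5" in the trace. Sized for a hypothetical
machine with 8 CPUs instead, order_for(chan_bytes(128, 8)) evaluates
to 0, i.e. a single page.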