> -----Original Message-----
> From: Jeff Moyer [mailto:jmoyer@xxxxxxxxxx]
> Sent: Thursday, 10 July, 2014 2:14 PM
> To: Elliott, Robert (Server Storage)
> Cc: Christoph Hellwig; Jens Axboe; dgilbert@xxxxxxxxxxxx; James Bottomley;
> Bart Van Assche; Benjamin LaHaise; linux-scsi@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: scsi-mq V2
>
> "Elliott, Robert (Server Storage)" <Elliott@xxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Christoph Hellwig [mailto:hch@xxxxxxxxxxxxx]
> >> Sent: Thursday, 10 July, 2014 11:15 AM
> >> To: Elliott, Robert (Server Storage)
> >> Cc: Jens Axboe; dgilbert@xxxxxxxxxxxx; James Bottomley; Bart Van Assche;
> >> Benjamin LaHaise; linux-scsi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: scsi-mq V2
> >>
> >> On Thu, Jul 10, 2014 at 09:04:22AM -0700, Christoph Hellwig wrote:
> >> > It's starting to look weird.  I'll prepare another two bisect branches
> >> > around some MM changes, which seems the only other possible candidate.
> >>
> >> I've pushed out scsi-mq.3-bisect-3
> >
> > Good.
> >
> >> and scsi-mq.3-bisect-4 for you.
> >
> > Bad.
> >
> > Note: I had to apply the vdso2c.h patch to build this -rc3 based kernel:
> > diff --git a/arch/x86/vdso/vdso2c.h b/arch/x86/vdso/vdso2c.h
> > index df95a2f..11b65d4 100644
> > --- a/arch/x86/vdso/vdso2c.h
> > +++ b/arch/x86/vdso/vdso2c.h
> > @@ -93,6 +93,9 @@ static void BITSFUNC(copy_section)(struct BITSFUNC(fake_sections) *out,
> >         uint64_t flags = GET_LE(&in->sh_flags);
> >
> >         bool copy = flags & SHF_ALLOC &&
> > +               (GET_LE(&in->sh_size) ||
> > +                (GET_LE(&in->sh_type) != SHT_RELA &&
> > +                 GET_LE(&in->sh_type) != SHT_REL)) &&
> >                 strcmp(name, ".altinstructions") &&
> >                 strcmp(name, ".altinstr_replacement");
> >
> > Results: fio started OK, getting 900K IOPS, but ^C led to 0 IOPS and
> > an fio hang, with one CPU (CPU 0) stuck in io_submit loops.
>

I added some prints in aio_setup_ring and ioctx_alloc and rebooted.
This time it took much longer to hit the problem.  It survived
dozens of ^Cs.  Running a few minutes, though, IOPS eventually
dropped.  So, sometimes it happens immediately, sometimes it takes
time to develop.

I will rerun bisect-1 -2 and -3 for longer times to increase
confidence that they didn't just appear good.

On this bisect-4 run, as IOPS started to drop from 900K to 40K,
I ran perf top when it was at 700K.  You can see io_submit times
creeping up.

  4.30%  [kernel]         [k] do_io_submit
  4.29%  [kernel]         [k] _raw_spin_lock_irqsave
  3.88%  libaio.so.1.0.1  [.] io_submit
  3.55%  [kernel]         [k] system_call
  3.34%  [kernel]         [k] put_compound_page
  3.11%  [kernel]         [k] io_submit_one
  3.06%  [kernel]         [k] system_call_after_swapgs
  2.89%  [kernel]         [k] copy_user_generic_string
  2.45%  [kernel]         [k] lookup_ioctx
  2.16%  [kernel]         [k] apic_timer_interrupt
  2.00%  [kernel]         [k] _raw_spin_lock
  1.97%  [scsi_debug]     [k] sdebug_q_cmd_hrt_complete
  1.84%  [kernel]         [k] __get_page_tail
  1.74%  [kernel]         [k] do_blockdev_direct_IO
  1.68%  [kernel]         [k] blk_flush_plug_list
  1.41%  [kernel]         [k] _raw_spin_unlock_irqrestore
  1.24%  [scsi_debug]     [k] schedule_resp

finally settling like before:

 14.15%  [kernel]         [k] do_io_submit
 13.61%  libaio.so.1.0.1  [.] io_submit
 11.81%  [kernel]         [k] system_call
 10.11%  [kernel]         [k] system_call_after_swapgs
  8.59%  [kernel]         [k] io_submit_one
  8.56%  [kernel]         [k] copy_user_generic_string
  7.96%  [kernel]         [k] lookup_ioctx
  5.33%  [kernel]         [k] blk_flush_plug_list
  3.11%  [kernel]         [k] blk_finish_plug
  2.84%  [kernel]         [k] sysret_check
  2.63%  fio              [.] fio_libaio_commit
  2.27%  [kernel]         [k] blk_start_plug
  1.17%  [kernel]         [k] SyS_io_submit
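
For anyone who wants to compare the two perf top snapshots mechanically
rather than by eye, here is a rough sketch.  It was not part of the test
run; the helper names (parse, growth) and the five-symbol excerpts are
illustrative assumptions, with the percentages copied from the output
above.  Keep in mind that perf top reports each symbol's share of samples,
so a larger share means a larger fraction of CPU time, not necessarily
more absolute time per call.

```python
import re

# Abbreviated excerpts of the two perf top snapshots quoted in this message
# (snapshot at ~700K IOPS, then the settled state).
EARLY = """
 4.30% [kernel] [k] do_io_submit
 3.88% libaio.so.1.0.1 [.] io_submit
 3.55% [kernel] [k] system_call
 3.11% [kernel] [k] io_submit_one
 2.45% [kernel] [k] lookup_ioctx
"""
LATE = """
14.15% [kernel] [k] do_io_submit
13.61% libaio.so.1.0.1 [.] io_submit
11.81% [kernel] [k] system_call
 8.59% [kernel] [k] io_submit_one
 7.96% [kernel] [k] lookup_ioctx
"""

# percent, shared object, [k] or [.] marker, symbol name
LINE = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse(text):
    """Map (object, symbol) -> percent for each perf top line in text."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            out[(m.group(2), m.group(4))] = float(m.group(1))
    return out


def growth(early, late):
    """Rows of (object, symbol, early %, late %, ratio), largest ratio first.

    Only symbols present in both snapshots are compared.
    """
    e, l = parse(early), parse(late)
    rows = [(k[0], k[1], e[k], l[k], l[k] / e[k]) for k in e if k in l]
    return sorted(rows, key=lambda r: r[4], reverse=True)


for obj, sym, before, after, ratio in growth(EARLY, LATE):
    print(f"{sym:<16} {obj:<18} {before:5.2f}% -> {after:5.2f}%  x{ratio:.1f}")
```

Pasting the full snapshots into EARLY and LATE gives the same comparison
for every symbol that appears in both.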