Il giorno mer, 27/11/2019 alle 21.08 +0800, Ming Lei ha scritto: > On Wed, Nov 27, 2019 at 10:39:40AM +0100, Andrea Vai wrote: > > Il giorno mer, 27/11/2019 alle 10.05 +0800, Ming Lei ha scritto: > > > > > > > > > It can be workaround via the following change: > > > > > > /lib/modules/5.4.0+/build/include/generated/autoconf.h: > > > > > > //#define CONFIG_CC_HAS_ASM_INLINE 1 > > > > Thanks, it worked, trace attached. Produced by: start the trace > script > > (with the pendrive already plugged), wait some seconds, run the > test > > (1 trial, 1 GB), wait for the test to finish, stop the trace. > > > > The copy took 2659 seconds, roughly as already seen before. > > Thanks for collecting the log. > > From the log, some of write IOs are out-of-order, such as, the 1st > one > is 378880. > > 16.41240 2 266 266 kworker/2:1H block_rq_issue b'W' > 370656 240 > 16.41961 3 485 485 kworker/3:1H block_rq_issue b'W' > 378880 240 > 16.73729 2 266 266 kworker/2:1H block_rq_issue b'W' > 370896 240 > 17.71161 2 266 266 kworker/2:1H block_rq_issue b'W' > 379120 240 > 18.02344 2 266 266 kworker/2:1H block_rq_issue b'W' > 371136 240 > 18.94314 3 485 485 kworker/3:1H block_rq_issue b'W' > 379360 240 > 19.25624 2 266 266 kworker/2:1H block_rq_issue b'W' > 371376 240 > > IO latency is increased a lot since the 1st out-of-order request(usb > storage HBA is single queue depth, one request can be issued only > if > the previous issued request is completed). > > The reason is that there are two kind of tasks which inserts rq to > device. > One is the 'cp' process, the other is kworker/u8:*. The out-of- > order > happens during the two task's interleaving. > > Under such situation, I believe that the old legacy IO path may not > guarantee the order too. In blk_queue_bio(), after get_request() > allocates one request, the queue lock is released. And request is > actually inserted & issued from blk_flush_plug_list() under the > branch of 'if (plug)'. If requests are from two tasks, then request > is inserted/issued from two plug list, and no order can be > guaranteed. > > In my test, except for several requests from the beginning, all > other > requests are inserted via the kworker thread(guess it is writeback > wq), > that is why I can't observe the issue in my test. > > As Schmid suggested, you may run the same test on old kernel with > legacy io path, and see if the performance is still good. > > Also, could you share the following info about your machine? So that > I can build my VM guest in this setting for reproducing your > situation > (requests are inserted from two types of threads). > > - lscpu attached, > - free -h total used free shared buff/cache available Mem: 23Gi 4,2Gi 11Gi 448Mi 7,0Gi 18Gi Swap: 3,7Gi 0B 3,7Gi > - lsblk -d $USB_DISK NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sdg 8:96 1 28,8G 0 disk > - exact commands for mount the disk, and running the copy operation I attached the whole script to this thread, I attach it again to this message and copy the relevant lines here: mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile SECONDS=0 cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile umount /mnt/pendrive 2>&1 |tee -a $logfile Meanwhile, I am going on with the further tests as suggested Thanks, Andrea
Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 60 Model name: Intel(R) Core(TM) i5-4430 CPU @ 3.00GHz Stepping: 3 CPU MHz: 1674.727 CPU max MHz: 3200,0000 CPU min MHz: 800,0000 BogoMIPS: 5986.16 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Attachment:
test
Description: application/shellscript