On Wed, 31 Aug 2005, Jens Axboe wrote:
> Nothing sticks out here either. There's plenty of idle time. It smells
> like a driver issue. Can you try the same dd test, but read from the
> drives instead? Use a bigger blocksize here, 128 or 256k.
I used the following command, reading from all 8 disks in parallel (one dd per disk):

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here is the vmstat output (I cut out a section in the middle):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free    buff  cache   si   so     bi    bo    in    cs us sy id wa
 3  7   4348  42640 7799984   9612    0    0 322816     0  3532  4987  0 22  0 78
 1  7   4348  42136 7800624   9584    0    0 322176     0  3526  4987  0 23  4 74
 0  8   4348  39912 7802648   9668    0    0 322176     0  3525  4955  0 22 12 66
 1  7   4348  38912 7803700   9636    0    0 322432     0  3526  5078  0 23  7 70
 2  6   4348  37552 7805120   9644    0    0 322432     0  3527  4908  0 23 12 64
 0  8   4348  41152 7801552   9608    0    0 322176     0  3524  5018  0 24  6 70
 1  7   4348  41644 7801044   9572    0    0 322560     0  3530  5175  0 23  0 76
 1  7   4348  37184 7805396   9640    0    0 322176     0  3525  4914  0 24 18 59
 3  7   4348  41704 7800376   9832    0    0 322176    20  3531  5080  0 23  4 73
 1  7   4348  40652 7801700   9732    0    0 323072     0  3533  5115  0 24 13 64
 1  7   4348  40284 7802224   9616    0    0 322560     0  3527  4967  0 23  1 76
 0  8   4348  40156 7802356   9688    0    0 322560     0  3528  5080  0 23  2 75
 6  8   4348  41896 7799984   9816    0    0 322176     0  3530  4945  0 24 20 57
 0  8   4348  39540 7803124   9600    0    0 322560     0  3529  4811  0 24 21 55
 1  7   4348  41520 7801084   9600    0    0 322560     0  3532  4843  0 23 22 55
 0  8   4348  40408 7802116   9588    0    0 322560     0  3527  5010  0 23  4 72
 0  8   4348  38172 7804300   9580    0    0 322176     0  3526  4992  0 24  7 69
 4  7   4348  42264 7799784   9812    0    0 322688     0  3529  5003  0 24  8 68
 1  7   4348  39908 7802520   9660    0    0 322700     0  3529  4963  0 24 14 62
 0  8   4348  37428 7805076   9620    0    0 322420     0  3528  4967  0 23 15 62
 0  8   4348  37056 7805348   9688    0    0 322048     0  3525  4982  0 24 26 50
 1  7   4348  37804 7804456   9696    0    0 322560     0  3528  5072  0 24 16 60
 0  8   4348  38416 7804084   9660    0    0 323200     0  3533  5081  0 24 23 53
 0  8   4348  40160 7802300   9676    0    0 323200    28  3543  5095  0 24 17 59
 1  7   4348  37928 7804612   9608    0    0 323072     0  3532  5175  0 24  7 68
 2  6   4348  38680 7803724   9612    0    0 322944     0  3531  4906  0 25 24 51
 1  7   4348  40408 7802192   9648    0    0 322048     0  3524  4947  0 24 19 57

Full vmstat session can be found at:

   ftp://ftp.dwd.de/pub/afd/linux_kernel_debug/vmstat-256k-read

And here is the profile data:

2106577 total                          0.9469
1638177 default_idle               34128.6875
 179615 copy_user_generic_c         4726.7105
  27670 end_buffer_async_read        108.0859
  26055 shrink_zone                    7.1111
  23199 __make_request                17.2612
  17221 kmem_cache_free              153.7589
  11796 drop_buffers                  52.6607
  11016 add_to_page_cache             52.9615
   9470 __wake_up_bit                197.2917
   8760 buffered_rmqueue              12.4432
   8646 find_get_page                 90.0625
   8319 __do_page_cache_readahead     11.0625
   7976 kmem_cache_alloc             124.6250
   7463 scsi_request_fn                6.2192
   7208 try_to_free_buffers           40.9545
   6716 create_empty_buffers          41.9750
   6432 __end_that_request_first      11.8235
   6044 test_clear_page_dirty         25.1833
   5643 scsi_dispatch_cmd              9.7969
   5588 free_hot_cold_page            19.4028
   5479 submit_bh                     18.0230
   3903 __alloc_pages                  3.2965
   3671 file_read_actor                9.9755
   3425 thread_return                 14.2708
   3333 generic_make_request           5.6301
   3294 bio_alloc_bioset               7.6250
   2868 bio_put                       44.8125
   2851 mpt_interrupt                  2.8284
   2697 mempool_alloc                  8.8717
   2642 block_read_full_page           3.9315
   2512 do_generic_mapping_read        2.1216
   2394 set_page_refs                149.6250
   2235 alloc_page_buffers             9.9777
   1992 __pagevec_lru_add              8.3000
   1859 __memset                       9.6823
   1791 page_waitqueue                15.9911
   1783 scsi_end_request               6.9648
   1348 dma_unmap_sg                   6.4808
   1324 bio_endio                     11.8214
   1306 unlock_page                   20.4062
   1211 mptscsih_freeChainBuffers      7.5687
   1141 alloc_pages_current            7.9236
   1136 __mod_page_state              35.5000
   1116 radix_tree_preload             8.7188
   1061 __pagevec_release_nonlru       6.6312
   1043 set_bh_page                    9.3125
   1024 release_pages                  2.9091
   1023 mempool_free                   6.3937
    832 alloc_buffer_head             13.0000

Full profile data can be found at:

   ftp://ftp.dwd.de/pub/afd/linux_kernel_debug/dd-256k-8disk-read.profile
> You might want to try the same with direct io, just to eliminate the
> costly user copy. I don't expect it to make much of a difference
> though, feels like the problem is elsewhere (driver, most likely).
Sorry, I don't know how to do this. Do you mean using a C program that
sets some flag to do direct io, or is there another way to do it?
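Something like the following minimal sketch, perhaps? This assumes
O_DIRECT is the flag you mean; as far as I know the buffer and the
transfer size have to be aligned for O_DIRECT, so I align to 512 bytes
and reuse the 256k block size from the dd test. Error handling is kept
minimal, just for illustration.

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLKSIZE (256 * 1024)    /* same block size as the dd test */

int main(int argc, char **argv)
{
        void *buf;
        ssize_t n;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <device>\n", argv[0]);
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* O_DIRECT needs an aligned buffer; 512 bytes should do for disks */
        if (posix_memalign(&buf, 512, BLKSIZE) != 0) {
                fprintf(stderr, "posix_memalign failed\n");
                return 1;
        }

        while ((n = read(fd, buf, BLKSIZE)) > 0)
                ;       /* discard the data, we only care about throughput */

        if (n < 0)
                perror("read");

        close(fd);
        free(buf);
        return 0;
}

If that is roughly what you meant, I can run one instance per disk in
parallel, like the dd test, and capture vmstat again.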
> If we still can't get closer to this, it would be interesting to try
> my block tracing stuff so we can see what is going on at the queue
> level. But let's gather some more info first, since that requires
> running the -mm tree.
Ok, then please just tell me what I need to do.

Thanks,
Holger