>I build a striped LV on 4 scsi disks. If I do sequential IO on one disk
>with buffer size 4k, the bandwidth is 9MB/s. Then I do sequential IO on
>LV with buffer size 16k. Theoretically I can get almost 4*9=36MB/s
>because LVM stripe 4k IO on every disk but I only get 18MB/s. I don't
>know where I lost so much performance or it is the overhead of LVM?

Later I repeated all those tests with different benchmarking tools. This
time I paid more attention to how the execution time is distributed.
Below are my results, generated by lmdd from the LMbench suite. I used
the "time" command to get the real time and system time, and the "raw"
command to do raw IO. (Here LVMn means a logical volume striped across
n disks with a 4KB stripe size.)

Objects      Chunk_Size  Total_IO_Size  Bandwidth  Real_time  System_time
-------      ----------  -------------  ---------  ---------  -----------
Single disk  4KB         1GB            8.7MB/s    117.6s     37.1s
LVM2         8KB         2GB            12.7MB/s   161.2s     78.1s
LVM3         12KB        3GB            15.5MB/s   197.8s     115.6s
LVM4         16KB        4GB            17.4MB/s   235.1s     154.4s

I find that only the system time increases linearly, and that looks like
the cause of the lost bandwidth on LVM3 and LVM4. Here I make an
assumption:

    IO_time = real_time - system_time

If this assumption is correct, the IO times in the four tests are almost
the same (about 80s to 83s), which makes sense because every test reads
1GB of data from each individual disk. So in theory we would expect the
time to read nGB from LVMn to equal the time to read 1GB from a single
disk, because LVM stripes the IO across separate disks. In practice, I
think the time to read nGB from LVMn is

    n * system_time_of_single_disk + IO_time_of_single_disk

Even though the calculated results match my test results, I am still
confused.

1) Even if the CPU can only serve one disk's requests at a time, there
   should be some overlap between system time and IO time, but I can't
   find any overlap in my results.
2) Why does the system time increase linearly? For LVM4, the CPU usage
   reaches 65%. Isn't that too high?
   I think all the tests should be IO-bound, but the tests for LVM3 and
   LVM4 are CPU-bound.

>My question is why I can't see nearly linear scaling of the bandwidth
>when the buffer size is small? Does striping LVM do real parallel IO
>similar to software RAID0?

I can answer this question now, because I tried RAID0 and got almost the
same performance as with LVM. Now my questions become:

1) Is my assumption about IO time correct?
2) Is my explanation reasonable?
3) Why do LVM3 and LVM4 use so much system time and show such high CPU
   usage?

Thanks!
--xiaoxiang
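
P.S. For anyone who wants to check the arithmetic behind the model above,
here is a minimal sketch in Python. The numbers are copied from my table;
the names (RESULTS, io_time, predicted_real_time, summarize) are just
mine for illustration and are not part of lmdd or LVM. It only restates
my assumption IO_time = real_time - system_time, it does not prove it.

```python
# Measurements copied from the table: (label, disks, real_s, system_s).
RESULTS = [
    ("Single disk", 1, 117.6, 37.1),
    ("LVM2", 2, 161.2, 78.1),
    ("LVM3", 3, 197.8, 115.6),
    ("LVM4", 4, 235.1, 154.4),
]


def io_time(real_s, system_s):
    """Assumption from the post: IO_time = real_time - system_time."""
    return real_s - system_s


def predicted_real_time(n_disks, single_real_s, single_system_s):
    """Model from the post: n * system_time_single + IO_time_single."""
    return n_disks * single_system_s + io_time(single_real_s, single_system_s)


def summarize(results):
    """Return per-test IO time, predicted and measured real time, CPU share."""
    _, _, base_real, base_sys = results[0]  # single-disk baseline
    rows = []
    for label, n_disks, real_s, system_s in results:
        rows.append({
            "label": label,
            "io_time": round(io_time(real_s, system_s), 1),
            "predicted": round(
                predicted_real_time(n_disks, base_real, base_sys), 1),
            "measured": real_s,
            "cpu_pct": round(100.0 * system_s / real_s, 1),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print("{label:12s} io={io_time:6.1f}s predicted={predicted:6.1f}s "
              "measured={measured:6.1f}s cpu={cpu_pct:5.1f}%".format(**row))
```

With the numbers from the table, the predicted real times stay within
about 6.5s of the measured ones, and the CPU share for LVM4 comes out
near 65%.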