Re: write is faster than seek?


 



Jens Axboe <jens.axboe@xxxxxxxxxx> writes:

> On Wed, Jun 11 2008, Alan D. Brunelle wrote:
>> Jens Axboe wrote:
>> > On Wed, Jun 11 2008, Alan D. Brunelle wrote:
>> >> Dmitri Monakhov wrote:
>> >>
>> >> Could it be that in the first case you will have merges, thus creating
>> >> fewer/larger I/O requests? Running iostat -x during the two runs, and
>> >> watching the output is a good first place to start.
>> > 
>> > I think it's mostly down to whether a specific drive is good at doing
>> > 124kb writes + 4k seek (and repeat) compared to regular streaming
>> > writes. The tested disk was SATA with write back caching, there should
>> > be no real command overhead gain in those size ranges.
>> > 
>> 
>> Probably true, I'd think the iostat -x data would be very helpful though.
>
> Definitely, the more data the better :). I already asked for blktrace
> data, that should give us everything we need.
Seems it is a hardware issue.
Test:
I disabled the request-merging logic in __make_request() and restricted
bio->bi_size <= 128 sectors via merge_bvec_fn. I/O scheduler: noop.
For the I/O patterns
1 CW (continuous writes)  write(,, PAGE_SIZE*16)
2 WH (writes with holes)  write(,, PAGE_SIZE*15); lseek(, PAGE_SIZE, SEEK_CUR)
3 CR (continuous reads)   read(,, PAGE_SIZE*16)
4 RH (reads with holes)   same as WH, but bios are submitted directly in order
  to explicitly bypass the read-ahead logic.

I've played with sata disk with NCQ on AHCI, and SCSI disk. 

Result: for the SATA disk
The performance drawback caused by the restricted bio size was negligible for
all I/O patterns, so this is definitely not a queue starvation issue. BIOs
sent by pdflush were ordered in all cases (as expected). For all I/O patterns
except the WH case, the driver completions were also ordered. But on the WH
pattern the drive seems to go crazy:
 Dispatched requests:
  8,0    1       14     0.000050684  3485  D   W 0 + 128 [pdflush]
  8,0    1       15     0.000055906  3485  D   W 136 + 128 [pdflush]
  8,0    1       16     0.000059269  3485  D   W 272 + 128 [pdflush]
  8,0    1       17     0.000062625  3485  D   W 408 + 128 [pdflush]
  8,0    1       31     0.000133306  3485  D   W 544 + 128 [pdflush]
  8,0    1       32     0.000136043  3485  D   W 680 + 128 [pdflush]
  8,0    1       33     0.000140446  3485  D   W 816 + 128 [pdflush]
  8,0    1       34     0.000142961  3485  D   W 952 + 128 [pdflush]
  8,0    1       48     0.000204734  3485  D   W 1088 + 128 [pdflush]
  8,0    1       49     0.000207358  3485  D   W 1224 + 128 [pdflush]
  8,0    1       50     0.000209505  3485  D   W 1360 + 128 [pdflush]
  ....
 Completed requests:
  8,0    0        1     0.045342874  3907  C   W 2856 + 128 [0]
  8,0    0        3     0.045374650  3907  C   W 2992 + 128 [0]
  8,0    0        5     0.057461715     0  C   W 1768 + 128 [0]
  8,0    0        7     0.057491967     0  C   W 1904 + 128 [0]
  8,0    0        9     0.060058695     0  C   W 680 + 128 [0]
  8,0    0       11     0.060075666     0  C   W 816 + 128 [0]
  8,0    0       13     0.063015540     0  C   W 1360 + 128 [0]
  8,0    0       15     0.063028859     0  C   W 1496 + 128 [0]
  8,0    0       17     0.073802939     0  C   W 3672 + 128 [0]
  8,0    0       19     0.073817422     0  C   W 3808 + 128 [0]
  8,0    0       21     0.075664013     0  C   W 544 + 128 [0]
  8,0    0       23     0.078348416     0  C   W 1088 + 128 [0]
  8,0    0       25     0.078362380     0  C   W 1224 + 128 [0]
  8,0    0       27     0.089371470     0  C   W 3400 + 128 [0]
  8,0    0       29     0.089385247     0  C   W 3536 + 128 [0]
  8,0    0       31     0.092328327     0  C   W 272 + 128 [0]
  ....
As you can see, completions arrive in semi-random order. This happens
regardless of whether the hardware write cache is enabled or disabled, so
this is hardware crap.
Note: I got the same performance drawback on a Mac mini running Mac OS.

Results for SCSI (bio size was restricted to 256 sectors):
All requests were dispatched and completed in normal order, but for some
unknown reason it takes more time to serve the "write with holes" requests.

Disk driver request completion timeline comparison table
write(,, 32*PG_SZ)        || write(,, 31*PG_SZ); lseek(, PG_SZ, SEEK_CUR)
--------------------------++---------------------------------
time       sector         ||    time      sector
--------------------------++---------------------------------
0.001028   131072 + 96    ||  0.001020   131072 + 96 
0.010916   131176 + 256   ||  0.015471   131176 + 152
0.018810   131432 + 256   ||  0.022863   131336 + 248
0.020248   131688 + 256   ||  0.024771   131592 + 248
0.021674   131944 + 256   ||  0.031986   131848 + 248
0.023090   132200 + 256   ||  0.039276   132104 + 248
0.024575   132456 + 256   ||  0.046587   132360 + 248
0.026069   132712 + 256   ||  0.054503   132616 + 248
0.027566   132968 + 256   ||  0.061797   132872 + 248
0.029063   133224 + 256   ||  0.069087   133128 + 248
0.030558   133480 + 256   ||  0.076388   133384 + 248
0.032053   133736 + 256   ||  0.083756   133640 + 248
0.033544   133992 + 256   ||  0.085657   133896 + 248
0.035042   134248 + 256   ||  0.092878   134152 + 248
0.036518   134504 + 256   ||  0.100176   134408 + 248
0.038009   134760 + 256   ||  0.107473   134664 + 248
0.039510   135016 + 256   ||  0.115323   134920 + 248
0.041005   135272 + 256   ||  0.122638   135176 + 248
0.042500   135528 + 256   ||  0.129933   135432 + 248
0.043992   135784 + 256   ||  0.137224   135688 + 248
IMHO it is also a hardware issue: the steady-state inter-completion time is
~1.5ms per 256-sector CW request vs ~7.3ms per 248-sector WH request, i.e.
roughly 5x slower for nearly the same request size.

>
> -- 
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
