Thanks a lot to Alan for this suggestion. I think it makes sense to simulate scatter/gather in the driver for this case. I'll try it later and expect to see improved performance.

>-----Original Message-----
>From: Alan Cox [mailto:alan@xxxxxxxxxxxxxxxxxxx]
>Sent: April 13, 2010 23:21
>To: Gao, Yunpeng
>Cc: James Bottomley; Martin K. Petersen; Robert Hancock;
>linux-ide@xxxxxxxxxxxxxxx; linux-mmc@xxxxxxxxxxxxxxx
>Subject: Re: How to make kernel block layer generate bigger request in the
>request queue?
>
>> And I'm just curious why the block layer does not merge these contiguous
>> sectors into one single request? For example, if the block layer generated
>> 'start_sect: 48776, nsect: 64, rw: r' instead of the requests below, I think
>> the performance would be better.
>
>You said earlier "My hardware doesn't support scatter/gather"
>
>> start_sect: 48776, nsect: 8, rw: r
>> start_sect: 48784, nsect: 8, rw: r
>> start_sect: 48792, nsect: 8, rw: r
>> start_sect: 48800, nsect: 8, rw: r
>> start_sect: 48808, nsect: 8, rw: r
>> start_sect: 48816, nsect: 8, rw: r
>> start_sect: 48824, nsect: 8, rw: r
>> start_sect: 48832, nsect: 8, rw: r
>
>Print the bus address of each request and you will probably find they are
>not contiguous so they have not been merged because your hardware could
>not do that transfer and you have no IOMMU.
>
>If the overhead per command is really really huge you can preallocate an
>internal buffer of say 32K or 64K in your driver and tell the block layer
>you do scatter gather, then copy the buffers into a linear chunk. I'd be
>very surprised if that was a win overall on any vaguely sane hardware but
>flash with erase block overhead and the like might be one of the less
>sane cases.
>
>Alan
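
To make the bounce-buffer idea concrete, here is a rough user-space C sketch of the read path. It is only an illustration, not kernel code and not the actual driver: `struct seg`, `hw_read_linear`, `read_via_bounce` and `commands_issued` are hypothetical names, the "disk" is just a byte array, and a `memcpy` stands in for the single hardware command. The assumption is that the driver has advertised scatter/gather support, receives one request made of several non-contiguous memory segments, performs one linear transfer into a preallocated 64K buffer, and then copies the data out to each segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define BOUNCE_SIZE (64u * 1024u)   /* "say 32K or 64K", as suggested above */

/* Hypothetical descriptor for one non-contiguous memory segment. */
struct seg {
    unsigned char *buf;
    size_t len;
};

/* Preallocated linear buffer owned by the (simulated) driver. */
unsigned char bounce[BOUNCE_SIZE];

/* Counts simulated hardware commands so the saving is visible. */
unsigned int commands_issued;

/* Stand-in for the device: one command reads len bytes from start_sect. */
void hw_read_linear(const unsigned char *disk, size_t start_sect,
                    unsigned char *dst, size_t len)
{
    memcpy(dst, disk + start_sect * SECTOR_SIZE, len);
    commands_issued++;
}

/*
 * Serve a multi-segment read with a single linear hardware transfer:
 * read everything into the bounce buffer, then scatter it to the segments.
 * Returns the number of bytes transferred.
 */
size_t read_via_bounce(const unsigned char *disk, size_t start_sect,
                       const struct seg *segs, size_t nsegs)
{
    size_t total = 0, off = 0, i;

    for (i = 0; i < nsegs; i++)
        total += segs[i].len;
    assert(total <= BOUNCE_SIZE);   /* request must fit the bounce buffer */

    hw_read_linear(disk, start_sect, bounce, total);   /* one command */

    for (i = 0; i < nsegs; i++) {   /* extra copy: the cost of this trick */
        memcpy(segs[i].buf, bounce + off, segs[i].len);
        off += segs[i].len;
    }
    return total;
}
```

With eight 4 KB segments (8 sectors each, as in the trace above), this issues one 64-sector command instead of eight 8-sector commands, at the price of one extra memory copy per request. Whether that trade is a win depends on how large the per-command overhead really is, so it needs to be measured on the actual hardware.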