> And I'm just curious why the block layer does not merge these contiguous
> sectors into one single request? For example, if the block layer generates
> 'start_sect: 48776, nsect: 64, rw: r' instead of the requests below, I think
> the performance will be better.

You said earlier "My hardware doesn't support scatter/gather".

> start_sect: 48776, nsect: 8, rw: r
> start_sect: 48784, nsect: 8, rw: r
> start_sect: 48792, nsect: 8, rw: r
> start_sect: 48800, nsect: 8, rw: r
> start_sect: 48808, nsect: 8, rw: r
> start_sect: 48816, nsect: 8, rw: r
> start_sect: 48824, nsect: 8, rw: r
> start_sect: 48832, nsect: 8, rw: r

Print the bus address of each request and you will probably find they are
not contiguous. The sectors are adjacent on the device, but the memory
buffers behind them are not, so the requests have not been merged: your
hardware could not do that transfer as a single command and you have no
IOMMU to make the buffers look contiguous.

If the overhead per command is really, really huge, you can preallocate an
internal buffer of, say, 32K or 64K in your driver, tell the block layer you
do scatter/gather, and then copy the buffers into a linear chunk. I'd be very
surprised if that was a win overall on any vaguely sane hardware, but flash
with erase block overhead and the like might be one of the less sane cases.

Alan
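
A minimal userspace sketch of the two points above, to make them concrete. It is not kernel code: struct seg, count_commands(), bounce_gather(), bounce_scatter() and BOUNCE_SIZE are hypothetical names invented for illustration, and a real driver would work with the block layer's and the DMA API's own types instead. The first function shows why sector-adjacent requests still cost one command each when their bus addresses are not contiguous; the other two show the copy through a preallocated linear buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bounce buffer size, following the "32K or 64K" suggestion. */
#define BOUNCE_SIZE (64u * 1024u)

/* Hypothetical stand-in for one scatter/gather segment. */
struct seg {
	uint64_t bus_addr;	/* address the device would see */
	void *cpu_addr;		/* address the CPU uses for the copy */
	size_t len;		/* segment length in bytes */
};

/*
 * Count the commands a device without scatter/gather needs: two
 * neighbouring segments share a command only when the second one
 * starts at the bus address where the first one ends.
 */
static size_t count_commands(const struct seg *segs, size_t n)
{
	size_t cmds = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0 ||
		    segs[i - 1].bus_addr + segs[i - 1].len != segs[i].bus_addr)
			cmds++;
	}
	return cmds;
}

/* Total length of all segments, or 0 if it does not fit in limit. */
static size_t total_len(const struct seg *segs, size_t n, size_t limit)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].len > limit - total)
			return 0;
		total += segs[i].len;
	}
	return total;
}

/* Write path: copy every segment into the linear bounce buffer. */
static size_t bounce_gather(uint8_t *bounce, size_t bounce_len,
			    const struct seg *segs, size_t n)
{
	size_t total = total_len(segs, n, bounce_len);
	size_t off = 0;

	assert(bounce != NULL);
	if (total == 0)
		return 0;	/* nothing to copy, or it does not fit */
	for (size_t i = 0; i < n; i++) {
		memcpy(bounce + off, segs[i].cpu_addr, segs[i].len);
		off += segs[i].len;
	}
	return total;
}

/* Read path: copy the linear bounce buffer back out to the segments. */
static size_t bounce_scatter(const uint8_t *bounce, size_t bounce_len,
			     const struct seg *segs, size_t n)
{
	size_t total = total_len(segs, n, bounce_len);
	size_t off = 0;

	assert(bounce != NULL);
	if (total == 0)
		return 0;
	for (size_t i = 0; i < n; i++) {
		memcpy(segs[i].cpu_addr, bounce + off, segs[i].len);
		off += segs[i].len;
	}
	return total;
}
```

For the read trace quoted above, the driver would issue one 64-sector read into a buffer of BOUNCE_SIZE bytes and then call bounce_scatter() to hand each 4K piece back. That extra memcpy per request is the cost to weigh against the saved per-command overhead.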