Linus Torvalds wrote:
IOW, when you allocate a new 32kB cluster, you will have to allocate 8
pages to do IO on it (since you'll have to initialize the disk space), but
you can still literally treat those pages as _individual_ pages, and you
can write them out in any order, and you can free them (and then look them
up) one at a time.
Notice? The cluster size really only ends up being a disk-space allocation
issue, not an issue for actually caching the end result or for the actual
size of the IO.
Right. I didn't realize we were actually that smart (not writing out
the entire cluster when dirtying one page), but I guess it makes sense.
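To make the distinction concrete, here is a minimal sketch (a toy model, not kernel code) of a cluster that is only a disk-allocation unit. It assumes 4 kB pages and the 32 kB cluster from the example, so one cluster holds 8 pages. The `Cluster` class and its methods are hypothetical names. A freshly allocated cluster has all 8 pages dirty because its disk space must be initialized; after that, only pages that are actually dirtied get written, each as its own 4 kB IO.

```python
PAGE_SIZE = 4096                               # assumed page size (4 kB)
CLUSTER_SIZE = 32 * 1024                       # cluster size from the example
PAGES_PER_CLUSTER = CLUSTER_SIZE // PAGE_SIZE  # 8 pages per cluster


class Cluster:
    """Hypothetical model: the cluster only governs disk-space allocation."""

    def __init__(self):
        # A newly allocated cluster must have its disk space initialized,
        # so every page in it starts out dirty.
        self.dirty = set(range(PAGES_PER_CLUSTER))

    def dirty_page(self, index):
        # Dirtying one page marks only that page, not the whole cluster.
        self.dirty.add(index)

    def writeback(self):
        # Each dirty page becomes its own IO, given as (offset within the
        # cluster, length); clean pages in the cluster are not written.
        ios = [(i * PAGE_SIZE, PAGE_SIZE) for i in sorted(self.dirty)]
        self.dirty.clear()
        return ios
```

For example, after the initial writeback of all 8 pages, dirtying page 3 and writing back again produces a single 4 kB IO at offset 12288 within the cluster.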
The hardware sector size is very different. If you have a 32kB hardware
sector size, that implies that _all_ IO has to be done with that
granularity. Now you can no longer treat the eight pages as individual
pages - you _have_ to write them out and read them in as one entity. If
you dirty one page, you effectively dirty them all. You can not drop and
re-allocate pages one at a time any more.
Linus
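By contrast, with a 32 kB hardware sector every IO has to be rounded out to whole sectors. The sketch below assumes 4 kB pages and a hypothetical 32 kB sector; the function names are illustrative only. It shows which pages get dragged along when one page is dirtied, and how an arbitrary byte range expands to sector boundaries.

```python
PAGE_SIZE = 4096            # assumed page size (4 kB)
HW_SECTOR_SIZE = 32 * 1024  # hypothetical 32 kB hardware sector


def pages_in_same_sector(page_index):
    """Return every page that must be written together with page_index."""
    per_sector = HW_SECTOR_SIZE // PAGE_SIZE
    first = (page_index // per_sector) * per_sector
    return list(range(first, first + per_sector))


def sector_io_range(byte_offset, length):
    """Round a byte range out to whole hardware sectors as (start, length)."""
    start = (byte_offset // HW_SECTOR_SIZE) * HW_SECTOR_SIZE
    # Ceiling division rounds the end up to the next sector boundary.
    end = -(-(byte_offset + length) // HW_SECTOR_SIZE) * HW_SECTOR_SIZE
    return start, end - start
```

Dirtying page 10 therefore means writing pages 8 through 15, and a single 4 kB write at byte offset 40960 becomes a 32 kB IO starting at offset 32768.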
I suspect that in this case, trying to gang together multiple pages
inside the VM to handle it this way all the way through would be
insanity. My guess is that the only sane way to do it is a
read-modify-write approach when writing out the data (in the block
layer, maybe?), where the read can be skipped if the pages for the
entire hardware sector are already in cache or if the write is large
enough to replace the entire sector. I assume we already do something
like this in the md code for cases like software RAID 5 with a stripe
size larger than 4KB.
That would obviously have some performance drawbacks compared to a
smaller sector size. But if the device is determined to use bigger
sectors internally one way or the other, and the alternative is that the
drive does R-M-W internally to emulate smaller sectors (which seems to
be the case for some devices), it may make more sense to do it in the
kernel, where we have more information that lets us do it more
efficiently. (Though, at least on the normal ATA disk side of things, 4K
is the biggest number I've heard tossed about for a future expanded
sector size; flash devices like this may be another story.)
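The read-modify-write idea with the read optimized away might look roughly like the sketch below. This is a toy model under stated assumptions, not how the block layer or md actually implements it: 4 kB pages, a hypothetical 32 kB hardware sector, page contents passed in as dicts keyed by page index within the sector, and `read_sector` standing in for a hypothetical device read. The read is issued only when some page of the sector is neither being written nor already cached.

```python
PAGE_SIZE = 4096                                 # assumed page size (4 kB)
HW_SECTOR_SIZE = 32 * 1024                       # hypothetical 32 kB sector
PAGES_PER_SECTOR = HW_SECTOR_SIZE // PAGE_SIZE   # 8


def write_sector(sector, dirty_pages, page_cache, read_sector):
    """Build the full sector image to send to the device.

    dirty_pages: page index within the sector -> bytes being written
    page_cache:  page index within the sector -> clean cached bytes
    read_sector: hypothetical callable(sector) -> HW_SECTOR_SIZE bytes
    Returns (sector image, whether a read from the device was needed).
    """
    covered = set(dirty_pages) | set(page_cache)
    if len(covered) == PAGES_PER_SECTOR:
        # Every page is either being written or already cached:
        # the "read" half of read-modify-write can be skipped.
        old = None
    else:
        # Some page is missing, so fetch the current sector contents.
        old = read_sector(sector)

    image = bytearray()
    for i in range(PAGES_PER_SECTOR):
        if i in dirty_pages:          # modify: new data wins
            image += dirty_pages[i]
        elif i in page_cache:         # clean page we already have
            image += page_cache[i]
        else:                         # fill the gap from the old sector
            image += old[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
    return bytes(image), old is not None
```

The design choice here is simply to check coverage before touching the device: a full-sector write, or a partial write whose remaining pages are all cached, costs one write, while anything else costs a 32 kB read plus a 32 kB write.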