On 02/01/2012 07:30 PM, Andrea Arcangeli wrote:
On Wed, Feb 01, 2012 at 12:16:05PM -0600, James Bottomley wrote:
supplying protection information to user space isn't about the
application checking what's on disk .. there's automatic verification in
the chain to do that (both the HBA and the disk will check the
protection information on entry/exit and transfer). Supplying
protection information to userspace is about checking nothing went wrong
in the handoff between the end of the DIF stack and the application.
Not sure if I got this right, but keeping protection information for
in-ram pagecache and exposing it to userland somehow, to me sounds a
bit of overkill as a concept. Then you should want that for anonymous
memory too. If you copy the pagecache to a malloc()ed buffer and
verify pagecache was consistent, but then the buffer is corrupt by
hardware bitflip or software bug, then what's the point. Besides if
this is getting exposed to userland and it's not hidden in the kernel
(FS/Storage layers), userland could code its own verification logic
without much added complexity. With CRC in hardware on the CPU it
doesn't sound like a big cost to do it fully in userland and then you
could run it on anonymous memory too if you need and not be dependent
on hardware or filesystem details (well other than a a cpuid check at
startup).
I think the point for network file systems is that they can reuse the
disk-checksum for network verification. So instead of calculating a
checksum for network and disk, just use one for both. The checksum also
is supposed to be cached in memory, as that avoids re-calculation for
other clients.
1)
client-1: sends data and checksum
server: Receives those data and verifies the checksum -> network
transfer was ok, sends data and checksum to disk
2)
client-2 ... client-N: Ask for those data
server: send cached data and cached checksum
client-2 ... client-N: Receive data and verify checksum
So the hole point of caching checksums is to avoid the server needs to
recalculate those for dozens of clients. Recalculating checksums simply
does not scale with an increasing number of clients, which want to read
data processed by another client.
Cheers,
Bernd
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html