Re: [Lsf-pc] [LSF/MM TOPIC] end-to-end data and metadata corruption detection

On 02/02/2012 08:26 PM, Andrea Arcangeli wrote:
On Thu, Feb 02, 2012 at 10:04:59AM +0100, Bernd Schubert wrote:
I think the point for network file systems is that they can reuse the
disk-checksum for network verification. So instead of calculating a
checksum for network and disk, just use one for both. The checksum is
also supposed to be cached in memory, as that avoids recalculation for
other clients.

1)
client-1: sends data and checksum

server: Receives those data and verifies the checksum ->  network
transfer was ok, sends data and checksum to disk

2)
client-2 ... client-N: Ask for those data

server:  send cached data and cached checksum

client-2 ... client-N: Receive data and verify checksum


So the whole point of caching checksums is to avoid the server needing to
recalculate them for dozens of clients. Recalculating checksums simply
does not scale with an increasing number of clients that want to read
data processed by another client.
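
The two steps above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not code from any of the file
systems discussed: the class name ChecksumCachingServer, the helper
client_read and the counter checksum_calls are hypothetical, and CRC32 from
zlib merely stands in for whatever checksum the storage format really uses.

```python
import zlib


class ChecksumCachingServer:
    """Toy server: verifies the checksum once on write, then serves the cached value."""

    def __init__(self):
        self.store = {}          # name -> (data, checksum), stands in for disk + cache
        self.checksum_calls = 0  # how often the server itself computed a checksum

    def _checksum(self, data):
        self.checksum_calls += 1
        return zlib.crc32(data)  # CRC32 is an assumption, used only for illustration

    def write(self, name, data, client_checksum):
        # Step 1: verify the network transfer, then keep data and checksum together.
        if self._checksum(data) != client_checksum:
            raise ValueError("checksum mismatch on write")
        self.store[name] = (data, client_checksum)

    def read(self, name):
        # Step 2: hand out cached data and cached checksum without recalculating.
        return self.store[name]


def client_read(server, name):
    # Each reading client verifies the data against the checksum it received.
    data, checksum = server.read(name)
    if zlib.crc32(data) != checksum:
        raise ValueError("checksum mismatch on read")
    return data


server = ChecksumCachingServer()
payload = b"block written by client-1"
server.write("file", payload, zlib.crc32(payload))

# client-2 ... client-N read the same block.
results = [client_read(server, "file") for _ in range(12)]
```

In this sketch the server computes the checksum exactly once, on the write
from client-1, while each reader pays for its own verification, so the
server-side cost does not grow with the number of reading clients.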

This makes sense indeed. My argument was only about the exposure of
the storage hw format cksum to userland (through some new ioctl for
further userland verification of the pagecache data in the client
pagecache, done by whatever program is reading from the cache). The
network fs client lives in kernel, the network fs server lives in
kernel, so no need to expose the cksum to userland to do what you
described above.

I meant if we can't trust the pagecache to be correct (after the
network fs client code already checked the cksum cached by the server
and sent to the client along the server cached data), I don't see much
value added through a further verification by the userland program
running on the client and accessing pagecache in the client. If we
can't trust client pagecache to be safe against memory bitflips or
software bugs, we can hardly trust the anonymous memory either.

Well, now it gets a bit troublesome - not all file systems are in kernel space. FhGFS uses kernel clients, but has user space daemons. I think Ceph does it similarly. I'm not sure about the Gluster roadmap or whether data verification is planned there at all, but if Gluster wanted to do that, even its clients would need access to the checksums in user space.

Now let's assume we ignore user space clients for now: what about using the splice interface to also send checksums? As a basic concept, file system servers are not interested in the actual data at all; they only manage the transfer between disk and network. So a possible way to avoid exposing checksums to user space daemons is simply not to expose the data to the servers at all. In that case, however, the server-side kernel would need to do the checksum verification, even for user space daemons. The remaining issue is that splice does not work with InfiniBand ibverbs, due to the missing socket fd.

Another solution that might also work is to expose checksums to user space read-only.



Cheers,
Bernd