On 17-07-06 03:43 PM, Jason Dillaman wrote:
>> I've learned the hard way that pre-Luminous librbd, even though it does copy
>> the buffer, does so too late. In my specific case, my FUSE module enters the
>> write call, issues rbd_aio_write there, and then exits the write, expecting
>> the buffer provided by FUSE to have been copied by librbd (as happens now in
>> Luminous). I didn't expect this to be new behavior, and once my code was
>> deployed against Jewel librbd, it started consistently corrupting data
>> during writes.
> The correct (POSIX-style) program behavior is to treat the buffer as
> immutable until the IO operation completes. It is never safe to assume
> the buffer can be reused while the IO is in flight. You should not
> add any logic that assumes the buffer is safely copied prior to the
> completion of the IO.
Indeed, most systems supporting asynchronous writes, not only POSIX ones,
expect the buffer to remain unchanged until the write is done. I wasn't sure
how rbd_aio_write operates, so I consulted the source, as there are no docs
for the API itself. That intermediate copy in librbd misled me: if librbd
copies the data, why should I do the same before calling rbd_aio_write? To
stress-test the memory bus? So I really see two problems here: the lack of
API docs, and a backwards-incompatible change in API behavior.
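Under the "immutable until completion" contract, a caller that must give its buffer back immediately (as a FUSE write handler does) has to make its own copy and release it only when the write completes. The sketch below is again a self-contained simulation: `sim_aio_write`, `sim_dispatch`, `free_on_complete` and `handle_write` are hypothetical names, not librbd or libfuse APIs. In real librbd code the release would happen in the callback passed to `rbd_aio_create_completion()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 8

typedef void (*sim_callback_t)(void *arg);

/* Hypothetical async write request with a completion callback. */
struct sim_request {
    const char *buf;
    size_t len;
    sim_callback_t cb;
    void *cb_arg;
};

static char sim_disk[BLOCK];  /* pretend backing store */
static int sim_freed;         /* buffers released by the completion callback */

static void sim_aio_write(struct sim_request *req, const char *buf, size_t len,
                          sim_callback_t cb, void *cb_arg)
{
    assert(len <= sizeof sim_disk);
    req->buf = buf;
    req->len = len;
    req->cb = cb;
    req->cb_arg = cb_arg;
}

/* Runs "later": reads the data, then signals completion. */
static void sim_dispatch(struct sim_request *req)
{
    memcpy(sim_disk, req->buf, req->len);
    req->cb(req->cb_arg);  /* the buffer may be released from here on */
}

/* Completion callback: the private copy is no longer needed. */
static void free_on_complete(void *arg)
{
    free(arg);
    sim_freed++;
}

/* FUSE-style handler (hypothetical): 'data' belongs to the caller and may
 * be overwritten as soon as this returns, so it is copied before submit. */
static int handle_write(struct sim_request *req, const char *data, size_t len)
{
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, len);
    sim_aio_write(req, copy, len, free_on_complete, copy);
    return (int)len;
}
```

This is exactly the extra copy questioned above; it is redundant only if the library guarantees its own copy before the call returns, which, per this thread, pre-Luminous librbd does not.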
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/