On Sat, Mar 7, 2020 at 9:45 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Sat, Mar 7, 2020 at 12:39 PM Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
> >
> > On 7-3-2020 19:13, Gregory Farnum wrote:
> > > On Fri, Mar 6, 2020 at 8:04 AM Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
> > >> Hi,
> > >>
> > >> When writing a Rados/Rbd client I noticed that the sequence:
> > >>
> > >> r = rbd_aio_readv(ri->ri_image, iov, iovcnt, offset, comp);
> > >> r = rbd_aio_wait_for_complete(comp);
> > >> nbytes = rbd_aio_get_return_value(comp);
> > >>
> > >> returns the number of bytes read in nbytes.
> > >>
> > >> Now if I do the same but with `rbd_aio_write(v)` I only receive
> > >> zero as value...
> > >>
> > >> I would have expected that rbd_aio_get_return_value would more
> > >> or less function like aio_return(2):
> > >> DESCRIPTION
> > >> The aio_return() system call returns the final status of the
> > >> asynchronous I/O request associated with the structure pointed
> > >> to by iocb.
> > >> RETURN VALUES
> > >> If the asynchronous I/O request has completed, the status is
> > >> returned as described in read(2), write(2), or fsync(2).
> > >> Otherwise, aio_return() returns -1 and sets errno to indicate
> > >> the error condition.
> > >>
> > >> So I was expecting the amount of bytes written?
> > > Understandable, but for "distributed systems are hard" reasons, writes
> > > return either 0 (success) or -ERRNO. [1] I guess librbd could
> > > translate that back into the given write size, but it's never come up
> > > as an issue and might have some weird edge cases?
> > > -Greg
> > >
> > > [1]: Returning anything else requires recording the return value as an
> > > additional write, or part of the write, so that we give out the same
> > > answer on replay of the write request.
> > > We've been talking for years about extending the
> > > protocol+implementation to allow sending back 32 or 64 bytes of
> > > data from a write, and...I can't recall if one of those times it
> > > finally got done.
> > Hi Greg,
> >
> > Thanks for the info.
> >
> > I can count what needs to be written from the elements in iov[], so
> > that is something I can fix in my code. But if -ERRNO is returned,
> > could there be a partial write?
>
> If RADOS returns an error code, it won't commit anything.
>
> I'm not sure if librbd can issue single writes which turn into
> multiple RADOS writes (by involving multiple objects); I suspect in a
> sane configuration it can't but maybe you could coerce it with weird
> stripe parameters or block sizes...Jason?

It can. Any librbd op that straddles objects would turn into multiple
RADOS ops. This can happen with both small (e.g. 4K) and big (e.g. 4M)
writes even with the default striping parameters.

With clones, the write can be preceded by a bunch of reads from a parent
image and a bunch of copyups. With exclusive-lock, by grabbing the lock
(potentially requesting it from a peer). With object-map, by updating
the object map (potentially twice, before and after the write-like op,
such as discard). The list goes on...

So a partial write can happen in the sense that some parts of your
request can get committed (overwriting old data) even if an error is
returned, and not as in "this many bytes from the beginning of the
request were committed". The entire write must be retried on error.

Thanks,

Ilya
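
For anyone landing on this thread later, here is a minimal sketch of the client-side workaround discussed above: sum the iov[] lengths yourself and treat a write completion value of 0 as "everything was written". The helper names iov_total_len and write_result are hypothetical and not part of librbd; the only assumption about librbd is what the thread states, namely that a write completion yields 0 on success or -ERRNO on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes described by iov[0..iovcnt-1],
 * i.e. what a fully successful vectored write would have covered. */
ssize_t iov_total_len(const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    assert(iovcnt == 0 || iov != NULL);
    for (int i = 0; i < iovcnt; i++)
        total += (ssize_t)iov[i].iov_len;
    return total;
}

/* Hypothetical helper: turn a write completion value (0 or -errno, as
 * described in this thread) into an aio_return(2)-like result.
 *
 * ret < 0  -> returned unchanged; parts of the request may already have
 *             been committed, so the caller must retry the whole write.
 * ret >= 0 -> the full length of the request. */
ssize_t write_result(ssize_t ret, const struct iovec *iov, int iovcnt)
{
    if (ret < 0)
        return ret;
    return iov_total_len(iov, iovcnt);
}
```

In a client this would presumably be called as write_result(rbd_aio_get_return_value(comp), iov, iovcnt) after the completion has finished. Note that a negative result says nothing about how many leading bytes were committed, so the retry has to cover the entire request.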