Re: [PATCH] ceph: fix error handling in ceph_sync_write

On 8/25/22 9:16 PM, Jeff Layton wrote:
On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote:
On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
ceph_sync_write has assumed that a zero result in req->r_result means
success. Testing with a recent cluster, however, shows the OSD returning
a non-zero length written here. I'm not sure whether or when this
changed, but fix the code to accept either result.

Assume a negative result means error, and anything else is a success. If
we're given a short length, then return a short write.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
  fs/ceph/file.c | 10 +++++++++-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 86265713a743..c0b2c8968be9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
                                           req->r_end_latency, len, ret);
  out:
                 ceph_osdc_put_request(req);
-               if (ret != 0) {
+               if (ret < 0) {
                         ceph_set_error_write(ci);
                         break;
                 }

+               /*
+                * FIXME: it's unclear whether all OSD versions return the
+                * length written on a write. For now, assume that a 0 return
+                * means that everything got written.
+                */
+               if (ret && ret < len)
+                       len = ret;
+
                 ceph_clear_error_write(ci);
                 pos += len;
                 written += len;
--
2.37.2
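
The result handling that the patch description proposes can be sketched
in isolation as below. This is only an illustration, not code from
fs/ceph/file.c: the helper name chunk_written() and the ssize_t/size_t
types are assumptions made for the example.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the kernel source): models how the
 * proposed patch would interpret the request result for one chunk of
 * `len` bytes. Returns a negative value on error, otherwise the number
 * of bytes to count as written.
 */
static ssize_t chunk_written(ssize_t ret, size_t len)
{
	if (ret < 0)
		return ret;		/* negative result: error */
	if (ret && (size_t)ret < len)
		return ret;		/* shorter length reported: short write */
	return (ssize_t)len;		/* 0, or at least len: full write */
}
```

Under these assumptions, chunk_written(0, 4096) yields 4096,
chunk_written(1024, 4096) yields 1024, and a negative result is passed
through unchanged.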

Hi Jeff,

AFAIK OSDs aren't allowed to return any kind of length on a write
and there is no such thing as a short write.  This definitely needs
deeper investigation.

What is the cluster version you are testing against?

That's what I had thought too but I wasn't sure:

     [ceph: root@quad1 /]# ceph --version
     ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)

I'll see if I can confirm that this is coming from the OSD and not some
other layer as well.

My mistake. This bug turns out to be a different bug in the fscrypt
stack. We can drop this patch (and I probably should have sent it as an
RFC in the first place). Sorry for the noise!

Cool, thanks Jeff.

I saw your new updates about this; they look good to me and I will test them.

- Xiubo



