I didn't discard it, though :). I folded it into the `if` statement; I find the if/else construct overly verbose and cumbersome:

+			left = (ret > 0) ? ret : 0;

On Thu, Nov 28, 2024 at 7:43 PM Luis Henriques <luis.henriques@xxxxxxxxx> wrote:
>
> Hi Alex,
>
> [ Thank you for looking into this. ]
>
> On Wed, Nov 27 2024, Alex Markuze wrote:
>
> > Hi, folks.
> > AFAIK there is no side effect that can affect the MDS with this fix.
> > This crash started happening after commit
> > 1065da21e5df9d843d2c5165d5d576be000142a6 ("ceph: stop copying to iter
> > at EOF on sync reads").
> >
> > Luis, your fix seems to address only the case where i_size drops to
> > zero, but the crash can happen whenever `i_size` drops below `off`.
> > I propose fixing it this way:
>
> Hmm... you're probably right. I didn't see this happening, but I guess it
> could indeed happen.
>
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 4b8d59ebda00..19b084212fee 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >  	if (ceph_inode_is_shutdown(inode))
> >  		return -EIO;
> >
> > -	if (!len)
> > +	if (!len || !i_size)
> >  		return 0;
> >  	/*
> >  	 * flush any page cache pages in this range. this
> > @@ -1200,12 +1200,11 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >  		}
> >
> >  		idx = 0;
> > -		if (ret <= 0)
> > -			left = 0;
>
> Right now I don't have any means of testing this patch. However, I don't
> think this is completely correct. By removing the above condition you're
> discarding cases where an error has occurred (i.e. where ret is negative).
>
> Why not simply modify my patch and do:
>
> 	if (i_size < off)
> 		ret = 0;
>
> instead of:
>
> 	if (i_size == 0)
> 		ret = 0;
>
> ?
>
> (Again, totally untested!)
>
> Cheers,
> --
> Luís
>
> > -		else if (off + ret > i_size)
> > -			left = i_size - off;
> > +		if (off + ret > i_size)
> > +			left = (i_size > off) ? i_size - off : 0;
> >  		else
> > -			left = ret;
> > +			left = (ret > 0) ? ret : 0;
> > +
> >  		while (left > 0) {
> >  			size_t plen, copied;
> >
> >
> > On Thu, Nov 7, 2024 at 1:09 PM Luis Henriques <luis.henriques@xxxxxxxxx> wrote:
> >>
> >> (CC'ing Alex)
> >>
> >> On Wed, Nov 06 2024, Goldwyn Rodrigues wrote:
> >>
> >> > Hi Xiubo,
> >> >
> >> >> BTW, in the following code:
> >> >>
> >> >> 	idx = 0;
> >> >> 	if (ret <= 0)
> >> >> 		left = 0;
> >> >> 	else if (off + ret > i_size)
> >> >> 		left = i_size - off;
> >> >> 	else
> >> >> 		left = ret;
> >> >>
> >> >> 'ret' should be larger than '0', right?
> >> >>
> >> >> If so, why don't we check and fix it in the 'else if' branch instead?
> >> >>
> >> >> Because currently the read path won't exit directly and will keep
> >> >> retrying the read if it finds that the real content length is longer
> >> >> than the local 'i_size'.
> >> >>
> >> >> Again, I'm afraid your current fix will break the MIX filelock semantics.
> >> >
> >> > Do you think changing left to ssize_t instead of size_t will
> >> > fix the problem?
> >> >
> >> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> > index 4b8d59ebda00..f8955773bdd7 100644
> >> > --- a/fs/ceph/file.c
> >> > +++ b/fs/ceph/file.c
> >> > @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >  	if (ceph_inode_is_shutdown(inode))
> >> >  		return -EIO;
> >> >
> >> > -	if (!len)
> >> > +	if (!len || !i_size)
> >> >  		return 0;
> >> >  	/*
> >> >  	 * flush any page cache pages in this range. this
> >> > @@ -1087,7 +1087,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >> >  		size_t page_off;
> >> >  		bool more;
> >> >  		int idx;
> >> > -		size_t left;
> >> > +		ssize_t left;
> >> >  		struct ceph_osd_req_op *op;
> >> >  		u64 read_off = off;
> >> >  		u64 read_len = len;
> >> >
> >>
> >> I *think* (although I haven't tested it) that your patch should work as
> >> well. But I also think it's a bit more hacky: the overflow will still be
> >> there:
> >>
> >> 	if (ret <= 0)
> >> 		left = 0;
> >> 	else if (off + ret > i_size)
> >> 		left = i_size - off;
> >> 	else
> >> 		left = ret;
> >> 	while (left > 0) {
> >> 		// ...
> >> 	}
> >>
> >> If 'i_size' is '0', 'left' (which is now signed) will have a negative
> >> value in the 'else if' branch, and the loop that follows will not be
> >> executed. My version simply sets 'ret' to '0' before this 'if'
> >> construct.
> >>
> >> So, in my opinion, what needs to be figured out is whether this will cause
> >> problems on the MDS side or not. On the kernel client, it should be safe
> >> to ignore reads from an inode whose size is set to '0', even if there's
> >> already data available to be read. Eventually, the inode metadata will
> >> get updated, and by then we can retry the read.
> >>
> >> Unfortunately, the MDS continues to be a huge black box for me, and the
> >> locking code in particular is very tricky. I'd rather defer this to
> >> anyone who is familiar with the code.
> >>
> >> Cheers,
> >> --
> >> Luís
> >>
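To compare the `left` computations proposed in this thread side by side, here is a minimal userspace sketch in C. It is not kernel code: it assumes that `off` and `i_size` are u64 and `ret` is ssize_t inside __ceph_sync_read(), and the helper names left_original(), left_signed(), left_zero_ret(), left_folded() and check_cases() are hypothetical, one per variant (the pre-fix code, Goldwyn's ssize_t change, Luís's `if (i_size < off) ret = 0;`, and Alex's folded ternaries).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Userspace model of how 'left' (bytes to copy out) is computed after an
 * OSD read returns 'ret'. ASSUMPTION: off/i_size are u64 and ret is
 * ssize_t; check against the real declarations in fs/ceph/file.c.
 */
typedef uint64_t u64;

/* Pre-fix code: 'left' is size_t, so i_size - off wraps when i_size < off. */
static size_t left_original(u64 off, ssize_t ret, u64 i_size)
{
	size_t left;

	if (ret <= 0)
		left = 0;
	else if (off + ret > i_size)
		left = i_size - off;
	else
		left = ret;
	return left;
}

/*
 * Goldwyn's variant: same logic with a signed 'left'. Converting an
 * out-of-range u64 to ssize_t is implementation-defined in ISO C;
 * GCC and Clang wrap it, yielding a negative value.
 */
static ssize_t left_signed(u64 off, ssize_t ret, u64 i_size)
{
	ssize_t left;

	if (ret <= 0)
		left = 0;
	else if (off + ret > i_size)
		left = i_size - off;
	else
		left = ret;
	return left;
}

/* Luís's variant: zero 'ret' when i_size < off, then run the original chain. */
static size_t left_zero_ret(u64 off, ssize_t ret, u64 i_size)
{
	if (i_size < off)
		ret = 0;
	return left_original(off, ret, i_size);
}

/* Alex's variant: no separate ret <= 0 branch; clamp inside each arm. */
static size_t left_folded(u64 off, ssize_t ret, u64 i_size)
{
	size_t left;

	if (off + ret > i_size)
		left = (i_size > off) ? i_size - off : 0;
	else
		left = (ret > 0) ? ret : 0;
	return left;
}

static void check_cases(void)
{
	/* Read entirely below EOF: every variant copies all 4096 bytes. */
	assert(left_original(0, 4096, 8192) == 4096);
	assert(left_signed(0, 4096, 8192) == 4096);
	assert(left_zero_ret(0, 4096, 8192) == 4096);
	assert(left_folded(0, 4096, 8192) == 4096);

	/* Read crossing EOF: copy only up to i_size (6000 - 4096 = 1904). */
	assert(left_original(4096, 4096, 6000) == 1904);
	assert(left_signed(4096, 4096, 6000) == 1904);
	assert(left_zero_ret(4096, 4096, 6000) == 1904);
	assert(left_folded(4096, 4096, 6000) == 1904);

	/* i_size dropped below off (e.g. truncated to 0) during the read. */
	assert(left_original(4096, 4096, 0) == (size_t)-4096);	/* wrapped */
	assert(left_signed(4096, 4096, 0) < 0);	/* copy loop would not run */
	assert(left_zero_ret(4096, 4096, 0) == 0);
	assert(left_folded(4096, 4096, 0) == 0);

	/* Failed read (ret = -5, i.e. -EIO) at off = 0 with i_size = 100. */
	assert(left_original(0, -5, 100) == 0);
	assert(left_signed(0, -5, 100) == 0);
	assert(left_zero_ret(0, -5, 100) == 0);
	assert(left_folded(0, -5, 100) == 100);	/* off + ret wraps around */
}
```

Calling check_cases() should pass when built with GCC or Clang (left_signed() relies on their wrapping conversion of an out-of-range u64 to ssize_t). Under the assumed types, the last group of checks touches the error case Luís raises: with a negative `ret` and `off` = 0, `off + ret` wraps around, so the folded version takes its first branch and returns `i_size - off` for a failed read. Whether that is reachable in the real function depends on the actual declarations, so it is worth confirming there.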