On Fri, Sep 06 2024, Xiubo Li wrote:

> On 9/5/24 21:57, Luis Henriques (SUSE) wrote:
>> __ceph_sync_read() does not correctly handle reads when the inode size is
>> zero. It is easy to hit a NULL pointer dereference by continuously reading
>> a file while, on another client, we keep truncating and writing new data
>> into it.
>>
>> The NULL pointer dereference happens when the inode size is zero but the
>> read op returns some data (ceph_osdc_wait_request()). This will lead to
>> 'left' being set to a huge value due to the overflow in:
>>
>> 	left = i_size - off;
>>
>> and, in the loop that follows, the pages[] array being accessed beyond
>> num_pages.
>>
>> This patch fixes the issue simply by checking the inode size and returning
>> if it is zero, even if there was data from the read op.
>>
>> Link: https://tracker.ceph.com/issues/67524
>> Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads")
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
>> ---
>>  fs/ceph/file.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 4b8d59ebda00..41d4eac128bb 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>  	if (ceph_inode_is_shutdown(inode))
>>  		return -EIO;
>> -	if (!len)
>> +	if (!len || !i_size)
>>  		return 0;
>>  	/*
>>  	 * flush any page cache pages in this range. this
>> @@ -1154,6 +1154,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>  		doutc(cl, "%llu~%llu got %zd i_size %llu%s\n", off, len,
>>  		      ret, i_size, (more ? " MORE" : ""));
>> +		if (i_size == 0)
>> +			ret = 0;
>> +
>>  		/* Fix it to go to end of extent map */
>>  		if (sparse && ret >= 0)
>>  			ret = ceph_sparse_ext_map_end(op);
>>
> Hi Luis,
>
> BTW, so in the following code:
>
> 	idx = 0;
> 	if (ret <= 0)
> 		left = 0;
> 	else if (off + ret > i_size)
> 		left = i_size - off;
> 	else
> 		left = ret;
>
> The 'ret' should be larger than '0', right ?

Right. (Which means we read something from the file.)

> If so we do not check and fix it in the 'else if' branch instead?

Yes, and then we'll have:

	left = i_size - off;

and because 'i_size' is 0, 'left' will be set to 0xffffffffff...
And the loop that follows:

	while (left > 0) {
		...
	}

will keep looping until we get a NULL pointer. Have you tried the
reproducer?

Cheers,
-- 
Luís

> Because currently the read path code won't exit directly and keep retrying to
> read if it found that the real content length is longer than the local 'i_size'.
>
> Again I am afraid your current fix will break the MIX filelock semantic ?
>
> Thanks
>
> - Xiubo