On Fri, Sep 06 2024, Xiubo Li wrote:

> On 9/6/24 19:30, Luis Henriques wrote:
>> On Fri, Sep 06 2024, Xiubo Li wrote:
>>
>>> On 9/5/24 21:57, Luis Henriques (SUSE) wrote:
>>>> __ceph_sync_read() does not correctly handle reads when the inode size is
>>>> zero.  It is easy to hit a NULL pointer dereference by continuously reading
>>>> a file while, on another client, we keep truncating and writing new data
>>>> into it.
>>>>
>>>> The NULL pointer dereference happens when the inode size is zero but the
>>>> read op returns some data (ceph_osdc_wait_request()).  This will lead to
>>>> 'left' being set to a huge value due to the overflow in:
>>>>
>>>> 	left = i_size - off;
>>>>
>>>> and, in the loop that follows, the pages[] array being accessed beyond
>>>> num_pages.
>>>>
>>>> This patch fixes the issue simply by checking the inode size and returning
>>>> if it is zero, even if there was data from the read op.
>>>>
>>>> Link: https://tracker.ceph.com/issues/67524
>>>> Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads")
>>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
>>>> ---
>>>>  fs/ceph/file.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 4b8d59ebda00..41d4eac128bb 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>>>  	if (ceph_inode_is_shutdown(inode))
>>>>  		return -EIO;
>>>> -	if (!len)
>>>> +	if (!len || !i_size)
>>>>  		return 0;
>>>>  	/*
>>>>  	 * flush any page cache pages in this range.  this
>>>> @@ -1154,6 +1154,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>>>  		doutc(cl, "%llu~%llu got %zd i_size %llu%s\n", off, len,
>>>>  		      ret, i_size, (more ? " MORE" : ""));
>>>> +		if (i_size == 0)
>>>> +			ret = 0;
>>>> +
>>>>  		/* Fix it to go to end of extent map */
>>>>  		if (sparse && ret >= 0)
>>>>  			ret = ceph_sparse_ext_map_end(op);
>>>>
>>> Hi Luis,
>>>
>>> BTW, so in the following code:
>>>
>>> 	idx = 0;
>>> 	if (ret <= 0)
>>> 		left = 0;
>>> 	else if (off + ret > i_size)
>>> 		left = i_size - off;
>>> 	else
>>> 		left = ret;
>>>
>>> The 'ret' should be larger than '0', right?
>> Right.  (Which means we read something from the file.)
>>
>>> If so, should we not check and fix it in the 'else if' branch instead?
>> Yes, and then we'll have:
>>
>> 	left = i_size - off;
>>
>> and because 'i_size' is 0, 'left' will be set to 0xffffffffff...
>> And the loop that follows:
>>
>> 	while (left > 0) {
>> 		...
>> 	}
>>
>> will keep looping until we get a NULL pointer.  Have you tried the
>> reproducer?
>
> Hi Luis,
>
> Not yet; I haven't had a chance to do that recently, for the reason you
> know.

Hi Xiubo,

I know you've been busy, but I was wondering if you (or someone else) had
a chance to have a look at this.  It's pretty easy to reproduce, and it
has been seen in production.

Any chance of getting some more feedback on this fix?

Cheers,
-- 
Luís
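
For anyone following the thread, here is a minimal userspace sketch of the
arithmetic discussed above.  It is not the kernel code: the helper names
left_unpatched() and left_clamped() are hypothetical, and the sketch assumes
'i_size' and 'off' are unsigned 64-bit values, as the overflow described in
the patch implies.  The first helper mirrors the quoted 'left' computation.
The second clamps inside the 'else if' case, which is one possible
alternative and not what the patch does (the patch returns early when the
inode size is zero).

```c
#include <stdint.h>

/*
 * Hypothetical helper: mirrors the quoted branch logic.  When 'off' is
 * past 'i_size' (e.g. the file was truncated to 0 while a reader sits at
 * a non-zero offset), 'i_size - off' wraps around to a huge value.
 */
uint64_t left_unpatched(uint64_t i_size, uint64_t off, int64_t ret)
{
	if (ret <= 0)
		return 0;
	if (off + (uint64_t)ret > i_size)
		return i_size - off;	/* wraps when off > i_size */
	return (uint64_t)ret;
}

/*
 * Hypothetical alternative: never report bytes at or past EOF, so the
 * subtraction below is only reached when off < i_size.
 */
uint64_t left_clamped(uint64_t i_size, uint64_t off, int64_t ret)
{
	if (ret <= 0 || off >= i_size)
		return 0;
	if ((uint64_t)ret > i_size - off)
		return i_size - off;
	return (uint64_t)ret;
}
```

With i_size = 0, off = 4096 and ret = 4096, the first helper returns
2^64 - 4096 while the second returns 0.  With i_size = 0 and off = 0 there
is no wrap, so the huge value of 'left' only appears once the reader's
offset is past the (now zero) inode size.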