Hi Luis,
Thanks for the report. BTW, is this reproducible?
This is also the first time I've seen this crash BUG.
The 'i_size == 0' case is easy to reproduce; please see my debug logs
below:
++++++++++++++++++++++++++++
ceph_read_iter: 0~1024 trying to get caps on 000000006a438277 100000001f7.fffffffffffffffe
try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe need Fr want Fc
__ceph_caps_issued: 000000006a438277 100000001f7.fffffffffffffffe cap 000000001a8b6d16 issued pAsLsXsFrw
try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe have pAsLsXsFrw but not Fc (revoking -)
try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe ret 1 got Fr
ceph_read_iter: sync 000000006a438277 100000001f7.fffffffffffffffe 0~1024 got cap refs on Fr
ceph_sync_read: on file 00000000e029b65e 0~400
__ceph_sync_read: on inode 000000006a438277 100000001f7.fffffffffffffffe 0~400
__ceph_sync_read: orig 0~1024 reading 0~1024
__ceph_sync_read: 0~1024 got -2 i_size 0
__ceph_sync_read: result 0 retry_op 0
ceph_read_iter: 000000006a438277 100000001f7.fffffffffffffffe dropping cap refs on Fr = 0
__ceph_put_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe had Fr last
__ceph_caps_issued: 000000006a438277 100000001f7.fffffffffffffffe cap 000000001a8b6d16 issued pAsLsXsFrw
+++++++++++++++++++++++++++++++++
I just created an empty file, opened it for r/w in Client.A, and then
opened the same file in Client.B and did a simple read.
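For reference, the Client.B step could be sketched roughly as below. This is
only a minimal user-space sketch: the helper name read_first_kb() is
hypothetical, and it assumes the empty file is still held open O_RDWR by
another process on Client.A while it runs.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only and issue a single 1024-byte
 * read at offset 0, like the Client.B step described above.
 * Returns the read() result, or -1 if open() fails.
 */
static ssize_t read_first_kb(const char *path)
{
	char buf[1024];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf));
	close(fd);
	return n;
}
```

Calling it on the empty file under the CephFS mount should produce the
0~1024 sync read seen in the logs, with read() returning 0.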
Currently the ceph kclient doesn't check 'i_size' before sending the
sync read request to RADOS; it only checks it after getting the contents
back. As I remember, this logic complies with the "MIX" filelock state in
the MDS:
[LOCK_MIX] = { 0, false, LOCK_MIX, 0, 0, REQ, ANY,
0, 0, 0, CEPH_CAP_GRD|CEPH_CAP_GWR|CEPH_CAP_GLAZYIO,0,0,CEPH_CAP_GRD },
You can open a Ceph tracker issue for this.
Thanks
- Xiubo
On 8/3/24 00:39, Luis Henriques wrote:
Hi Xiubo,
I was wondering if you've ever seen the BUG below. I've debugged it a bit
and the issue seems to occur here, while doing the SetPageUptodate():
	if (ret <= 0)
		left = 0;
	else if (off + ret > i_size)
		left = i_size - off;
	else
		left = ret;
	while (left > 0) {
		size_t plen, copied;
		plen = min_t(size_t, left, PAGE_SIZE - page_off);
		SetPageUptodate(pages[idx]);
		copied = copy_page_to_iter(pages[idx++],
					   page_off, plen, to);
		off += copied;
		left -= copied;
		page_off = 0;
		if (copied < plen) {
			ret = -EFAULT;
			break;
		}
	}
So, the issue is that we have idx > num_pages. And I'm almost sure that's
because of i_size being '0' and 'left' ending up with a huge value. But I
haven't managed to figure out yet why i_size is '0'.
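To illustrate the suspected arithmetic, here is a minimal user-space sketch
of the 'left' computation. The types are assumptions (unsigned 64-bit off
and i_size, size_t result), not copied from the kernel source, and the
helper names are hypothetical. Under these assumptions the subtraction only
wraps when off is already past i_size; the second helper shows one possible
guard.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the quoted branch; types are assumed for illustration only. */
static size_t left_unclamped(uint64_t off, int64_t ret, uint64_t i_size)
{
	if (ret <= 0)
		return 0;
	if (off + (uint64_t)ret > i_size)
		return (size_t)(i_size - off);	/* wraps when off > i_size */
	return (size_t)ret;
}

/* Hypothetical guard: nothing is left to copy once off reaches i_size. */
static size_t left_clamped(uint64_t off, int64_t ret, uint64_t i_size)
{
	if (ret <= 0 || off >= i_size)
		return 0;
	if (off + (uint64_t)ret > i_size)
		return (size_t)(i_size - off);
	return (size_t)ret;
}
```

With off = 4096, ret = 1024 and i_size = 0, the first helper returns a value
far larger than ret, which would let the copy loop run past the end of the
page array; the second returns 0.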
(Note: I'll be offline next week, but I'll continue looking into this the
week after. But I figured I should report the bug anyway, in case you've
seen something similar.)
Cheers,