On Mon, Aug 05 2024, Xiubo Li wrote:

> Hi Luis,
>
> Thanks for the report. BTW, is this reproducible?
>
> This is also the first time I've seen this crash.
>
> The 'i_size == 0' case is easy to reproduce; please see my debug logs
> below:
>
> ++++++++++++++++++++++++++++
>
> ceph_read_iter: 0~1024 trying to get caps on 000000006a438277 100000001f7.fffffffffffffffe
> try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe need Fr want Fc
> __ceph_caps_issued: 000000006a438277 100000001f7.fffffffffffffffe cap 000000001a8b6d16 issued pAsLsXsFrw
> try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe have pAsLsXsFrw but not Fc (revoking -)
> try_get_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe ret 1 got Fr
> ceph_read_iter: sync 000000006a438277 100000001f7.fffffffffffffffe 0~1024 got cap refs on Fr
> ceph_sync_read: on file 00000000e029b65e 0~400
> __ceph_sync_read: on inode 000000006a438277 100000001f7.fffffffffffffffe 0~400
> __ceph_sync_read: orig 0~1024 reading 0~1024
> __ceph_sync_read: 0~1024 got -2 i_size 0
> __ceph_sync_read: result 0 retry_op 0
> ceph_read_iter: 000000006a438277 100000001f7.fffffffffffffffe dropping cap refs on Fr = 0
> __ceph_put_cap_refs: 000000006a438277 100000001f7.fffffffffffffffe had Fr last
> __ceph_caps_issued: 000000006a438277 100000001f7.fffffffffffffffe cap 000000001a8b6d16 issued pAsLsXsFrw
> +++++++++++++++++++++++++++++++++
>
> I just created an empty file, opened it for r/w in Client.A, then opened
> the same file in Client.B and did a simple read.
>
> Currently the ceph kclient doesn't check 'i_size' before sending the sync
> read request to RADOS; it only checks it after getting the contents back.
> As I remember, this logic complies with the "MIX" filelock state in the MDS:
>
> [LOCK_MIX] = { 0, false, LOCK_MIX, 0, 0, REQ, ANY, 0, 0,
> 0, CEPH_CAP_GRD|CEPH_CAP_GWR|CEPH_CAP_GLAZYIO,0,0,CEPH_CAP_GRD },
>
> You can raise a ceph tracker for this.
I'll do that, and thanks for the analysis. I'll need to catch up with a few
things first after being offline for a week, but I'll get back to this bug
shortly.

Cheers,
--
Luís

>
> Thanks
>
> - Xiubo
>
> On 8/3/24 00:39, Luis Henriques wrote:
>> Hi Xiubo,
>>
>> I was wondering if you've ever seen the BUG below. I've debugged it a bit
>> and the issue seems to occur here, while doing the SetPageUptodate():
>>
>>         if (ret <= 0)
>>                 left = 0;
>>         else if (off + ret > i_size)
>>                 left = i_size - off;
>>         else
>>                 left = ret;
>>         while (left > 0) {
>>                 size_t plen, copied;
>>
>>                 plen = min_t(size_t, left, PAGE_SIZE - page_off);
>>                 SetPageUptodate(pages[idx]);
>>                 copied = copy_page_to_iter(pages[idx++],
>>                                            page_off, plen, to);
>>                 off += copied;
>>                 left -= copied;
>>                 page_off = 0;
>>                 if (copied < plen) {
>>                         ret = -EFAULT;
>>                         break;
>>                 }
>>         }
>>
>> So, the issue is that we have idx > num_pages. And I'm almost sure that's
>> because of i_size being '0' and 'left' ending up with a huge value. But I
>> haven't managed to figure out yet why i_size is '0'.
>>
>> (Note: I'll be offline next week, but I'll continue looking into this the
>> week after. But I figured I should report the bug anyway, in case you've
>> seen something similar.)
>>
>> Cheers,