Rogue EXDEV errors when hardlinking

Hi all, we've been seeing persistent problems when trying to create hardlinks on cephfs; it's returning EXDEV in a way that makes no sense given typical POSIX behaviour and ceph documentation. Here's a typical strace of the problem:

        78    13:47:26.572435 link("/data/db/hdb/data/2023.08.06/table1.0/column1", "/data/db/hdb/data/2023.08.06/table1.1/column1") = -1 EXDEV (Invalid cross-device link)
        78    13:47:26.577661 write(1, "{\"time\":\"2025-03-03T13:47:26.577z\",\"component\":\"MSVC\",\"level\":\"INFO\",\"message\":\"[eoi-78] Retrying in 500 milliseconds\",\"service\":\"eoi\"}\n", 136) = 136
        78    13:47:26.577762 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=500000000}, NULL) = 0
        78    13:47:27.078037 link("/data/db/hdb/data/2023.08.06/table1.0/column1", "/data/db/hdb/data/2023.08.06/table1.1/column1") = 0

We try to create a link, get EXDEV, wait 500 milliseconds, then retry the same operation and it succeeds. The link and its target are both on the same cephfs mount (/data/db/hdb in this case), so the usual POSIX explanation (linking between filesystems) doesn't apply.

I've looked through the ceph client and server code, and as far as I can tell EXDEV is only returned in a couple of other situations: linking between snapshots, and linking across quota boundaries. Neither snapshots nor quotas were in use here, and even if either were the culprit, it seems unlikely the automatic retry would have worked. Web searches on EXDEV errors in ceph have also been a dead end. My best guess, although it's not a very good one, is that stale MDS cache data is somehow involved: in one case the issue reportedly got much worse after increasing (!) the MDS memory limit.
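The retry behaviour visible in the strace can be sketched as a small wrapper. This is a minimal Python sketch under stated assumptions, not the application's actual code: the name `link_with_retry`, the attempt count and the 0.5 s default delay are illustrative. It only retries EXDEV when the source file and the destination's parent directory report the same `st_dev`, so a genuine cross-filesystem link still fails immediately.

```python
import errno
import os
import time


def link_with_retry(src, dst, attempts=5, delay=0.5):
    """Hard-link src to dst, retrying only on EXDEV that looks spurious.

    Hypothetical helper: attempts and delay are illustrative defaults.
    Returns the number of link() calls it took to succeed.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.link(src, dst)
            return attempt
        except OSError as exc:
            # Any error other than EXDEV (EEXIST, ENOENT, ...) is passed through.
            if exc.errno != errno.EXDEV:
                raise
            # A real cross-filesystem link shows up as differing st_dev;
            # retrying cannot help there, so re-raise immediately.
            parent = os.path.dirname(os.path.abspath(dst))
            if os.stat(src).st_dev != os.stat(parent).st_dev:
                raise
            # Out of attempts: surface the last EXDEV to the caller.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Logging each retried EXDEV together with the two paths would also give a record of how often the spurious case occurs, which may help correlate it with MDS activity.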

This error has been occurring for one particular customer for upwards of 9 months and has proven stubbornly resistant to reproduction elsewhere (we are working on migrating them to a more recent ceph version to see if the error remains), so our technical investigations haven't got particularly far. I was hoping someone here on ceph-users would have seen similar EXDEV errors in the wild or in development and have some insight into what could be causing them.
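For anyone attempting a reproduction, a stress loop that mimics the link pattern from the strace (`table1.0/column1` linked into `table1.1/column1`) might look like the following. This is a hypothetical sketch: the function name, directory layout and iteration count are assumptions for illustration, and there is no evidence yet that this pattern alone triggers the spurious EXDEV.

```python
import errno
import os


def _remove(*paths):
    # Delete each path if present; missing files are fine.
    for path in paths:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass


def count_exdev(root, iterations=1000):
    """Hypothetical stress loop: create a file, hard-link it into a sibling
    directory, remove both, and count link() failures with EXDEV."""
    src_dir = os.path.join(root, "table1.0")
    dst_dir = os.path.join(root, "table1.1")
    os.makedirs(src_dir, exist_ok=True)
    os.makedirs(dst_dir, exist_ok=True)
    src = os.path.join(src_dir, "column1")
    dst = os.path.join(dst_dir, "column1")
    exdev = 0
    for _ in range(iterations):
        # Start every iteration from the same empty state.
        _remove(dst, src)
        with open(src, "wb") as f:
            f.write(b"x")
        try:
            os.link(src, dst)
        except OSError as exc:
            # Anything other than EXDEV points at a problem with the setup.
            if exc.errno != errno.EXDEV:
                raise
            exdev += 1
    _remove(dst, src)
    return exdev
```

Run against a directory on the cephfs mount (for example a hypothetical `/data/db/hdb/exdev-repro`), a non-zero return value would indicate the problem has been reproduced; on a local filesystem it should always return 0.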

Regards, Domhnall
***********************************************************************************************************************************************************************
This email, its contents and any files attached are a confidential communication and are intended only for the named addressees indicated in the message. If you are not the named addressee or if you have received this email in error, you may not, without the consent of KX, copy, use or rely on any information or attachments in any way. Please notify the sender by return email and delete it from your email system.
Unless separately agreed, KX does not accept any responsibility for the accuracy or completeness of the contents of this email or its attachments. Please note that any views, opinion or advice contained in this communication are those of the sending individual and not those of KX and KX shall have no liability whatsoever in relation to this communication (or its content) unless separately agreed.
***********************************************************************************************************************************************************************
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
