On Fri, Mar 01, 2024 at 08:18:24PM +0000, Donald Jennings wrote:
> Hello all,
>
> We are using Ceph as the storage backend for some Cloud research which
> involves offloading functions to storage nodes to benefit from
> near-storage processing. We are using rados_exec to achieve this by
> attempting to call a class method on the object which then executes
> the function locally. However, we have been running into an issue
> where rados_exec fails with EIO and the request is never reaching the
> storage node, with the method never being called.
>
> Upon debugging this, I have noticed that if I re-put the same object
> with a different key it works (provided it is on a different OSD). It
> appears that the OSD cannot serve a rados_exec request.

What's the simplest offload function you can reproduce the problem with,
and can you share that?

> This bug happens under a few conditions:
>
> 1. If we invoke the function before uploading it
> 2. Non-deterministically when the OSD is under load.
>
> I cannot seem to debug it for the life of me and the only thing I have
> to go on is that the OSDs cannot serve requests. I have attempted to
> remove the object from the pool and put it back with the same key and
> it does the exact same thing.

My initial read of this is that the content of the object is breaking
your function?

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
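
For reference, a minimal reproducer of the "invoke before uploading"
case might look something like the sketch below. It is only a starting
point under several assumptions: it uses the Python rados bindings
rather than the C rados_exec call, it calls the stock "hello" object
class (method "say_hello") instead of your own class and assumes that
class is loadable on your OSDs, and the pool name, object name and
ceph.conf path are placeholders. It also assumes the bindings raise an
exception carrying an errno attribute on failure.

```python
import errno


def try_exec(ioctx, oid, cls="hello", method="say_hello", data=b""):
    """Call an object-class method and report the outcome without raising.

    Returns (ok, detail): detail is the method's output bytes on success,
    or the symbolic errno name (e.g. "EIO") on failure.
    """
    try:
        _ret, out = ioctx.execute(oid, cls, method, data)
        return True, out
    except Exception as exc:  # assumption: the exception carries .errno
        code = getattr(exc, "errno", None)
        if isinstance(code, int):
            return False, errno.errorcode.get(abs(code), str(code))
        return False, repr(exc)


def reproduce(ioctx, oid, payload=b"test payload"):
    """Run the 'exec before upload, then upload, then exec' sequence."""
    steps = []
    steps.append(("exec before write", try_exec(ioctx, oid)))
    ioctx.write_full(oid, payload)
    steps.append(("exec after write", try_exec(ioctx, oid)))
    return steps


def run_against_cluster(pool="testpool", oid="exec-repro",
                        conffile="/etc/ceph/ceph.conf"):
    """Connect to a real cluster; all three arguments are placeholders."""
    import rados  # imported here so the helpers above work without Ceph

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            for label, (ok, detail) in reproduce(ioctx, oid):
                print(f"{label}: {'ok' if ok else 'FAILED'} {detail!r}")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling run_against_cluster() with a fresh object name each time, and
then again with the cls/method defaults swapped for your own class,
should show whether the failure depends on your function or also
happens with a trivial one.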