On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen <negillen@xxxxxxxxx> wrote:
> Hello everyone,
>
> Something very strange is driving me crazy with CephFS (kernel driver).
> I copy a large directory onto the CephFS from one node. If I perform a
> 'time ls -alR' on that directory, it executes in less than one second.
> If I run the same 'time ls -alR' from another node, it takes several
> minutes. No matter how many times I repeat the command, the speed is
> always abysmal. The ls works fine on the node from which the initial
> copy was executed. This happens with every directory I have tried, no
> matter what kind of data is inside.
>
> After a lot of experimenting I have found that, in order to get fast ls
> speed for that directory from every node, I need to flush the Linux
> cache on the original node:
>
>   echo 3 > /proc/sys/vm/drop_caches
>
> Unmounting and remounting the CephFS on that node does the trick too.
>
> Does anyone have a clue about what's happening here? Could this be a
> problem with the writeback fscache for the CephFS?
>
> Any help appreciated! Thanks and regards. :)

This is a consequence of the CephFS "file capabilities" that we use to do
distributed locking on file states. When you copy the directory on client A,
it holds full capabilities on the entire tree. When client B tries to stat
each file in that tree, it doesn't hold any capabilities, so it sends a stat
request to the MDS, which sends a cap update to client A requiring it to
pause updates on the file and share its current state. Then the MDS tells
client A it can keep going and sends the stat back to client B. So that's:

  B -> MDS
  MDS -> A
  A -> MDS
  MDS -> B | MDS -> A

for every file you touch.
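To get a feel for why that turns into minutes, here's a back-of-envelope
sketch (the file count and per-hop latency are illustrative numbers I made
up, not measurements from your cluster):

  # Illustrative numbers only: 100,000 files, 4 serialized cap/stat
  # messages per file, ~0.5 ms of network latency per hop.
  awk 'BEGIN {
      files = 100000; hops = 4; ms_per_hop = 0.5
      printf "~%.0f seconds for one ls -alR\n", files * hops * ms_per_hop / 1000
  }'

Even with sub-millisecond hops, serializing a few messages per file lands you
in the minutes range, which matches what you're seeing.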
I think the particular oddity you're encountering here is that CephFS
generally tries not to make clients drop their "exclusive" access caps just
to satisfy a stat. If you had client B doing something with the files (like
reading them), you would probably see different behavior.
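If you want to test that, something along these lines from client B should
force it to acquire read caps on the files before the recursive stat (the
mount point and directory name are hypothetical; adjust to your setup):

  # Read one byte of every file from client B first, then time the
  # recursive stat again.
  find /mnt/cephfs/testdir -type f -exec head -c1 {} + > /dev/null
  time ls -alR /mnt/cephfs/testdir > /dev/null

If the second run is fast, that points at the cap exchange rather than the
writeback fscache you asked about.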
I'm not sure whether there's something effective we can do here or not (it's
just a bunch of heuristics about when we should or should not drop caps),
but please file a bug on the tracker (tracker.ceph.com) with this case. :)
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com