Thanks again,
even 'du' performance is terrible on node B (testing on a directory taken from Phoronix):
# time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/
73M /storage/test9/installed-tests/pts/pgbench-1.5.1/
real 0m21.044s
user 0m0.010s
sys 0m0.067s
Reading the files from node B doesn't seem to help with subsequent accesses in this case:
# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
real 1m47.650s
user 0m0.041s
sys 0m0.212s
# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
real 1m45.636s
user 0m0.042s
sys 0m0.214s
# time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1>/dev/null
real 1m43.180s
user 0m0.069s
sys 0m0.236s
Of course, once I unmount the CephFS on node A, everything gets as fast as it can be.
Am I missing something obvious here?
Yes, I could drop the Linux cache as a 'fix', but that would drop the entire system's cache, which sounds a bit extreme! :P
Unless there is a way to drop the cache only for that single dir...?
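To check whether the time in the runs above is spent on per-file metadata calls rather than on reading data, something like the following could be run on node B. It is only a minimal sketch, assuming Python 3 is available there; the function name `time_stats` is made up for illustration, and it times nothing but the `lstat` calls (the directory listing done by `os.walk` is not counted).

```python
import os
import sys
import time


def time_stats(root):
    """Walk `root`, lstat every entry, and return (count, total_seconds, slowest).

    `slowest` is a (seconds, path) tuple for the single slowest lstat call.
    """
    count = 0
    total = 0.0
    slowest = (0.0, None)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            elapsed = time.perf_counter() - start
            count += 1
            total += elapsed
            if elapsed > slowest[0]:
                slowest = (elapsed, path)
    return count, total, slowest


if __name__ == "__main__":
    # Directory to measure; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total, slowest = time_stats(root)
    mean_ms = (total / count * 1000.0) if count else 0.0
    print("entries: %d  total: %.3fs  mean: %.3fms" % (count, total, mean_ms))
    print("slowest: %.3fs  %s" % slowest)
```

Running it once on node A and once on node B against the same directory should show whether the gap comes from a roughly constant per-entry delay, which is what one would expect if every stat needs a round trip through the MDS.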
On Tue, Jun 16, 2015 at 12:15 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen <negillen@xxxxxxxxx> wrote:
> Thank you very much for your reply!
>
> Is there anything I can do to go around that? e.g. setting access caps to be
> released after a short while? Or is there a command to manually release
> access caps (so that I could run it in cron)?
Well, you can drop the caches. ;)
More generally, you're running into a specific hole here. If your
clients are actually *accessing* the files then they should go into
shared mode and this will be much faster on subsequent accesses.
> This is quite a problem because we have several applications that need to
> access a large number of files and when we set them to work on CephFS
> latency skyrockets.
What kind of shared-file access do they have? If you have a bunch of
files being shared for read I'd expect this to be very fast. If
different clients are writing small amounts to them in round-robin
then that's unfortunately not going to work well. :(
-Greg
>
> Thanks again and regards.
>
> On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen <negillen@xxxxxxxxx>
>> wrote:
>> > Hello everyone,
>> >
>> > something very strange is driving me crazy with CephFS (kernel driver).
>> > I copy a large directory on the CephFS from one node. If I try to
>> > perform a
>> > 'time ls -alR' on that directory it gets executed in less than one
>> > second.
>> > If I try to do the same 'time ls -alR' from another node it takes
>> > several
>> > minutes. No matter how many times I repeat the command, the speed is
>> > always
>> > abysmal. The ls works fine on the node where the initial copy was
>> > executed
>> > from. This happens with any directory I have tried, no matter what kind
>> > of
>> > data is inside.
>> >
>> > After lots of experimenting I have found that in order to have fast ls
>> > speed
>> > for that dir from every node I need to flush the Linux cache on the
>> > original
>> > node:
>> > echo 3 > /proc/sys/vm/drop_caches
>> > Unmounting and remounting the CephFS on that node does the trick too.
>> >
>> > Anyone has a clue about what's happening here? Could this be a problem
>> > with
>> > the writeback fscache for the CephFS?
>> >
>> > Any help appreciated! Thanks and regards. :)
>>
>> This is a consequence of the CephFS "file capabilities" that we use to
>> do distributed locking on file states. When you copy the directory on
>> client A, it has full capabilities on the entire tree. When client B
>> tries to do a stat on each file in that tree, it doesn't have any
>> capabilities. So it sends a stat request to the MDS, which sends a cap
>> update to client A requiring it to pause updates on the file and share
>> its current state. Then the MDS tells client A it can keep going and
>> sends the stat to client B.
>> So that's:
>> B -> MDS
>> MDS -> A
>> A -> MDS
>> MDS -> B | MDS -> A
>> for every file you touch.
>>
>> I think the particular oddity you're encountering here is that CephFS
>> generally tries not to make clients drop their "exclusive" access caps
>> just to satisfy a stat. If you had client B doing something with the
>> files (like reading them) you would probably see different behavior.
>> I'm not sure if there's something effective we can do here or not
>> (it's just a bunch of heuristics when we should or should not drop
>> caps), but please file a bug on the tracker (tracker.ceph.com) with
>> this case. :)
>> -Greg
>
>
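The message exchange Greg lists in the quoted reply above (B -> MDS, MDS -> A, A -> MDS, then MDS -> B alongside MDS -> A) can be turned into a rough back-of-the-envelope model. This is only a sketch: the function names are made up, it assumes every stat triggers the full exchange, and it ignores processing time on the MDS and on client A, so it gives a lower bound on the network wait rather than a prediction.

```python
# Message sequence for one stat from client B while client A holds the caps,
# copied from the list in the quoted message.
CAP_RECALL_SEQUENCE = [
    ("B", "MDS"),   # client B sends the stat request
    ("MDS", "A"),   # MDS asks client A to pause updates and share its state
    ("A", "MDS"),   # client A replies with its current state
    ("MDS", "B"),   # MDS sends the stat result to client B ...
    ("MDS", "A"),   # ... and tells client A it can keep going
]

# Hops client B has to wait for before its stat returns. The final MDS -> A
# message goes out alongside MDS -> B, so it is not on B's critical path.
HOPS_ON_B_CRITICAL_PATH = 4


def total_messages(num_files):
    """Messages exchanged in total if every file needs the full sequence."""
    return num_files * len(CAP_RECALL_SEQUENCE)


def estimated_wait_seconds(num_files, one_way_latency_s):
    """Network wait seen by client B, ignoring MDS and client processing."""
    return num_files * HOPS_ON_B_CRITICAL_PATH * one_way_latency_s


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured in this thread.
    files = 10000
    latency = 0.0005  # 0.5 ms one way
    print("messages:", total_messages(files))
    print("estimated wait (s):", estimated_wait_seconds(files, latency))
```

The point of the model is that the cost scales with the number of files touched, not with their size, which fits a small 73M directory still taking well over a minute to list.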
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com