Is this difference not related to caching, with some cache/queue filling up at some point? If you do a sync after each write, do you still see the same results?

-----Original Message-----
From: Hector Martin [mailto:hector@xxxxxxxxxxxxxx]
Sent: 07 February 2019 06:51
To: ceph-users@xxxxxxxxxxxxxx
Subject: CephFS overwrite/truncate performance hit

I'm seeing some interesting performance issues with file overwriting on CephFS.

Creating lots of files is fast:

    for i in $(seq 1 1000); do
        echo $i; echo test > a.$i
    done

Deleting lots of files is fast:

    rm a.*

As is creating them again. However, repeatedly creating the same file over and over again is slow:

    for i in $(seq 1 1000); do
        echo $i; echo test > a
    done

And it's still slow if the file is created with a new name and then moved over it:

    for i in $(seq 1 1000); do
        echo $i; echo test > a.$i; mv a.$i a
    done

While appending to a single file is really fast:

    for i in $(seq 1 1000); do
        echo $i; echo test >> a
    done

As is repeatedly writing to offset 0:

    for i in $(seq 1 1000); do
        echo $i; echo $RANDOM | dd of=a bs=128 conv=notrunc
    done

But truncating the file first slows it back down again:

    for i in $(seq 1 1000); do
        echo $i; truncate -s 0 a; echo test >> a
    done

All of these things are reasonably fast on a local FS, of course.

I'm using the kernel client (4.18) with Ceph 13.2.4, and the relevant CephFS data and metadata pools are rep-3 on HDDs.

It seems to me that any operation that *reduces* a file's size for a given filename, or replaces it with another inode, has a large overhead. I have an application that stores some flag data in a file, using the usual open/write/close/rename dance to atomically overwrite it (a sketch of the pattern follows at the end of this message), and this operation is currently the bottleneck (while doing a bunch of other processing on files on CephFS).

I'm considering changing it to use an xattr to store the data instead, which seems like it should be atomic and performs a lot better:

    for i in $(seq 1 1000); do
        echo $i; setfattr -n user.foo -v "test$RANDOM" a
    done

Alternatively, is there a more CephFS-friendly atomic overwrite pattern than the usual open/write/close/rename? Can CephFS, e.g., guarantee that a write at offset 0 of less than the page size is atomic? I could easily make the writes equal-sized, and thus avoid truncations and remove the rename dance, if I could guarantee they're atomic (see the second sketch at the end).

Is there any documentation on what write operations incur significant overhead on CephFS like this, and why? This particular issue isn't mentioned in http://docs.ceph.com/docs/master/cephfs/app-best-practices/ (which seems to deal mostly with reads, not writes).

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
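
For reference, a minimal sketch of the rename dance referred to above, assuming the temporary file is created in the same directory (and therefore on the same CephFS mount) so that the final mv is a single atomic rename(2); the file name and payload here are placeholders, not from the application itself:

    #!/bin/sh
    # Write the new contents to a temp file next to the target, then
    # atomically replace the target via rename(2). Readers see either
    # the complete old file or the complete new file, never a mix.
    tmp="a.tmp.$$"
    printf 'test\n' > "$tmp"
    sync "$tmp"    # flush before renaming; per-file sync needs coreutils >= 8.24
    mv -f "$tmp" a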
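
And a sketch of the equal-sized-write variant from the same paragraph: pad every payload to a fixed record size (128 bytes here, an arbitrary choice) and overwrite offset 0 in place with conv=notrunc, so the file is never truncated or replaced. Whether such a sub-page overwrite is atomic on CephFS is exactly the open question posed above:

    # Pad the payload to exactly 128 bytes (space-filled) and overwrite
    # offset 0 in place; conv=notrunc stops dd from truncating the file,
    # so its size never shrinks and no inode is replaced.
    printf '%-128s' "test $RANDOM" | dd of=a bs=128 count=1 conv=notrunc 2>/dev/null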