I'm seeing some interesting performance issues with file overwriting on
CephFS.
Creating lots of files is fast:
for i in $(seq 1 1000); do
    echo $i; echo test > a.$i
done
Deleting lots of files is fast:
rm a.*
As is creating them again.
However, repeatedly creating the same file over and over again is slow:
for i in $(seq 1 1000); do
    echo $i; echo test > a
done
And it's still slow if the file is created with a new name and then
moved over:
for i in $(seq 1 1000); do
    echo $i; echo test > a.$i; mv a.$i a
done
While appending to a single file is really fast:
for i in $(seq 1 1000); do
    echo $i; echo test >> a
done
As is repeatedly writing to offset 0:
for i in $(seq 1 1000); do
    echo $i; echo $RANDOM | dd of=a bs=128 conv=notrunc
done
But truncating the file first slows it back down again:
for i in $(seq 1 1000); do
    echo $i; truncate -s 0 a; echo test >> a
done
All of these things are reasonably fast on a local FS, of course. I'm
using the kernel client (4.18) with Ceph 13.2.4, and the relevant CephFS
data and metadata pools are rep-3 on HDDs. It seems to me that any
operation that *reduces* a file's size for any given filename, or
replaces it with another inode, has a large overhead.
I have an application that stores some flag data in a file, using the
usual open/write/close/rename dance to atomically overwrite it, and this
operation is currently the bottleneck (while doing a bunch of other
processing on files on CephFS). I'm considering switching to an xattr
to store the data instead, which should be atomic and does perform a
lot better (both patterns are sketched in code after this loop):
for i in $(seq 1 1000); do
    echo $i; setfattr -n user.foo -v "test$RANDOM" a
done
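For reference, here is a minimal sketch (in Python, with a hypothetical
flag file path and xattr name) of the two patterns I'm comparing: the
classic write-temp-then-rename dance, and storing the flag in an xattr.
os.setxattr is Linux-only, so this assumes a Linux client:

import os

FLAG_PATH = "/mnt/cephfs/flag"  # hypothetical path
TMP_PATH = FLAG_PATH + ".tmp"

def write_flag_rename(data: bytes):
    # Classic atomic overwrite: write a temp file, flush it, then
    # rename it over the target (atomic on POSIX filesystems).
    fd = os.open(TMP_PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(TMP_PATH, FLAG_PATH)

def write_flag_xattr(data: bytes):
    # Alternative: store the flag in an xattr on a long-lived file.
    # setxattr replaces the whole value in one call, so readers
    # should never see a torn update.
    os.setxattr(FLAG_PATH, "user.flag", data)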
Alternatively, is there a more CephFS-friendly atomic overwrite pattern
than the usual open/write/close/rename? For example, can CephFS
guarantee that a write of less than the page size at offset 0 is
atomic? I could easily make the writes equal-sized, which would avoid
both the truncation and the rename dance, if I can guarantee they're
atomic.
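If such sub-page writes at offset 0 are indeed atomic, the in-place
pattern I have in mind would look something like this (Python again;
RECORD_SIZE and the zero-padding scheme are my own assumptions, and
whether this is actually safe on CephFS is exactly my question):

import os

FLAG_PATH = "/mnt/cephfs/flag"  # hypothetical path
RECORD_SIZE = 128               # fixed record size, well under a page

def write_flag_inplace(data: bytes):
    # Overwrite a fixed-size record at offset 0, never truncating or
    # renaming, so the file size and inode never change.
    assert len(data) <= RECORD_SIZE
    record = data.ljust(RECORD_SIZE, b"\0")  # pad to a constant size
    fd = os.open(FLAG_PATH, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, record, 0)  # single write at offset 0
    finally:
        os.close(fd)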
Is there any documentation on which write operations incur this kind of
significant overhead on CephFS, and why? This particular issue isn't
mentioned in http://docs.ceph.com/docs/master/cephfs/app-best-practices/
(which seems like it mostly deals with reads, not writes).
--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub