Re: CephFS unexplained writes

It does sound contradictory: why would read operations in CephFS result
in writes to disk? But they do. I upgraded to Hammer last week and I am
still seeing this.

The setup is as follows:

an EC pool on HDDs for data
a replicated pool on SSDs as a cache tier for the data pool
a replicated pool on SSDs for the CephFS metadata

Now whenever I start doing heavy reads on CephFS, I see intense bursts
of write operations on the HDDs. The reads I'm doing are things like
reading a large file (streaming a video) or running a big rsync job
with --dry-run (so it only checks metadata). I have no clue why that
would have any effect on the HDDs, but it does.

Now, to figure out in more detail what's going on, I tried lsof, atop
and iotop, but those tools don't provide the necessary information. In
lsof I just see a whole bunch of open files at any given time, and that
doesn't change much during these tests.
In atop and iotop I can clearly see that the HDDs are doing a lot of
writes while I'm reading from CephFS, but those tools can't tell me
what those writes actually are.

So I tried strace, which can trace file operations and attach to
running processes:
# strace -f -e trace=file -p 5076
This gave me an idea of what was going on. 5076 is the process ID of
the OSD for one of the HDDs. I saw mostly stat() and open() calls, but
those are all reads, not writes. Of course btrfs can cause writes when
doing reads (atime updates), but the OSD filesystem is mounted with
noatime.
The only write operations that I saw a lot of were these:

[pid  5350]
getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
[pid  5350]
setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
[pid  5350]
removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3",
"user.cephos.phash.contents@1") = -1 ENODATA (No data available)

So it appears that the OSDs aren't writing actual data to disk, but
metadata in the form of xattrs. Can anyone explain what this setting
and removing of xattrs could be for?
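
In case it helps anyone reproduce this, the xattrs on such a directory
can be dumped directly. Assuming the attr package is installed,
something like this should show the current values:

# getfattr -d -e hex \
    /var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3

-d dumps the attribute values (the user.* namespace by default) and
-e hex prints them in hex, which makes the binary contents easier to
compare between runs.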

Kind regards,

Erik.


On 03/16/2015 10:44 PM, Gregory Farnum wrote:
> The information you're giving sounds a little contradictory, but my
> guess is that you're seeing the impacts of object promotion and
> flushing. You can sample the operations the OSDs are doing at any
> given time by running ops_in_progress (or similar, I forget exact
> phrasing) command on the OSD admin socket. I'm not sure if "rados df"
> is going to report cache movement activity or not.
> 
> That though would mostly be written to the SSDs, not the hard drives —
> although the hard drives could still get metadata updates written when
> objects are flushed. What data exactly are you seeing that's leading
> you to believe writes are happening against these drives? What is the
> exact CephFS and cache pool configuration?
> -Greg
> 
> On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I forgot to mention: while I am seeing these writes in iotop and
>> /proc/diskstats for the hdd's, I am -not- seeing any writes in "rados
>> df" for the pool residing on these disks. There is only one pool active
>> on the hdd's and according to rados df it is getting zero writes when
>> I'm just reading big files from cephfs.
>>
>> So apparently the osd's are doing some non-trivial amount of writing on
>> their own behalf. What could it be?
>>
>> Thanks,
>>
>> Erik.
>>
>>
>> On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
>>> Hi,
>>>
>>> I am getting relatively bad performance from cephfs. I use a replicated
>>> cache pool on ssd in front of an erasure coded pool on rotating media.
>>>
>>> When reading big files (streaming video), I see a lot of disk i/o,
>>> especially writes. I have no clue what could cause these writes. The
>>> writes are going to the hdd's and they stop when I stop reading.
>>>
>>> I mounted everything with noatime and nodiratime so it shouldn't be
>>> that. On a related note, the Cephfs metadata is stored on ssd too, so
>>> metadata-related changes shouldn't hit the hdd's anyway I think.
>>>
>>> Any thoughts? How can I get more information about what ceph is doing?
>>> Using iotop I only see that the osd processes are busy but it doesn't
>>> give many hints as to what they are doing.
>>>
>>> Thanks,
>>>
>>> Erik.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




