Sam? This looks to be the HashIndex::SUBDIR_ATTR, but I don't know
exactly what it's for, nor why it would be getting constantly created
and removed on a pure read workload...

On Thu, May 7, 2015 at 2:55 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
> It does sound contradictory: why would read operations in cephfs result
> in writes to disk? But they do. I upgraded to Hammer last week and I am
> still seeing this.
>
> The setup is as follows:
>
> EC pool on hdd's for data
> replicated pool on ssd's for data cache
> replicated pool on ssd's for metadata
>
> Now whenever I start doing heavy reads on cephfs, I see intense bursts
> of write operations on the hdd's. The reads I'm doing are things like
> reading a large file (streaming a video), or running a big rsync job
> with --dry-run (so it only checks metadata). No clue why that would
> have any effect on the hdd's, but it does.
>
> To figure out what's going on, I tried lsof, atop and iotop, but those
> tools don't provide the necessary information. In lsof I just see a
> whole bunch of open files at any given time, and that doesn't change
> much during these tests. In atop and iotop I can clearly see that the
> hdd's are doing a lot of writes while I'm reading in cephfs, but those
> tools can't tell me what the writes actually are.
>
> So I tried strace, which can trace file operations and attach to
> running processes:
>
> # strace -f -e trace=file -p 5076
>
> This gave me an idea of what was going on. 5076 is the process id of
> the osd for one of the hdd's. I saw mostly stat's and open's, but those
> are all reads, not writes. Of course btrfs can cause writes when doing
> reads (atime), but I have the osd mounted with noatime. The only write
> operations that I saw a lot of are these:
>
> [pid 5350] getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
> [pid 5350] setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
> [pid 5350] removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents@1") = -1 ENODATA (No data available)
>
> So it appears that the osd's aren't writing actual data to disk, but
> metadata in the form of xattr's. Can anyone explain what this setting
> and removing of xattr's could be for?
>
> Kind regards,
>
> Erik.
>
>
> On 03/16/2015 10:44 PM, Gregory Farnum wrote:
>> The information you're giving sounds a little contradictory, but my
>> guess is that you're seeing the impact of object promotion and
>> flushing. You can sample the operations the OSDs are doing at any
>> given time by running the ops_in_progress (or similar, I forget the
>> exact phrasing) command on the OSD admin socket. I'm not sure if
>> "rados df" is going to report cache movement activity or not.
>>
>> That, though, would mostly be written to the SSDs, not the hard drives,
>> although the hard drives could still get metadata updates written when
>> objects are flushed. What data exactly are you seeing that leads you
>> to believe writes are happening against these drives? What is the
>> exact CephFS and cache pool configuration?
>> -Greg
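
(As an aside, the admin socket command Greg is half-remembering is, I
believe, dump_ops_in_flight; the exact set of commands varies a bit
between releases, so check "ceph daemon osd.N help" on your own build.
A rough sketch, using osd.10 from the trace above purely as an example:

  # ceph daemon osd.10 dump_ops_in_flight   # client/replication ops currently in progress
  # ceph daemon osd.10 dump_historic_ops    # a short history of recently completed ops

The second one is handy when the write bursts are too brief to catch
live. If the ceph CLI can't find the socket, the same commands can be
pointed at it directly, e.g. "ceph --admin-daemon
/var/run/ceph/ceph-osd.10.asok dump_ops_in_flight", assuming the default
socket path.)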
>>
>> On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> I forgot to mention: while I am seeing these writes in iotop and
>>> /proc/diskstats for the hdd's, I am -not- seeing any writes in
>>> "rados df" for the pool residing on these disks. There is only one
>>> pool active on the hdd's, and according to rados df it is getting
>>> zero writes when I'm just reading big files from cephfs.
>>>
>>> So apparently the osd's are doing some non-trivial amount of writing
>>> on their own behalf. What could it be?
>>>
>>> Thanks,
>>>
>>> Erik.
>>>
>>>
>>> On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
>>>> Hi,
>>>>
>>>> I am getting relatively bad performance from cephfs. I use a
>>>> replicated cache pool on ssd in front of an erasure coded pool on
>>>> rotating media.
>>>>
>>>> When reading big files (streaming video), I see a lot of disk i/o,
>>>> especially writes. I have no clue what could cause these writes. The
>>>> writes are going to the hdd's and they stop when I stop reading.
>>>>
>>>> I mounted everything with noatime and nodiratime, so it shouldn't be
>>>> that. On a related note, the CephFS metadata is stored on ssd too, so
>>>> metadata-related changes shouldn't hit the hdd's anyway, I think.
>>>>
>>>> Any thoughts? How can I get more information about what ceph is
>>>> doing? Using iotop I only see that the osd processes are busy, but
>>>> it doesn't give many hints as to what they are doing.
>>>>
>>>> Thanks,
>>>>
>>>> Erik.
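
(For anyone who wants to reproduce Erik's observation further up the
thread: narrowing the trace to the xattr write syscalls and dumping the
attribute by hand makes the churn much easier to watch. A rough sketch,
reusing the OSD pid and PG directory path from Erik's trace; substitute
your own:

  # only the xattr write calls, with timestamps
  strace -f -tt -e trace=setxattr,fsetxattr,removexattr,fremovexattr -p 5076

  # dump the current value of the attribute on one PG subdirectory
  getfattr -n user.cephos.phash.contents -e hex \
      /var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3

Per Greg's note at the top, this attribute would be the
HashIndex::SUBDIR_ATTR bookkeeping that FileStore keeps per directory,
so this only narrows down what is being rewritten, not yet why.)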