> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Somnath Roy
> Sent: 04 June 2015 22:41
> To: Nick Fisk; 'Gregory Farnum'
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Old vs New pool on same OSDs - Performance Difference
>
> Nick,
> I noticed that dropping the page cache sometimes helps, as I was hitting an
> Ubuntu page cache compaction issue (I shared that with the community some time
> back). Perf top should show compaction-related stack traces in that case.
> Setting the sysctl vm option min_free_kbytes to a big value (like 5-10 GB in
> my 64 GB RAM setup) may help. But if it is the same issue, you will hit it
> again after some period of time if you don't set that option properly.

Thanks for this, I will look into finding a suitable number and applying it.

> Regarding your second problem:
> If you enable the optracker, there are a bunch of counters you can dump via
> the admin socket. But if, as you say, performance improves when reads are
> served from the page cache, the problem is unlikely to be within the OSD.
> Then again, the same disks serving other RBDs are giving you good numbers
> (maybe a particular part of the disk is causing the problem?).
> BTW, are you seeing anything wrong in the logs with the OSD and filestore
> debug levels raised to, say, 20?
> If you can identify which PGs are slowing things down (from the logs or
> counters), you can run similar fio reads directly against the drives holding
> the primary OSD for those PGs.

I can't seem to find much info regarding the optracker. Do I just enable it by
injecting "debug_optracker"? And once it's enabled, where do I find the counters?

I turned up the debugging and checked a handful of OSD logs, but couldn't see
anything obvious which would indicate why it was running slow.

I have also today restarted the OSDs to wipe the stats and then run the fio
benchmark again against an old RBD. The op_r_latency from the OSD perf dump
matches up with what I am seeing from fio (40-60 ms), so something is
definitely not right. If I then run a fio benchmark against one of the RBDs
which I have recently written to, the average returns to what I would expect.
Actual disk latencies via iostat are in the normal range of what I would
expect for a 7.2k disk.

There's something funny going on which seems to relate to reading objects that
haven't been written to in a while, either in the OSD or in the XFS file
system. Interestingly, I have one OSD which is using EXT4 and its op_r_latency
is about half that of the XFS ones after resetting the stats. This could just
be a single anomaly, but I wonder if this whole problem is related to XFS?

> Thanks & Regards
> Somnath
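For reference, the suggestions above map roughly onto the commands below. This is
only a sketch: osd.0, the 5 GB figure and the admin socket path are placeholders
assuming a default /var/run/ceph layout, not values taken from this cluster.

    # Somnath's page cache suggestion: reserve headroom, then flush the cache
    sysctl -w vm.min_free_kbytes=5242880    # ~5 GB; pick a value to suit your RAM
    echo 3 > /proc/sys/vm/drop_caches       # drops page cache plus dentries/inodes

    # Raise OSD and filestore debug levels on a running OSD (remember to revert)
    ceph tell osd.0 injectargs '--debug_osd 20 --debug_filestore 20'

    # Per-OSD counters over the admin socket; the op tracker dumps below should
    # work as long as osd_enable_op_tracker is on (the default, as far as I know)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    #   -> osd.op_r_latency: average read latency = sum / avgcount (seconds)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops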
> -----Original Message-----
> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> Sent: Thursday, June 04, 2015 2:12 PM
> To: 'Gregory Farnum'; Somnath Roy
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: RE: Old vs New pool on same OSDs - Performance Difference
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Gregory Farnum
> > Sent: 04 June 2015 21:22
> > To: Nick Fisk
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: Old vs New pool on same OSDs - Performance Difference
> >
> > On Thu, Jun 4, 2015 at 6:31 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > >
> > > Hi All,
> > >
> > > I have 2 pools, both on the same set of OSDs. The 1st is the default rbd
> > > pool created at installation 3 months ago; the other has just recently
> > > been created to verify performance problems.
> > >
> > > As mentioned, both pools are on the same set of OSDs with the same crush
> > > ruleset, and the RBDs on both are identical in size, version and order.
> > > The only real difference I can think of is that the existing pool has
> > > around 5 million objects on it.
> > >
> > > Testing using RBD-enabled fio, I see the newly created pool get an
> > > expected random read performance of around 60 IOPS. The existing pool
> > > only gets around half of this. New pool latency = ~15 ms, old pool
> > > latency = ~35 ms for random reads.
> > >
> > > There is no other IO going on in the cluster at the point of running
> > > these tests.
> > >
> > > XFS fragmentation is low, somewhere around 1-2% on most of the disks.
> > > The only other difference I can think of is that the existing pool has
> > > data on it, whereas the new one is empty apart from the test RBD; should
> > > this make a difference?
> > >
> > > Any ideas?
> > >
> > > Any hints on what I can check to see why latency is so high for the
> > > existing pool?
> > >
> > > Nick
> >
> > Apart from what Somnath said, depending on your PG counts and configuration
> > setup you might also have put enough objects into the cluster that you have
> > a multi-level PG folder hierarchy in the old pool. I wouldn't expect that to
> > make a difference because those folders should be cached in RAM, but if
> > somehow they're not, that would require more disk accesses.
> >
> > But more likely it's as Somnath suggests: since most of the objects don't
> > exist for images in the new pool, it's able to return ENOENT on accesses
> > much more quickly.
> > -Greg
>
> Thanks for the replies guys.
>
> I had previously completely written to both test RBDs until full. Strangely,
> I have just written to them both again and then dropped caches on all OSD
> nodes. Now both seem to perform the same, but at the speed of the faster pool.
>
> I have then pointed fio at another existing RBD on the old pool and the
> results are awful, averaging under 10 IOPS for 64k random reads at QD=1.
> Unfortunately this RBD has live data on it, so I can't overwrite it.
>
> But something seems up with RBDs (or the underlying objects) that had data
> written to them a while back. If I make sure the data is in the page cache,
> then I get really great performance, so it must be something to do with
> reading data off the disk, but I'm lost as to what it might be.
>
> Iostat doesn't really show anything interesting, but I'm guessing a
> single-threaded read spread over 40 disks wouldn't anyway. Are there any
> counters I could look at that might help break down the steps the OSD goes
> through to do the read, to determine where the slowdown comes from?
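As a rough sketch of the kind of breakdown being asked about here (osd.12, the
image name and the object prefix below are made-up placeholders, and the paths
assume a default filestore layout): the historic-ops dump timestamps each stage
of a request, and a slow object can also be read straight off the primary OSD's
filestore to separate raw XFS/disk latency from time spent inside the OSD.

    # Each entry in the historic ops dump carries timestamped events
    # (e.g. initiated, reached_pg, started, done), showing whether the time
    # goes on the disk read itself or elsewhere in the OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok dump_historic_ops

    # To test the drive behind a specific slow object: find its PG and primary
    # OSD, then read the backing file directly (drop caches first so the read
    # really hits the disk)
    rbd -p rbd info slow-image                    # note the block_name_prefix, e.g. rb.0.1234.5678
    ceph osd map rbd rb.0.1234.5678.000000000000  # -> pg and acting set, primary listed first
    echo 3 > /proc/sys/vm/drop_caches
    find /var/lib/ceph/osd/ceph-12/current/ -name '*rb.0.1234*' | head -n 1
    dd if=<file-found-above> of=/dev/null bs=64k iflag=direct   # compare with the fio/iostat figures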
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com