On 5/22/14 11:51, Győrvári Gábor wrote:
> Hello,
>
> Got this kind of log on two nodes of a 3-node cluster. Both nodes have 2
> OSDs, and only 2 OSDs on two separate nodes are affected, which is why I
> don't understand the situation. There wasn't any extra IO on the system
> at the given time.
>
> Using radosgw with the S3 API to store objects under Ceph; average ops
> around 20-150, and bandwidth usage of 100-2000 kB/s read and only
> 50-1000 kB/s written.
>
> osd_op(client.7821.0:67251068
> default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr
> user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call
> refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]

Are any of your PGs in recovery or backfill?

I've seen this happen two different ways.

The first time was because I had the recovery and backfill parameters set
too high for my cluster. If your journals aren't SSDs, the default
parameters are too high: the recovery operations use most of the IOPS and
starve the clients.

The second time I saw this was when one disk was starting to fail. Sectors
started failing, and the drive spent a lot of time reading and remapping
bad sectors. Consumer-class SATA disks will retry bad sectors for 30+
seconds. That happens in the drive firmware, so it's not something you can
stop. Enterprise-class drives give up sooner, since they assume you have
another copy of the data (nobody uses enterprise-class drives stand-alone;
they're always in some sort of storage array). I've had reports of 6+ OSDs
blocking subops, and I traced it back to one disk that was blocking the
others. I replaced that disk, and the warnings went away.

If your cluster is healthy, check the SMART attributes for osd.2. If osd.2
looks good, it might be another OSD: check osd.2's logs, and check any
OSDs that are blocking osd.2. If your cluster is small, it might be faster
to just check all the disks instead of following the trail.

--
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com
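
P.S. For concreteness, here is a rough sketch of the commands the advice
above maps to. The option names and values are from memory, so verify them
against the documentation for your Ceph release, and /dev/sdX is just a
placeholder for whatever disk backs osd.2 on that node:

    # Throttle recovery/backfill so it stops starving client IO
    # (put the same values under [osd] in ceph.conf to make them persistent)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # See which requests are slow/blocked and which OSDs are involved
    ceph health detail

    # On the node hosting osd.2, check SMART on its data disk
    # (replace /dev/sdX with the device mounted at /var/lib/ceph/osd/ceph-2)
    smartctl -a /dev/sdX | grep -i -e reallocated -e pending -e uncorrectable

Growing reallocated, pending, or uncorrectable sector counts are the usual
sign of the slow-dying disk scenario described above.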