Hello,

No, I don't see any backfill entries in ceph.log during that period. The drives are WD2000FYYZ-01UL1B1, but I did not find anything suspicious in SMART, and yes, I will check the other drives too.

Could I somehow determine which PG the file is placed in?

Thanks

On 2014.05.23 20:51, Craig Lewis wrote:
> On 5/22/14 11:51, Győrvári Gábor wrote:
>> Hello,
>>
>> I got this kind of log on two nodes of a 3-node cluster. Each node has 2
>> OSDs, and only 2 OSDs on two separate nodes are affected, which is why I
>> don't understand the situation. There wasn't any extra IO on the system
>> at the given time.
>>
>> We use radosgw with the S3 API to store objects in Ceph; average ops are
>> around 20-150, with bandwidth usage of 100-2000 KB/s read and only
>> 50-1000 KB/s written.
>>
>> osd_op(client.7821.0:67251068
>> default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr
>> user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call
>> refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]
>
> Are any of your PGs in recovery or backfill?
>
> I've seen this happen two different ways. The first time was because
> I had the recovery and backfill parameters set too high for my
> cluster. If your journals aren't SSDs, the default parameters are too
> high. The recovery operation will use most of the IOPS and starve
> the clients.
>
> The second time I saw this was when one disk was starting to fail.
> Sectors were starting to fail, and the drive spent a lot of time reading
> and remapping bad sectors. Consumer-class SATA disks will retry bad
> sectors for 30+ seconds. It happens in the drive firmware, so it's not
> something you can stop. Enterprise-class drives will give up quicker,
> since they know you have another copy of the data. (Nobody uses
> enterprise-class drives stand-alone; they're always in some sort of
> storage array.)
>
> I've had reports of 6+ OSDs blocking subops, and I traced it back to
> one disk that was blocking the others. I replaced that disk, and the
> warnings went away.
>
> If your cluster is healthy, check the SMART attributes for osd.2. If
> osd.2 looks good, it might be another OSD. Check the osd.2 logs, and check
> any OSD that is blocking osd.2. If your cluster is small, it might
> be faster to just check all the disks instead of following the trail.
>
> --
> Craig Lewis
> Senior Systems Engineer
> Office +1.714.602.1309
> Email clewis at centraldesktop.com

--
Győrvári Gábor - Scr34m
scr34m at frontember.hu
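P.S. On finding the PG for a given file: I guess something like the following would show it (a rough sketch only; I'm assuming the default radosgw data pool name .rgw.buckets here, and I took the object name from the slow-op log line above):

    ceph osd map .rgw.buckets default.4181.1_products/800x600/537e28022fdcc.jpg

If I understand it right, that prints the PG the object hashes to (the 11.fe53a6fb part of the log line) and the up/acting OSD set, so I could check whether osd.2 is in the acting set for that object.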