Hi Sage,
    Pull request is https://github.com/ceph/ceph/pull/3305.

Thanks!
Jianpeng Ma

> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Wednesday, January 7, 2015 10:18 AM
> To: Ma, Jianpeng
> Cc: ceph@xxxxxxxxx; Vijayendra.Shamanna@xxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: Ceph data consistency
>
> On Wed, 7 Jan 2015, Ma, Jianpeng wrote:
> > > ---------- Forwarded message ----------
> > > From: Paweł Sadowski <ceph@xxxxxxxxx>
> > > Date: 2014-12-30 21:40 GMT+08:00
> > > Subject: Re: Ceph data consistency
> > > To: Vijayendra Shamanna <Vijayendra.Shamanna@xxxxxxxxxxx>,
> > >     "ceph-devel@xxxxxxxxxxxxxxx" <ceph-devel@xxxxxxxxxxxxxxx>
> > >
> > > On 12/30/2014 01:40 PM, Vijayendra Shamanna wrote:
> > > > Hi,
> > > >
> > > > There is a sync thread (sync_entry in FileStore.cc) which triggers
> > > > periodically and executes sync_filesystem() to ensure that the data
> > > > is consistent. The journal entries are trimmed only after a
> > > > successful sync_filesystem() call.
> > >
> > > sync_filesystem() always returns zero, so the journal will be trimmed
> > > regardless. Executing sync()/syncfs() with dirty data in the disk
> > > buffers will result in data loss ("lost page write due to I/O error").
> > >
> > Hi Sage,
> >
> > From the git log, I see that sync_filesystem() originally returned the
> > result of syncfs(). That changed in commit 808c644248e486f44:
> >
> >     Improve use of syncfs.
> >     Test syncfs return value and fallback to btrfs sync and then sync.
> >
> > The author's hope was that if syncfs() hit an error, falling back to
> > sync() could recover from it. Because sync() does not return a result,
> > the function now only ever returns zero. But which errors can actually
> > be handled this way? As far as I know, none. I suggest it directly
> > return the result of syncfs().
>
> Yeah, that sounds right!
>
> sage
>
> >
> > Jianpeng Ma
> > Thanks!
> >
> > > I was doing some experiments simulating disk errors using the Device
> > > Mapper "error" target. In this setup the OSD kept writing to the
> > > broken disk without crashing.
> > > Every 5 seconds (filestore_max_sync_interval) the kernel logged that
> > > some data was discarded due to an I/O error.
> > >
> > > >
> > > > Thanks
> > > > Viju
> > > >> -----Original Message-----
> > > >> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > > >> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Pawel Sadowski
> > > >> Sent: Tuesday, December 30, 2014 1:52 PM
> > > >> To: ceph-devel@xxxxxxxxxxxxxxx
> > > >> Subject: Ceph data consistency
> > > >>
> > > >> Hi,
> > > >>
> > > >> On our Ceph cluster we have some inconsistent PGs from time to time
> > > >> (after deep-scrub). We have some issues with disks/SATA cables/the
> > > >> LSI controller causing I/O errors from time to time (but that's not
> > > >> the point in this case).
> > > >>
> > > >> When an I/O error occurs on the OSD journal partition everything
> > > >> works as it should -> the OSD crashes, and that's OK - Ceph will
> > > >> handle that.
> > > >>
> > > >> But when an I/O error occurs on the OSD data partition during a
> > > >> journal flush, the OSD continues to work. After calling *writev*
> > > >> (in buffer::list::write_fd) the OSD does check the return code from
> > > >> that call but does NOT verify that the write actually reached the
> > > >> disk (the data is still only in memory and there is no fsync). That
> > > >> way the OSD thinks the data has been stored on disk, but it may be
> > > >> discarded (during sync the dirty page will be reclaimed and you'll
> > > >> see "lost page write due to I/O error" in dmesg).
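[Inline note, to illustrate the failure mode described just above: a
successful writev() only means the data reached the page cache, so a later
writeback failure never reaches the caller unless the caller also syncs the
same descriptor and checks the result. The helper below is only a minimal
sketch of that idea; it is not Ceph's buffer::list::write_fd, the name
write_and_sync is made up, and partial writes are ignored for brevity.]

#include <cerrno>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper (not Ceph's buffer::list::write_fd): write a buffer
// and then force it to stable storage so that a writeback I/O error comes
// back as an error code instead of only showing up in dmesg later.
static int write_and_sync(int fd, const void *buf, size_t len)
{
  struct iovec iov = { const_cast<void *>(buf), len };
  ssize_t r = ::writev(fd, &iov, 1);   // partial writes ignored for brevity
  if (r < 0)
    return -errno;                     // the write itself failed
  // writev() only dirtied pages in the page cache; without a sync the
  // kernel may drop them on error ("lost page write due to I/O error").
  if (::fdatasync(fd) < 0)
    return -errno;                     // writeback failure surfaces here (e.g. -EIO)
  return 0;
}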
> > > >>
> > > >> Since there is no checksumming of data, I just wanted to make sure
> > > >> that this is by design. Maybe there is a way to tell the OSD to
> > > >> call fsync after the write and have the data consistent?
> > >
> > > --
> > > PS
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
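[Appended note, for reference: a minimal sketch of the idea agreed on in the
thread above, namely having sync_filesystem() report the result of syncfs()
instead of falling back to sync(), whose errors cannot be observed, and then
returning zero unconditionally. This is only an illustration under that
assumption, not the actual code in FileStore.cc or in the pull request; the
function name and signature here are simplified.]

#include <cerrno>
#include <unistd.h>

// Simplified sketch (not the real FileStore code): propagate the syncfs()
// result so that a failed sync is visible to the caller and the journal is
// not trimmed as if the data were safely on disk.
static int sync_filesystem_sketch(int fd)
{
  if (::syncfs(fd) == 0)     // Linux-specific; needs _GNU_SOURCE (default with g++)
    return 0;
  // Falling back to sync() cannot help here: sync() has no return value,
  // so any error would be swallowed and zero returned anyway.
  return -errno;
}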