Hi Sage,

Thanks for your mail. When I turn on filestore sync flush, it seems to work and the OSD process no longer commits suicide. I disabled the flusher long ago, since both Mark's report and mine showed that disabling it seemed to improve performance (so my original configuration was filestore_flusher=false, filestore_sync_flush=false (the default)), but now we have to reconsider this.

I would like to look at the internals of ::sync_file_range() to learn more about how it works. My first guess is that ::sync_file_range pushes requests to the disk queue and, if the disk queue is full, the call blocks and waits, but I'm not sure.

But look at the code path (BTW, these lines of code are a bit hard to follow):

    if (!should_flush || !m_filestore_flusher || !queue_flusher(fd, offset, len)) {
      if (should_flush && m_filestore_sync_flush)
        ::sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);
      lfn_close(fd);
    }

With the default setting (m_filestore_flusher = true), the flusher queue will soon fill up. Once it does, queue_flusher() fails, and unless the user has turned on "m_filestore_sync_flush = true", no sync_file_range() is issued for the write at all. They are then likely to hit the same situation: writes remain in the page cache, and the OSD daemon dies while trying to sync.

I think the logic should instead be (pseudocode):

    bool queued = false;
    if (should_flush) {
      if (m_filestore_flusher) {
        if (queue_flusher(fd, offset, len))
          queued = true;  // the flusher thread now owns fd
        else              // queue full: flush inline instead of skipping
          ::sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);
      } else if (m_filestore_sync_flush) {
        ::sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);
      }
    }
    if (!queued)
      lfn_close(fd);

Xiaoxi

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxx]
Sent: March 25, 2013 23:35
To: Chen, Xiaoxi
Cc: 'ceph-users@xxxxxxxxxxxxxx' (ceph-users@xxxxxxxxxxxxxx); ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: [ceph-users] Ceph Crach at sync_thread_timeout after heavy random writes.

Hi Xiaoxi,

On Mon, 25 Mar 2013, Chen, Xiaoxi wrote:
> From ceph -w, Ceph reports very high ops (10000+/s), but technically, 80 spindles can provide at most 150*80/2 = 6000 IOPS for 4K random writes.
>
> When digging into the code, I found that the OSD writes data to the page cache and then returns. Although it calls ::sync_file_range, that syscall doesn't actually sync data to disk before it returns; it's an async call. So the situation is: random writes are extremely fast, since they only go to the journal and the page cache, but syncing takes a very long time. Because of the speed gap between the journal and the OSDs, the amount of data that needs to be synced keeps increasing, and the sync will certainly exceed 600s.

The sync_file_range is only there to push things to disk sooner, so that the eventual syncfs(2) takes less time. When async flushing is enabled, there is a limit to the number of flushes in the queue, but if it hits the max it just does

    dout(10) << "queue_flusher ep " << sync_epoch << " fd " << fd << " " << off << "~" << len
             << " qlen " << flusher_queue_len
             << " hit flusher_max_fds " << m_filestore_flusher_max_fds
             << ", skipping async flush" << dendl;

Can you confirm that the filestore is taking this path? (Set debug filestore = 10 and then reproduce.)

You may want to try

    filestore flusher = false
    filestore sync flush = true

and see if that changes things; it will make the sync_file_range() happen inline after the write.

Anyway, it sounds like you may be queueing up so many random writes that the sync takes forever. I've never actually seen that happen, so if we can confirm that's what is going on, that will be very interesting.

Thanks-
sage

>
> For more information, I tried to reproduce this with rados bench, but failed.
>
> Could you please let me know if you need any more information, and whether you have any solutions? Thanks
>
> Xiaoxi
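To make the difference between the current flush condition and Xiaoxi's proposal concrete, here is a small model of the decision made after each write. It is only a sketch: FlushInputs, FlushOutcome, current_logic, proposed_logic and queue_has_room are hypothetical names, and nothing here calls the real FileStore. It assumes, based on the "skipping async flush" log message Sage quotes, that queue_flusher() fails when the queue has reached filestore_flusher_max_fds, and it follows the original code in closing the fd whenever it was not handed to the flusher thread.

```cpp
// Hypothetical model of the post-write flush decision; not Ceph code.
// Each input stands in for one condition in the FileStore snippet, and
// the outcome records which calls would happen.

struct FlushInputs {
  bool should_flush;    // should_flush in the original code
  bool flusher;         // m_filestore_flusher
  bool sync_flush;      // m_filestore_sync_flush
  bool queue_has_room;  // assumption: queue_flusher() succeeds only below
                        // filestore_flusher_max_fds
};

struct FlushOutcome {
  bool queued;           // fd handed to the flusher thread
  bool sync_file_range;  // ::sync_file_range(..., SYNC_FILE_RANGE_WRITE) inline
  bool closed_here;      // lfn_close(fd) called by the writer
};

// Current condition: queue_flusher() is reached only when the first two
// tests pass, matching the short-circuit evaluation of ||.
constexpr FlushOutcome current_logic(FlushInputs in) {
  FlushOutcome out{false, false, false};
  if (in.should_flush && in.flusher && in.queue_has_room) {
    out.queued = true;  // flusher thread takes over fd
    return out;
  }
  if (in.should_flush && in.sync_flush)
    out.sync_file_range = true;
  out.closed_here = true;
  return out;
}

// Proposed condition: a full flusher queue falls back to an inline
// sync_file_range() instead of silently skipping the flush.
constexpr FlushOutcome proposed_logic(FlushInputs in) {
  FlushOutcome out{false, false, false};
  if (in.should_flush) {
    if (in.flusher) {
      if (in.queue_has_room)
        out.queued = true;           // flusher thread takes over fd
      else
        out.sync_file_range = true;  // queue full: flush inline
    } else if (in.sync_flush) {
      out.sync_file_range = true;
    }
  }
  if (!out.queued)
    out.closed_here = true;
  return out;
}
```

For example, with the flusher on, sync flush off and the queue full (`{true, true, false, false}`), current_logic reports no sync_file_range() call, so the data stays in the page cache until syncfs(2), while proposed_logic reports an inline call. With the flusher off, both versions behave the same. The model says nothing about how long sync_file_range() itself may block; that question is still open in the thread.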