Sage Weil <sage@xxxxxxxxxxxx> writes:

> On Fri, 10 Feb 2012, Jeff Moyer wrote:
>> Sage Weil <sage@xxxxxxxxxxxx> writes:
>>
>> > I hit the following under a reasonable simple aio workload:
>> >
>> >  - reasonably heavy load
>> >  - lots of threads doing buffered io to random files
>> >  - one thread submitting O_DIRECT aio to a single file (journal), all
>> >    sequential (wrapping), 100MB
>> >  - probably somewhere between 1 and 50 aios outstanding at any point in
>> >    time.
>> >
>> > The kernel was v3.2 mainline, plus unrelated btrfs and ceph patches.
>> >
>> > Is this a known issue?  Any other information that would be helpful?
>>
>> I don't know for sure, but could you test with the following commit?
>>   69e4747ee9727d660b88d7e1efe0f4afcb35db1b
>
> I'll pull this in and see if it comes up again (this is the first time
> I've seen the crash).

OK, thanks.

>> Also, I'll note that it looks like you are doing O_SYNC + O_DIRECT AIO.
>> I'm curious to know what apps use that particular combination.  Is this
>> just a test case, or do you have an app which does this in production?
>
> That's what ceph-osd is doing on it's journal.  Rereading the man page
> it's not clear to me what I *should* be doing, though.  Would you use
> O_SYNC (with O_DIRECT) only to make sure the blocks you write to are
> allocated/reachable on crash?  (Or, say, mtime is updated?)

O_DIRECT just bypasses the page cache--it doesn't provide any guarantees
that the data is on stable storage (so that's why you'd want to also use
O_SYNC).  Given that you're continually overwriting a log, I don't think
you have to really worry about metadata, right?  So, for your case,
either you can use O_SYNC as you are doing today, or you could fsync
whenever you wanted to ensure the disk cache was flushed.

I didn't mean to imply that Ceph was doing anything wrong.  That is a
perfectly valid combination of flags/operations.
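To make the two options concrete, here is a rough sketch, not taken from
ceph-osd.  It assumes Linux/glibc and uses synchronous pwrite() instead
of AIO to keep it short.  The helper names journal_open_flags() and
journal_write() are made up for illustration.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the open(2) flags for the journal file.
 *   use_osync != 0: each write returns only once the data is on stable storage.
 *   use_osync == 0: the caller must fsync() wherever durability matters.
 */
int journal_open_flags(int use_osync)
{
    int flags = O_WRONLY | O_CREAT | O_DIRECT;

    if (use_osync)
        flags |= O_SYNC;
    return flags;
}

/*
 * Hypothetical helper: write len bytes at offset off.  If flush is
 * nonzero, follow the write with fsync() so the data (and the disk
 * cache) is flushed before returning.
 * Returns 0 on success, -1 on error or short write.
 */
int journal_write(int fd, const void *buf, size_t len, off_t off, int flush)
{
    assert(buf != NULL && len > 0);

    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;
    if (flush && fsync(fd) != 0)
        return -1;
    return 0;
}
```

With journal_open_flags(1) you would call journal_write(..., 0); with
journal_open_flags(0) you would pass flush=1 only at the points where
the log entry has to be durable.  Note that O_DIRECT generally expects
the buffer, length, and offset to be aligned to the device's logical
block size; the sketch leaves that to the caller.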
Cheers,
Jeff