I hope this old mailing list thread can help you:
http://kerneltrap.org/mailarchive/linux-kernel/2007/1/11/44365/thread

O_DIRECT always seems to be evil.

2009/5/15 David Wuertele <dave+gmane@xxxxxxxxxxxx>:
> I'm developing an embedded mipsel system with linux-mips.org's
> linux-2.6.18, and I'm finding that reads of disk files opened with
> O_DIRECT end up corrupted. I've googled O_DIRECT corruption and the
> only advice I've come up with is make sure the reads are aligned, and
> I've done that, to no avail. I wonder if anyone has some clue for
> sale.
>
> Here are some of the details:
>
> 0. The corruption happens the same way regardless of what filesystem
>    I use.
>
> 1. The corruption consists of 64 contiguous unset bytes in specific
>    places in the buffer. I.e., if the buffer was zeroed before the
>    read, there will be some regions of 64 zeros in place of the
>    expected file data. If the buffer was memset() to the character
>    'z', there will be some regions of 64 'z' characters in place of
>    the expected file data.
>
> 2. Reading the same file without O_DIRECT results in expected file
>    data, no corruption.
>
> 3. Performing the same O_DIRECT read multiple times into the same
>    buffer results in the same corruption.
>
> 4. Performing the same O_DIRECT read multiple times into different
>    buffers results in different patterns of corruption.
>
> 5. The corruptions occur on 64-byte aligned offsets, but usually not
>    on page (4096 byte)-aligned offsets.
>
> 6. The corruptions only occur within 48 pages (196608 bytes) of the
>    end of the buffer, regardless of buffer size, read size, or buffer
>    alignment.
>
> 7. Two O_DIRECT reads of different offsets into the file into the
>    same buffer result in identical patterns of corruption!
>
> 8. Create a big buffer. Do an O_DIRECT read into offset 0 of that
>    buffer, then do an O_DIRECT read of the same file into offset X of
>    that buffer.
>    The pattern of corruption will be found at the same
>    offset into the buffer, which means that the pattern of corruption
>    will be shifted by an offset X between the reads.
>
> Here is a graphical representation of a series of small reads into
> different offsets of a single larger buffer. The vertical bars ("|")
> represent a 64 page buffer at a 2MB alignment allocated as follows:
>
>   char *buf;
>   posix_memalign ((void **) &buf, 2097152, 262144);
>
> I open the same file twice, once with O_DIRECT and once without. I do
> reads with the O_DIRECT filehandle using different offsets into buf,
> and compare each time with identical reads (into a separate, identical
> buffer) using the non-O_DIRECT filehandle. Before each read, I use
> memset() to fill the buffer with a specific value, the "unset data
> character". Sometimes I use zero, sometimes not.
>
> Each line of the following graph represents a read at a specific
> offset into buf. The first line is read into buf with a zero offset.
> Each subsequent line is a read into buf with the offset increased by
> one page. The read size happens to be 18 pages (73728 bytes), but
> that size is not significant --- the same style of corruption occurs
> regardless of the read size.
>
> A "." character represents a page which matches perfectly between the
> O_DIRECT read and the non-O_DIRECT read. An "X" character represents a
> page of the O_DIRECT read containing one or more of the 64-byte
> regions of the "unset data character". A " " (space) character
> represents the part of the buffer which was unused for this read.
>
> |..................                                              |
> | ...............X.X                                             |
> |  ..............XXX.                                            |
> |   .............XXX..                                           |
> |    ............XXX..X                                          |
> |     ...........XXX..XX                                         |
> |      ..........XXX..XXX                                        |
> |       .........XXX..XXXX                                       |
> |        ........XXX..XXXX.                                      |
> |         .......XXX..XXXX..                                     |
> |          ......XXX..XXXX..X                                    |
> |           .....XXX..XXXX..X.                                   |
> |            ....XXX..XXXX..X.X                                  |
> |             ...XXX..XXXX..X.X.                                 |
> |              ..XXX..XXXX..X.X.X                                |
> |               .XXX..XXXX..X.X.XX                               |
> |                XXX..XXXX..X.X.XXX                              |
> |                 XX..XXXX..X.X.XXX.                             |
> |                  XX.XXXX..X.X.XXX..                            |
> |                   X.XXXX..X.X.XXX..X                           |
> |                    .XXXX..X.X.XXX..X.                          |
> |                     XXXX..X.X.XXX..X..                         |
> |                      XXX..X.X.XXX..X..X                        |
> |                       XX..X.X.XXX..X..XX                       |
> |                        X..X.X.XXX..X..XXX                      |
> |                         ..X.X.XXX..X..XXXX                     |
> |                          .X.X.XXX..X..XXXX.                    |
> |                           X.X.XXX..X..XXXX..                   |
> |                            .X.XXX..X..XXXX..X                  |
> |                             X.XXX..X..XXXX..XX                 |
> |                              .XXX..X..XXXX..XX.                |
> |                               XXX..X..XXXX..XX.X               |
> |                                XX..X..XXXX..XX.XX              |
> |                                 X..X..XXXX..XX.XXX             |
> |                                  ..X..XXXX..XX.XXXX            |
> |                                   .X..XXXX..XX.XXXXX           |
> |                                    X..XXXX..XX.XXXXX.          |
> |                                     ..XXXX..XX.XXXXX.X         |
> |                                      .XXXX..XX.XXXXX.X.        |
> |                                       XXXX..XX.XXXXX.X.X       |
> |                                        XXX..XX.XXXXX.X.X.      |
> |                                         XX..XX.XXXXX.X.X.X     |
> |                                          X..XX.XXXXX.X.X.X.    |
> |                                           ..XX.XXXXX.X.X.X.X   |
> |                                            .XX.XXXXX.X.X.X.XX  |
> |                                             XX.XXXXX.X.X.X.XXX |
> |                                              X.XXXXX.X.X.X.XXXX|
>
> Note that the corruption never happens until you get within 48 pages
> of the end of the buffer.
>
> Any suggestions?
>
> Thanks,
> Dave
>
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ
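For anyone who wants to reproduce a map like the one quoted above, here is a rough sketch of the comparison step only. It is not Dave's test program. The names page_map and chunk_is_unset are made up for illustration, and the '?' mark (a page that differs from the reference without containing a whole 64-byte run of the fill byte) is my own addition to the '.' and 'X' legend. It assumes 4096-byte pages and that both buffers already hold the same file range, one read through the O_DIRECT descriptor and one through the ordinary descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES  4096u  /* page size given in the report */
#define CHUNK_BYTES   64u  /* size of each "unset" region in the report */

/* Return 1 if the CHUNK_BYTES bytes at p all equal the fill byte. */
int chunk_is_unset(const unsigned char *p, unsigned char fill)
{
    for (size_t i = 0; i < CHUNK_BYTES; i++)
        if (p[i] != fill)
            return 0;
    return 1;
}

/*
 * Build one row of the map for a read of npages pages.
 *   direct: data read through the O_DIRECT descriptor
 *   ref:    the same range read through the ordinary descriptor
 *   fill:   the byte the buffer was memset() to before the read
 *   map:    output, at least npages + 1 bytes, NUL-terminated
 * Marks: '.' page matches ref, 'X' page holds at least one 64-byte
 * chunk that is still all fill bytes and differs from ref, '?' any
 * other mismatch (hypothetical extra mark, not in the original legend).
 * Returns the number of 'X' pages.
 */
size_t page_map(const unsigned char *direct, const unsigned char *ref,
                size_t npages, unsigned char fill, char *map)
{
    size_t bad = 0;

    assert(direct != NULL && ref != NULL && map != NULL);

    for (size_t pg = 0; pg < npages; pg++) {
        const unsigned char *d = direct + pg * PAGE_BYTES;
        const unsigned char *r = ref + pg * PAGE_BYTES;
        char mark = '.';

        if (memcmp(d, r, PAGE_BYTES) != 0) {
            mark = '?';  /* differs; upgraded to 'X' if an unset chunk is found */
            for (size_t off = 0; off < PAGE_BYTES; off += CHUNK_BYTES) {
                if (chunk_is_unset(d + off, fill) &&
                    memcmp(d + off, r + off, CHUNK_BYTES) != 0) {
                    mark = 'X';
                    bad++;
                    break;
                }
            }
        }
        map[pg] = mark;
    }
    map[npages] = '\0';
    return bad;
}
```

Calling page_map(direct_buf + offset, ref_buf + offset, 18, fill, row) after each pair of reads and printing row indented by the page offset should give output in the same shape as the quoted graph, assuming the reads are set up as described in the report.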