O_DIRECT bug, looking for advice

I'm developing an embedded mipsel system with linux-mips.org's
linux-2.6.18, and I'm finding that reads of disk files opened with
O_DIRECT end up corrupted.  I've googled O_DIRECT corruption and the
only advice I've come up with is to make sure the reads are aligned, and
I've done that, to no avail.  I wonder if anyone has some clue for
sale.
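
For reference, the alignment advice mentioned above can be checked
mechanically.  The following is only a sketch: the function name
direct_io_aligned() is made up for illustration, and the right value
for `align` is an assumption that depends on the device (on 2.6
kernels it is generally the logical block size of the underlying
storage, often 512 bytes; 4096 is a conservative choice).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: true if the buffer address, the file offset and
 * the transfer length are all multiples of `align`.  `align` must be a
 * power of two (e.g. 512 or 4096). */
static bool direct_io_aligned(const void *buf, off_t file_off,
                              size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t mask = align - 1;

    return ((uintptr_t)buf & (uintptr_t)mask) == 0 &&
           (file_off & (off_t)mask) == 0 &&
           (len & mask) == 0;
}
```

Calling this before every O_DIRECT read and asserting on the result
rules out plain misalignment as the cause.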

Here are some of the details:

0.  The corruption happens the same way regardless of what filesystem
    I use.

1.  The corruption consists of 64 contiguous unset bytes in specific
    places in the buffer.  I.e., if the buffer was zeroed before the
    read, there will be some regions of 64 zeros in place of the
    expected file data.  If the buffer was memset() to the character
    'z', there will be some regions of 64 'z' characters in place of
    the expected file data.

2.  Reading the same file without O_DIRECT results in expected file
    data, no corruption.

3.  Performing the same O_DIRECT read multiple times into the same
    buffer results in the same corruption.

4.  Performing the same O_DIRECT read multiple times into different
    buffers results in different patterns of corruption.

5.  The corruptions occur on 64-byte aligned offsets, but usually not
    on page-aligned (4096-byte) offsets.

6.  The corruptions only occur within 48 pages (196608 bytes) of the
    end of the buffer, regardless of buffer size, read size, or buffer
    alignment.

7.  Two O_DIRECT reads from different file offsets into the same
    buffer result in identical patterns of corruption!

8.  Create a big buffer.  Do an O_DIRECT read into offset 0 of that
    buffer, then do an O_DIRECT read of the same file into offset X of
    that buffer.  The pattern of corruption will be found at the same
    offset into the buffer, which means that the pattern of corruption
    will be shifted by an offset X between the reads.

Here is a graphical representation of a series of small reads into
different offsets of a single larger buffer.  The vertical bars ("|")
represent a 64-page buffer with 2MB alignment, allocated as follows:

    char *buf;
    posix_memalign ((void **) &buf, 2097152, 262144);

I open the same file twice, once with O_DIRECT and once without.  I do
reads with the O_DIRECT filehandle using different offsets into buf,
and compare each time with identical reads (into a separate, identical
buffer) using the non-O_DIRECT filehandle.  Before each read, I use
memset() to fill the buffer with a specific value, the "unset data
character".  Sometimes I use zero, sometimes not.
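
A stripped-down version of that comparison might look like the
following.  It is a sketch under assumptions, not the actual test
program: the function names, the BUF_ALIGN value and the -2 error
convention are all made up for illustration, and it reads into the
start of two fresh buffers rather than at varying offsets into one
large buffer.

```c
#define _GNU_SOURCE        /* must precede all includes; exposes O_DIRECT on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_ALIGN 4096     /* assumption: at least the device's required alignment */

/* Return the offset of the first differing byte, or -1 if none differ. */
static long first_mismatch(const unsigned char *a, const unsigned char *b,
                           size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}

/* Read len bytes at file_off with and without O_DIRECT and compare.
 * Returns -2 on any setup or I/O error, otherwise the result of
 * first_mismatch().  len and file_off should be multiples of BUF_ALIGN. */
static long compare_direct_read(const char *path, off_t file_off,
                                size_t len, int fill)
{
    void *dbuf = NULL, *cbuf = NULL;
    ssize_t dn, cn;
    long result = -2;
    int dfd = open(path, O_RDONLY | O_DIRECT);
    int cfd = open(path, O_RDONLY);

    assert(len > 0);
    if (dfd < 0 || cfd < 0)
        goto out;
    if (posix_memalign(&dbuf, BUF_ALIGN, len) != 0) { dbuf = NULL; goto out; }
    if (posix_memalign(&cbuf, BUF_ALIGN, len) != 0) { cbuf = NULL; goto out; }

    /* Pre-fill both buffers with the "unset data character". */
    memset(dbuf, fill, len);
    memset(cbuf, fill, len);

    dn = pread(dfd, dbuf, len, file_off);
    cn = pread(cfd, cbuf, len, file_off);
    if (dn >= 0 && dn == cn)
        result = first_mismatch(dbuf, cbuf, (size_t)dn);
out:
    free(dbuf);
    free(cbuf);
    if (dfd >= 0) close(dfd);
    if (cfd >= 0) close(cfd);
    return result;
}
```

Note that some filesystems (tmpfs on many kernels, for instance)
refuse O_DIRECT opens outright, in which case this sketch simply
reports an error.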

Each line of the following graph represents a read at a specific
offset into buf.  The first line is read into buf with a zero offset.
Each subsequent line is a read into buf with the offset increased by
one page.  The read size happens to be 18 pages (73728 bytes), but
that size is not significant --- the same style of corruption occurs
regardless of the read size.

A "." character represents a page which matches perfectly between the
O_DIRECT read and the non-O_DIRECT read.  An "X" character represents a
page of the O_DIRECT read containing one or more of the 64-byte
regions of the "unset data character".  A " " (space) character
represents the part of the buffer which was unused for this read.

|..................                                              |
| ...............X.X                                             |
|  ..............XXX.                                            |
|   .............XXX..                                           |
|    ............XXX..X                                          |
|     ...........XXX..XX                                         |
|      ..........XXX..XXX                                        |
|       .........XXX..XXXX                                       |
|        ........XXX..XXXX.                                      |
|         .......XXX..XXXX..                                     |
|          ......XXX..XXXX..X                                    |
|           .....XXX..XXXX..X.                                   |
|            ....XXX..XXXX..X.X                                  |
|             ...XXX..XXXX..X.X.                                 |
|              ..XXX..XXXX..X.X.X                                |
|               .XXX..XXXX..X.X.XX                               |
|                XXX..XXXX..X.X.XXX                              |
|                 XX..XXXX..X.X.XXX.                             |
|                  XX.XXXX..X.X.XXX..                            |
|                   X.XXXX..X.X.XXX..X                           |
|                    .XXXX..X.X.XXX..X.                          |
|                     XXXX..X.X.XXX..X..                         |
|                      XXX..X.X.XXX..X..X                        |
|                       XX..X.X.XXX..X..XX                       |
|                        X..X.X.XXX..X..XXX                      |
|                         ..X.X.XXX..X..XXXX                     |
|                          .X.X.XXX..X..XXXX.                    |
|                           X.X.XXX..X..XXXX..                   |
|                            .X.XXX..X..XXXX..X                  |
|                             X.XXX..X..XXXX..XX                 |
|                              .XXX..X..XXXX..XX.                |
|                               XXX..X..XXXX..XX.X               |
|                                XX..X..XXXX..XX.XX              |
|                                 X..X..XXXX..XX.XXX             |
|                                  ..X..XXXX..XX.XXXX            |
|                                   .X..XXXX..XX.XXXXX           |
|                                    X..XXXX..XX.XXXXX.          |
|                                     ..XXXX..XX.XXXXX.X         |
|                                      .XXXX..XX.XXXXX.X.        |
|                                       XXXX..XX.XXXXX.X.X       |
|                                        XXX..XX.XXXXX.X.X.      |
|                                         XX..XX.XXXXX.X.X.X     |
|                                          X..XX.XXXXX.X.X.X.    |
|                                           ..XX.XXXXX.X.X.X.X   |
|                                            .XX.XXXXX.X.X.X.XX  |
|                                             XX.XXXXX.X.X.X.XXX |
|                                              X.XXXXX.X.X.X.XXXX|

Note that the corruption never happens until you get within 48 pages
of the end of the buffer.
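
One row of a graph like the one above could be generated roughly as
follows.  This is an illustrative sketch with an invented function
name, map_row(), and one simplification: it marks a page "X" on any
mismatch with the buffered read, rather than checking specifically
for 64-byte runs of the unset data character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE  4096
#define CHUNK 64

/* Hypothetical helper: build one graph row.  `direct` and `cached`
 * point at the start of the two large buffers (O_DIRECT and buffered
 * reads respectively).  The read covered read_pages pages starting at
 * page index first_page.  row must have room for buf_pages + 1 chars. */
static void map_row(char *row, const unsigned char *direct,
                    const unsigned char *cached, size_t buf_pages,
                    size_t first_page, size_t read_pages)
{
    assert(first_page + read_pages <= buf_pages);

    memset(row, ' ', buf_pages);          /* unused part of the buffer */
    row[buf_pages] = '\0';

    for (size_t p = first_page; p < first_page + read_pages; p++) {
        size_t off = p * PAGE;
        row[p] = memcmp(direct + off, cached + off, PAGE) == 0 ? '.' : 'X';
    }
}
```

Printing each row between two "|" characters, with first_page
increasing by one per read, gives the layout shown above.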

Any suggestions?

Thanks,
Dave


