Re: RFC: Data pattern buffer filling race condition fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Nov 7, 2010 at 12:43 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote:
>
> On 2010-11-06 10:35, Bart Van Assche wrote:
> > On multicore non-x86 CPUs fio has been observed to frequently reports false
> > data verification failures with I/O engine libaio and I/O depths above one.
> > This is because of a race condition in the function fill_pattern(). The code
> > in that function only works correct if all CPUs of a multicore system
> > observe store instructions in the order they were issued. That is the case for
> > multicore x86 systems but not for all other CPU families, such as e.g. the
> > POWER CPU family.
> >
> > As far as I can see this bug was introduced via commit
> > cbe8d7561cf6d81d741d87eb7940db2a111d2144 (July 14, 2010).
> >
> > I'm posting this patch as an RFC since the fix is GCC-specific.
>
> ppc is notorious for its weaker memory ordering. I do have a ppc test
> box, but haven't used it in a while. But it used to find bugs
> immediately for race conditions, that x86 would never trigger. So since
> you are pin pointing that particular commit, you are convinced that this
> bug manifests itself due to bad ordering between the filled buffer and
> the fill length?

I haven't done a full bisect, but fio version 1.41.6 (Jul 9, 2010) + a
backport of the PowerPC version of get_cpu_clock() (commit
5f39d8f797fcf01bd94b89ef7ed2bdb76deb2601 from August 10, 2010) was working
fine. And it is easy to see how reordered writes will cause commit
cbe8d7561cf6d81d741d87eb7940db2a111d2144 to make fio behave incorrectly.

> > diff --git a/verify.c b/verify.c
> > index ea1a911..3826198 100644
> > --- a/verify.c
> > +++ b/verify.c
> > @@ -31,18 +31,27 @@ void fill_pattern(struct thread_data *td, void *p, unsigned int len, struct io_u
> >               fill_random_buf(p, len);
> >               break;
> >       case 1:
> > +#ifdef __GNUC__
> > +             __sync_synchronize();
> > +#endif
>
> You should be able to use the fio included write_barrier() and
> read_barrier(), which are hooked to the architectures. Then you don't
> need to use GNUC additions.

This patch also works for me on PowerPC:

Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>

diff --git a/verify.c b/verify.c
index ea1a911..c5b2fbe 100644
--- a/verify.c
+++ b/verify.c
@@ -31,18 +31,21 @@ void fill_pattern(struct thread_data *td, void *p,
unsigned int len, struct io_u
 		fill_random_buf(p, len);
 		break;
 	case 1:
+		write_barrier();
 		if (io_u->buf_filled_len >= len) {
 			dprint(FD_VERIFY, "using already filled verify pattern b=0 len=%u\n", len);
 			return;
 		}
 		dprint(FD_VERIFY, "fill verify pattern b=0 len=%u\n", len);
 		memset(p, td->o.verify_pattern[0], len);
+		write_barrier();
 		io_u->buf_filled_len = len;
 		break;
 	default: {
 		unsigned int i = 0, size = 0;
 		unsigned char *b = p;

+		write_barrier();
 		if (io_u->buf_filled_len >= len) {
 			dprint(FD_VERIFY, "using already filled verify pattern b=%d len=%u\n",
 					td->o.verify_pattern_bytes, len);
@@ -58,6 +61,7 @@ void fill_pattern(struct thread_data *td, void *p,
unsigned int len, struct io_u
 			memcpy(b+i, td->o.verify_pattern, size);
 			i += size;
 		}
+		write_barrier();
 		io_u->buf_filled_len = len;
 		break;
 		}
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux