librados async I/O takes considerably longer to complete

Hi,

Is anyone using the librados AIO APIs? I have a problem where the
rados_aio_wait_for_complete() call sometimes blocks for a long period of
time before it returns without error.

More info on my setup:
I am using Ceph 14.2.4 and writing 8MB objects.

I run my AIO program on 24 nodes at the same time, each writing a different
data set of about 2G, split into 8MB objects.

Normally, it takes about 10 mins for all of them to complete. But often one
or more nodes take considerably longer to finish. When I look at one of
those, I mostly see that the IO requests have been submitted and the process
is waiting at:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002aaaaad0c8fa in rados_aio_wait_for_complete () from /cgv/geovation/2/test/ceph/lib/librados.so.2

It eventually completes, and the rados_aio_wait_for_complete() call returns
no error.

The (pseudo) code looks like:

        while (data remains to be written) {
            size_t aio_ops_count = 0;
            rados_completion_t aio_comp[12];

            // Submit up to 12 asynchronous full-object writes.
            for (size_t j = 0; j < 12; ++j) {
                int err = rados_aio_create_completion(NULL, NULL, NULL, &aio_comp[j]);
                if (err < 0) {
                    cerr << "rados_aio_create_completion: " << strerror(-err) << endl;
                    return 1;
                }

                string obj_ = getobjectid();

                err = rados_aio_write_full(io, obj_.c_str(), aio_comp[j], read_buf[j], bytes);
                if (err < 0) {
                    cerr << "rados_aio_write_full: " << strerror(-err) << endl;
                    return 1;
                }

                ++aio_ops_count;
            }

            // Wait for each submitted write and check its result.
            for (size_t j = 0; j < aio_ops_count; ++j) {
                rados_aio_wait_for_complete(aio_comp[j]); // Considerably longer delay here ??
                int err = rados_aio_get_return_value(aio_comp[j]);
                if (err < 0) {
                    cerr << "rados_aio_get_return_value: " << strerror(-err) << endl;
                    return 1;
                }

                rados_aio_release(aio_comp[j]);
            }
        }
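
To narrow down which requests are slow, the wait in the second loop could be
timed per completion. Below is a minimal sketch of that idea, not something
from my actual program: timed_ms() and is_slow() are hypothetical helpers
(not part of librados), and the threshold value is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper (not part of librados): runs `fn` and returns how long
// the call blocked, in milliseconds, using a monotonic clock.
template <typename Fn>
double timed_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical check: true when a wait took longer than the chosen threshold.
// The threshold is an assumption; pick whatever counts as "slow" for the setup.
inline bool is_slow(double elapsed_ms, double threshold_ms) {
    assert(threshold_ms >= 0.0);
    return elapsed_ms > threshold_ms;
}
```

In the wait loop this would be used roughly as
timed_ms([&] { rados_aio_wait_for_complete(aio_comp[j]); }), logging the
object id whenever is_slow() returns true, so that slow waits can be matched
against specific objects.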

I ran the program under Valgrind and saw no issues. I also read the data
back and checksummed it to verify there is no corruption. So everything
appears to "work" as expected except for the longer delays at times.
I am wondering if anyone is using the AIO APIs to write objects and has
experienced any similar problems.
Please let me know if you need further information.

(I originally posted this to dev@xxxxxxx and, on Daniel's suggestion, I am
posting it here.)

Regards,
Ponnuvel P


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx