----- Original Message ----- > From: "Daniel Schneller" <daniel.schneller@xxxxxxxxxxxxxxxx> > To: ceph-users@xxxxxxxxxxxxxx > Sent: Tuesday, March 3, 2015 2:54:13 AM > Subject: Understand RadosGW logs > > Hi! > > After realizing the problem with log rotation (see > http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708) > and fixing it, I now for the first time have some > meaningful (and recent) logs to look at. > > While from an application perspective there seem > to be no issues, I would like to understand some > messages I find with relatively high frequency in > the logs: > > Exhibit 1 > ------------- > 2015-03-03 11:14:53.685361 7fcf4bfef700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:15:57.476059 7fcf39ff3700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:17:43.570986 7fcf25fcb700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:22:00.881640 7fcf39ff3700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:22:48.147011 7fcf35feb700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:27:40.572723 7fcf50ff9700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:29:40.082954 7fcf36fed700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 11:30:32.204492 7fcf4dff3700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 It means that returning data to the client got some error, usually means that the client disconnected before completion. > > I cannot find anything relevant by Googling for > that, apart from the actual line of code that > produces this line. > What does that mean? Is it an indication of data > corruption or are there more benign reasons for > this line? > > > Exhibit 2 > -------------- > Several of these blocks > > 2015-03-03 07:06:17.805772 7fcf36fed700 1 ====== starting new request > req=0x7fcf5800f3b0 ===== > 2015-03-03 07:06:17.836671 7fcf36fed700 0 > RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 > part_ofs=0 rule->part_size=0 > 2015-03-03 07:06:17.836758 7fcf36fed700 0 > RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 > part_ofs=0 rule->part_size=0 > 2015-03-03 07:06:17.836918 7fcf36fed700 0 > RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243 > part_ofs=0 rule->part_size=0 > 2015-03-03 07:06:18.263126 7fcf36fed700 1 ====== req done > req=0x7fcf5800f3b0 http_status=200 ====== > ... > 2015-03-03 09:27:29.855001 7fcf28fd1700 1 ====== starting new request > req=0x7fcf580102a0 ===== > 2015-03-03 09:27:29.866718 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.866778 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.866852 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.866917 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.875466 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.884434 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.906155 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.914364 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:29.940653 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:30.272816 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:31.125773 7fcf28fd1700 0 > RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:27:31.192661 7fcf28fd1700 0 ERROR: flush_read_list(): > d->client_c->handle_data() returned -1 > 2015-03-03 09:27:31.194481 7fcf28fd1700 1 ====== req done > req=0x7fcf580102a0 http_status=200 ====== > ... > 2015-03-03 09:28:43.008517 7fcf2a7d4700 1 ====== starting new request > req=0x7fcf580102a0 ===== > 2015-03-03 09:28:43.016414 7fcf2a7d4700 0 > RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579 > part_ofs=0 rule->part_size=0 > 2015-03-03 09:28:43.022387 7fcf2a7d4700 1 ====== req done > req=0x7fcf580102a0 http_status=200 ====== > > First, what is the req=XXXX line? Is that a thread-id? > I am asking, because the same id is used over and over > in the same file over time. It's the request id (within the current radosgw instance) > > More importantly, what do the RGWObjManifest::operator++():... > lines mean? In the middle case above the block even ends The read position > with one of the ERROR lines mentioned before, but the HTTP > status is still 200, suggesting a succesful operation. Well, at that point, where the client already disconnected, we can't really do much. The 200 is set before we sent any data to the client, but after sending the headers, this cannot be changed. Yehuda > > Thanks in advance for shedding some light, because I would like > to know if I need to take some action or at least keep an > eye on these via monitoring? > > Cheers, > Daniel > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com