Re: Hammer 0.94.10

On 07/09/2018 20:41, Gregory Farnum wrote:
I'm not seeing much in the tracker for scan_range. There's
http://tracker.ceph.com/issues/13853 which got closed as Can't
Reproduce, and that references http://tracker.ceph.com/issues/10150
which introduced the exemption for r==-ENOENT.

Honestly, when I see errors on a read my first guess is usually that
the disk state is corrupted somehow. Have you tried examining the
object it's crashing on through the filesystem and using the
objectstore tool?

I found that there is not much of a manual page for ceph-objectstore-tool, other than that it is a command-line tool.

So can you elaborate a bit on this?
What do I check on the FS?
And how do I use the objectstore tool to fix it?
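
For reference, my rough guess at what "examine it through the filesystem and with the objectstore tool" would look like on a FileStore OSD; the paths, pgid and the object file name below are only examples for osd.40 / pg 3.13, and I assume the OSD has to be stopped before pointing ceph-objectstore-tool at its data path:

   # look at the on-disk object files and their xattrs
   # (the object_info the OSD reads should end up as the user.ceph._ xattr on FileStore, if I understand correctly)
   ls -asl /var/lib/ceph/osd/ceph-40/current/3.13_head/
   getfattr -d /var/lib/ceph/osd/ceph-40/current/3.13_head/<some-object-file>

   # with the OSD stopped, list what the objectstore itself thinks is in the PG
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
       --journal-path /var/lib/ceph/osd/ceph-40/journal \
       --pgid 3.13 --op list

Is that roughly what you meant?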

I guess that the crash has something to do with this PG, although the data does not seem to be with the crashing OSD:

3.13  9571  2  9580  9571  0  46199192594  56196  56196  active+recovery_wait+undersized+degraded+remapped+inconsistent  2018-09-07 15:53:55.173876  44488'3436031  44488:3604108  [51,34,9]  51  [9,34]  9  42657'3372667  2018-08-18 18:31:28.093793  42657'3372667  2018-08-18 18:31:28.093793

(up set [51,34,9] with primary 51, acting set [9,34] with primary 9.)

It is available on OSDs 9 and 34, where it is a large (44G) sharded tree, and on the "broken" OSD, but there it is only an unsharded directory with 2 files (< 1G) in it.
:/var/lib/ceph/osd# ls -asl ceph-40/current/3.13_head
total 12304
   0 drwxr-xr-x   2 root root     171 Sep  7 14:20 .
  16 drwxr-xr-x 450 root root   12288 Sep  7 21:09 ..
   0 -rw-r--r--   1 root root       0 Sep  7 14:20 __head_00000013__3
8192 -rw-r--r--   1 root root 8388608 Sep  7 14:20 temp\u3.13\u0\u16281745\u52573__head_00000013__fffffffffffffffb
4096 -rw-r--r--   1 root root 4194304 Sep  7 14:20 temp\u3.13\u0\u16281745\u52574__head_00000013__fffffffffffffffb

So it looks like the data on osd.40 is the odd one out.
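
(The 44G vs. small-directory comparison above is just from looking at the PG directory on each OSD's host, along these lines; the paths are the standard FileStore layout:)

   du -sh /var/lib/ceph/osd/ceph-9/current/3.13_head    # on the host with osd.9: ~44G, sharded into subdirectories
   du -sh /var/lib/ceph/osd/ceph-40/current/3.13_head   # on the host with osd.40: only the two small files shown above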

To prevent more OSDs from crashing, the cluster currently has:
    noout,nobackfill,noscrub,nodeep-scrub flag(s) set

So how would we get this PG consistent?
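
Would something along these lines be the way to go? This is just my guess at the usual recipe (with osd.40 stopped; the export is only meant as a safety copy before removing anything):

   # keep a copy of the (apparently stale) local PG data first
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
       --journal-path /var/lib/ceph/osd/ceph-40/journal \
       --pgid 3.13 --op export --file /root/pg-3.13-osd40.export

   # then remove the local copy so osd.40 no longer trips over it
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
       --journal-path /var/lib/ceph/osd/ceph-40/journal \
       --pgid 3.13 --op remove

   # restart osd.40 and let backfill repopulate the PG from the acting set
   ceph osd unset nobackfill
   ceph osd unset noout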

Thanx,
--WjW

-Greg

On Fri, Sep 7, 2018 at 8:15 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
Hi,

Looking for somebody with a good memory.

I have a Hammer OSD (0.94.10) that keeps crashing on the assert shown at the
bottom. Can somebody give me an indication of what the problem is, and perhaps
suggest a way to work around it? (Preferably without building a new/fixed
OSD.)

Thanx,
--WjW

The last part of the crash log:
    -11> 2018-09-07 16:38:38.465425 7f409b042700  1 -- 10.128.4.4:6802/3457
<== osd.48 10.128.4.9:6810/4048 174925602 ====
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
v1 ==== 929+0+4874 (3087930260 0 3406714671) 0x1a632000 con 0x5597e40
    -10> 2018-09-07 16:38:38.465480 7f40ba331700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465480, event: reached_pg, op:
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
     -9> 2018-09-07 16:38:38.465501 7f40ba331700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465501, event: started, op:
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
     -8> 2018-09-07 16:38:38.465559 7f40ba331700  5 write_log with: dirty_to:
0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors:
false, divergent_priors: 0, writeout_from: 44254'78496421, trimmed:
     -7> 2018-09-07 16:38:38.465610 7f40ba331700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465610, event:
commit_queued_for_journal_write, op: osd_repop(client.6334140.0:241128738
11.0 11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v
44254'78496421)
     -6> 2018-09-07 16:38:38.465671 7f40ce2d5700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465670, event:
write_thread_in_journal_buffer, op: osd_repop(client.6334140.0:241128738
11.0 11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v
44254'78496421)
     -5> 2018-09-07 16:38:38.465850 7f40cdad4700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465850, event:
journaled_completion_queued, op: osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
     -4> 2018-09-07 16:38:38.465874 7f40cb2cf700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.465873, event: commit_sent, op:
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
     -3> 2018-09-07 16:38:38.465887 7f40cb2cf700  1 -- 10.128.4.4:6802/3457
--> 10.128.4.9:6810/4048 -- osd_repop_reply(client.6334140.0:241128738 11.0
ondisk, result = 0) v1 -- ?+0 0x21e09840 con 0x5597e40
     -2> 2018-09-07 16:38:38.466366 7f40cbad0700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.466366, event: sub_op_applied, op:
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
     -1> 2018-09-07 16:38:38.466396 7f40cbad0700  5 -- op tracker -- seq:
1556168118, time: 2018-09-07 16:38:38.466396, event: done, op:
osd_repop(client.6334140.0:241128738 11.0
11/31e1e080/rbd_data.50b1d42e237d92.00000000000001ab/head v 44254'78496421)
      0> 2018-09-07 16:38:38.468999 7f40b832d700 -1 osd/ReplicatedPG.cc: In
function 'void ReplicatedPG::scan_range(int, int, PG::BackfillInterval*,
ThreadPool::TPHandle&)' thread 7f40b832d700 time 2018-09-07 16:38:38.462751
osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)

And the hammer code that goes with it:
void ReplicatedPG::scan_range(
   int min, int max, BackfillInterval *bi,
   ThreadPool::TPHandle &handle)
{
   assert(is_locked());
   dout(10) << "scan_range from " << bi->begin << dendl;
   bi->objects.clear();  // for good measure

   vector<hobject_t> ls;
   ls.reserve(max);
   int r = pgbackend->objects_list_partial(bi->begin, min, max, 0, &ls, &bi->end);
   assert(r >= 0);
   dout(10) << " got " << ls.size() << " items, next " << bi->end << dendl;
   dout(20) << ls << dendl;

   for (vector<hobject_t>::iterator p = ls.begin(); p != ls.end(); ++p) {
     handle.reset_tp_timeout();
     ObjectContextRef obc;
     if (is_primary())
       obc = object_contexts.lookup(*p);
     if (obc) {
       bi->objects[*p] = obc->obs.oi.version;
       dout(20) << "  " << *p << " " << obc->obs.oi.version << dendl;
     } else {
       bufferlist bl;
       int r = pgbackend->objects_get_attr(*p, OI_ATTR, &bl);

       /* If the object does not exist here, it must have been removed
        * between the collection_list_partial and here.  This can happen
        * for the first item in the range, which is usually last_backfill.
        */
       if (r == -ENOENT)
         continue;

       assert(r >= 0);
       object_info_t oi(bl);
       bi->objects[*p] = oi.version;
       dout(20) << "  " << *p << " " << oi.version << dendl;
     }
   }
}