On Thu, Aug 31, 2017 at 4:12 PM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> Sorry for lack of detail, here is some more info:
>
> Currently using ceph 10.2.7
>
> - Prior to the error there was nothing in the kernel log for several hours.
> - cephfs snapshots are enabled, but are not currently being taken at
>   regular intervals, the last one was 2 days before the error message
>   appeared.
> - cephfs has a data pool with 17123471 objects (34% full) and a
>   metadata pool with 70K objects
> - the system has 85 OSDs and 3 MDS servers, all are in a healthy state.
> - We use 3-copy replication rules: 81739 GB used, 161 TB / 241 TB avail

OK, I see what happened.  You have quite a lot of snapshots -- 4758 of
them?  send_request() attempted to encode an 8 + 4 + 4758*8 = ~38k snap
context into a 4k buffer.  Normally that's fine, because the snap context
is taken into account when allocating a message buffer.  However, this
particular code path (... ceph_osdc_writepages()) uses pre-allocated
messages, which are always 4k in size.

I think it's a known bug^Wlimitation.  As a short-term fix, we can
probably increase that pre-allocated size from 4k to something bigger.
A proper resolution would take a considerable amount of time.  Until
then, I'd recommend a much more aggressive snapshot rotation schedule,
which is a good idea anyway -- your writes will transmit faster!

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html