Thanks!

On Thu, Aug 31, 2017 at 1:28 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Aug 31, 2017 at 4:12 PM, Wyllys Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>> Sorry for the lack of detail; here is some more info:
>>
>> Currently using ceph 10.2.7.
>>
>> - Prior to the error there was nothing in the kernel log for several hours.
>> - cephfs snapshots are enabled, but are not currently being taken at
>>   regular intervals; the last one was taken 2 days before the error
>>   message appeared.
>> - The cephfs data pool holds 17123471 objects (34% full) and the
>>   metadata pool holds ~70K objects.
>> - The system has 85 OSDs and 3 MDS servers, all in a healthy state.
>> - We use 3-copy replication rules: 81739 GB used, 161 TB / 241 TB avail.
>
> OK, I see what happened.
>
> You have quite a lot of snapshots -- 4758 of them? send_request()
> attempted to encode an 8 + 4 + 4758*8 = ~38k snap context into a 4k
> buffer. Normally that's fine because the snap context is taken into
> account when allocating a message buffer. However, this particular
> code path (... ceph_osdc_writepages()) uses pre-allocated messages,
> which are always 4k in size.
>
> I think it's a known bug^Wlimitation. As a short-term fix, we can
> probably increase that pre-allocated size from 4k to something bigger.
> A proper resolution would take a considerable amount of time. Until
> then I'd recommend a much more aggressive snapshot rotation schedule,
> which is a good idea anyway -- your writes will transmit faster!
>
> Thanks,
>
>                 Ilya

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html