On 02/18/2015 06:05 PM, "Sebastian Köhler [Alfahosting GmbH]" wrote:
> Hi,
>
> yesterday we had the problem that one of our cluster clients
> remounted an rbd device in read-only mode. We found this[1] stack trace
> in the logs. We investigated further and found similar traces on all
> other machines that are using the rbd kernel module. It seems to me that
> whenever there is a swapping situation on a client, those I/O errors occur.
> Is there anything we can do, or is this something that needs to be fixed
> in the code?

Hi,

I was looking at that code the other day and was thinking rbd.c might need
some changes:

1. We cannot use GFP_KERNEL in the main IO path (requests that are sent
down rbd_request_fn and related helper IO), because under memory pressure
the allocation could trigger writeback and come back down rbd_request_fn
on us.

2. We should use GFP_NOIO instead of GFP_ATOMIC if we have the proper
context and are not holding a spin lock.

3. We should be using a mempool, or preallocate enough memory, so we can
make forward progress on at least one IO at a time (a rough sketch of the
mempool idea follows below, before the patch).

I started to make the attached patch (the attached version is built against
Linus's tree as of today). I think it can be further refined so that we pass
the gfp_t in to some functions, because I think in some cases we could use
GFP_KERNEL and/or do not need the mempool. For example, I think we could use
GFP_KERNEL and skip the mempool in the rbd_obj_watch_request_helper code
paths (a sketch of that refinement is after the patch). I was not done
evaluating all the paths, so I had not yet posted it. The patch is not
tested.

Hey Ilya,

I was not sure about the layering-related code. I thought functions like
rbd_img_obj_parent_read_full could get called as a result of an IO getting
sent down rbd_request_fn, but was not 100% sure. I meant to test it out,
but have been busy with other stuff.
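To make point 3 a little more concrete, here is a rough, untested sketch of
what a mempool for object requests could look like. The pool name, its size,
and the helper names are made up for illustration only; none of this is in
the attached patch.

/* Rough sketch only -- names and pool size are illustrative assumptions.
 * The idea is to reserve a few rbd_obj_request structs so that at least
 * one IO can always make forward progress under memory pressure.
 */
#include <linux/mempool.h>
#include <linux/slab.h>

#define RBD_OBJ_REQ_POOL_MIN	4	/* assumed reserve size */

static mempool_t *rbd_obj_request_pool;

static int rbd_obj_request_pool_init(void)
{
	/* back the pool with the existing rbd_obj_request_cache slab */
	rbd_obj_request_pool = mempool_create_slab_pool(RBD_OBJ_REQ_POOL_MIN,
							rbd_obj_request_cache);
	return rbd_obj_request_pool ? 0 : -ENOMEM;
}

static struct rbd_obj_request *rbd_obj_request_alloc_noio(void)
{
	struct rbd_obj_request *obj_request;

	/* GFP_NOIO: we can be called from the IO path, so the allocation
	 * must not recurse into writeback; the reserved pool elements keep
	 * at least one request moving when the slab allocation cannot be
	 * satisfied.
	 */
	obj_request = mempool_alloc(rbd_obj_request_pool, GFP_NOIO);
	if (obj_request)
		memset(obj_request, 0, sizeof(*obj_request));
	return obj_request;
}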
[PATCH] ceph/rbd: use GFP_NOIO and mempool

1. We cannot use GFP_KERNEL in the main IO path, because it could come
back on us.
2. We should use GFP_NOIO instead of GFP_ATOMIC if we have the proper
context and are not holding a spin lock.
3. We should be using a mempool or preallocate enough mem, so we can
make forward progress on at least one IO at a time.

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8a86b62..c01ecaf 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1915,8 +1915,8 @@ static struct ceph_osd_request *rbd_osd_req_create(
 	/* Allocate and initialize the request, for the num_ops ops */
 
 	osdc = &rbd_dev->rbd_client->client->osdc;
-	osd_req = ceph_osdc_alloc_request(osdc, snapc, num_ops, false,
-					  GFP_ATOMIC);
+	osd_req = ceph_osdc_alloc_request(osdc, snapc, num_ops, true,
+					  GFP_NOIO);
 	if (!osd_req)
 		return NULL;	/* ENOMEM */
 
@@ -1998,11 +1998,11 @@ static struct rbd_obj_request *rbd_obj_request_create(const char *object_name,
 	rbd_assert(obj_request_type_valid(type));
 
 	size = strlen(object_name) + 1;
-	name = kmalloc(size, GFP_KERNEL);
+	name = kmalloc(size, GFP_NOIO);
 	if (!name)
 		return NULL;
 
-	obj_request = kmem_cache_zalloc(rbd_obj_request_cache, GFP_KERNEL);
+	obj_request = kmem_cache_zalloc(rbd_obj_request_cache, GFP_NOIO);
 	if (!obj_request) {
 		kfree(name);
 		return NULL;
@@ -2456,7 +2456,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
 					bio_chain_clone_range(&bio_list,
 								&bio_offset,
 								clone_size,
-								GFP_ATOMIC);
+								GFP_NOIO);
 			if (!obj_request->bio_list)
 				goto out_unwind;
 		} else if (type == OBJ_REQUEST_PAGES) {
@@ -2687,7 +2687,7 @@ static int rbd_img_obj_parent_read_full(struct rbd_obj_request *obj_request)
 	 * from the parent.
 	 */
 	page_count = (u32)calc_pages_for(0, length);
-	pages = ceph_alloc_page_vector(page_count, GFP_KERNEL);
+	pages = ceph_alloc_page_vector(page_count, GFP_NOIO);
 	if (IS_ERR(pages)) {
 		result = PTR_ERR(pages);
 		pages = NULL;
@@ -2814,7 +2814,7 @@ static int rbd_img_obj_exists_submit(struct rbd_obj_request *obj_request)
 	 */
 	size = sizeof (__le64) + sizeof (__le32) + sizeof (__le32);
 	page_count = (u32)calc_pages_for(0, size);
-	pages = ceph_alloc_page_vector(page_count, GFP_KERNEL);
+	pages = ceph_alloc_page_vector(page_count, GFP_NOIO);
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
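And as a rough illustration of the gfp_t refinement mentioned above (passing
the mask in from the caller so that non-IO paths such as the watch helpers
can keep GFP_KERNEL), something like the following could work. Again, this
is just an untested sketch; the signature change is not part of the attached
patch.

/* Sketch only, not part of the attached patch: let the caller pick the
 * gfp mask.  IO-path callers would pass GFP_NOIO, while setup/teardown
 * paths like the rbd_obj_watch_request_helper callers could keep
 * GFP_KERNEL.
 */
static struct rbd_obj_request *rbd_obj_request_create(const char *object_name,
						u64 offset, u64 length,
						enum obj_request_type type,
						gfp_t gfp)
{
	struct rbd_obj_request *obj_request;
	char *name;
	size_t size;

	rbd_assert(obj_request_type_valid(type));

	size = strlen(object_name) + 1;
	name = kmalloc(size, gfp);
	if (!name)
		return NULL;

	obj_request = kmem_cache_zalloc(rbd_obj_request_cache, gfp);
	if (!obj_request) {
		kfree(name);
		return NULL;
	}

	/* ... rest of the existing function unchanged ... */
}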