On Mon, 09 Aug 2010 13:09:56 +0530
ravichandra <vmynidi@xxxxxxxxxxxxxxxxxx> wrote:

> Hi,
> Thanks.The patch you have sent is working.There is no hanging up
> after the patch is applied.can you elaborate on the problem which was
> there earlier??
>

It's ... complicated.

An important fact is that generic_make_request queues recursive requests
rather than issuing them immediately. This avoids excessive stack usage
with stacked block devices.

So in the case where a read crosses a chunk boundary, raid10:make_request
issues two separate generic_make_request calls to two different devices,
each preceded by a wait_barrier call (which is cancelled with
allow_barrier() when the request completes).
The first request is queued and will not be issued until the second is
also queued and the raid10:make_request call completes.

The wait_barrier call increments nr_pending. If the resync/recovery
thread tries to 'raise_barrier' between these two calls, it will find
nr_pending set and will wait with ->barrier incremented. So when the
second wait_barrier is attempted, it will block - forever: it is waiting
for ->barrier to clear, resync is waiting for nr_pending to drop, and
nr_pending cannot drop because the first request is still sitting in the
queue and cannot be issued until make_request returns.

If generic_make_request didn't queue things, the first request would
complete, nr_pending would decrement, resync would proceed with a single
request, then the second wait_barrier would complete and the second
request could be submitted.

The fix was to elevate conf->nr_waiting for the duration of both
submissions so raise_barrier holds off setting ->barrier until both
submissions are complete.

Hope that makes sense.

NeilBrown
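
For anyone who would rather see the sequence spelled out, here is a rough
single-threaded Python model of the counters described above. It is only a
sketch under stated assumptions: the names Conf, try_wait_barrier,
try_raise_barrier and split_read are made up for illustration, "would
block" is modelled as returning False instead of sleeping, and the locking
and wait queues of the real raid10 code are left out entirely.

```python
class Conf:
    """Toy stand-in for the barrier counters kept in the raid10 conf."""

    def __init__(self):
        self.nr_pending = 0   # normal I/O requests that passed wait_barrier
        self.nr_waiting = 0   # normal I/O currently held up (or mid-split)
        self.barrier = 0      # non-zero while resync wants exclusive access


def try_wait_barrier(conf):
    """Normal I/O side. Returns False where the real call would sleep."""
    if conf.barrier:
        return False
    conf.nr_pending += 1
    return True


def try_raise_barrier(conf):
    """Resync side (hypothetical simplification of raise_barrier)."""
    if conf.nr_waiting:
        return "held_off"     # does not set ->barrier yet
    conf.barrier += 1         # blocks any new normal I/O
    if conf.nr_pending:
        return "waiting"      # must wait for pending I/O to complete
    return "raised"


def split_read(conf, with_fix):
    """A read crossing a chunk boundary, with resync racing in between."""
    queued = []               # requests are queued, not issued, so none
                              # can complete before this function returns
    if with_fix:
        conf.nr_waiting += 1  # held for the duration of both submissions
    if try_wait_barrier(conf):
        queued.append("first half")
    resync = try_raise_barrier(conf)   # resync runs between the two calls
    second_ok = try_wait_barrier(conf)
    if second_ok:
        queued.append("second half")
    if with_fix:
        conf.nr_waiting -= 1
    return resync, second_ok, queued


if __name__ == "__main__":
    print(split_read(Conf(), with_fix=False))
    print(split_read(Conf(), with_fix=True))
```

Without the fix the model reports resync as "waiting" and the second
wait_barrier as blocked, with only the first half queued, which is the
hang. With nr_waiting elevated, resync is held off and both halves get
queued.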