Christoph Hellwig
2014-09-25 16:57:43 UTC
The issue we've run into started with this patch:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e
That changed the behaviour for user initiated TUR commands. After an ipr
adapter gets reset, all disk array devices require a start unit command
to be issued to them before they will accept commands. So, with the SCSI
EH change, we now end up in a scenario with dual ipr adapters where the
TUR getting issued from the health checker returns with a Not Ready response
and since SCSI EH no longer triggers the Start Unit in this scenario,
the path never recovers.
The alternative solution would be to change the TUR path checker in multipath-tools
to issue a Start Unit if it sees a 02/04/02.
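As a rough illustration of that alternative, here is a minimal user-space sketch of the decision a path checker would have to make. It decodes the sense key/ASC/ASCQ from the sense buffer returned with a failed TUR, and checks for 02/04/02 (NOT READY, LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED). The names decode_sense(), needs_start_unit() and build_start_unit_cdb() are hypothetical and are not taken from multipath-tools. The sketch also skips the additional-sense-length check a real checker would want, and it does not show how the CDB would be submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENSE_KEY_NOT_READY	0x02
#define ASC_LUN_NOT_READY	0x04
#define ASCQ_INIT_CMD_REQUIRED	0x02
#define START_STOP_UNIT_OPCODE	0x1b

/* Hypothetical helper type: the three values a checker cares about. */
struct sense_triple {
	uint8_t key;
	uint8_t asc;
	uint8_t ascq;
};

/*
 * Pull key/ASC/ASCQ out of fixed (0x70/0x71) or descriptor (0x72/0x73)
 * format sense data. Returns false if the buffer is too short or the
 * response code is not recognised.
 */
static bool decode_sense(const uint8_t *sense, size_t len,
			 struct sense_triple *out)
{
	assert(sense != NULL && out != NULL);

	if (len < 1)
		return false;

	switch (sense[0] & 0x7f) {
	case 0x70:
	case 0x71:
		if (len < 14)
			return false;
		out->key = sense[2] & 0x0f;
		out->asc = sense[12];
		out->ascq = sense[13];
		return true;
	case 0x72:
	case 0x73:
		if (len < 4)
			return false;
		out->key = sense[1] & 0x0f;
		out->asc = sense[2];
		out->ascq = sense[3];
		return true;
	default:
		return false;
	}
}

/* True only for 02/04/02, the case where a Start Unit is wanted. */
static bool needs_start_unit(const uint8_t *sense, size_t len)
{
	struct sense_triple t;

	if (!decode_sense(sense, len, &t))
		return false;

	return t.key == SENSE_KEY_NOT_READY &&
	       t.asc == ASC_LUN_NOT_READY &&
	       t.ascq == ASCQ_INIT_CMD_REQUIRED;
}

/* Fill a 6-byte START STOP UNIT CDB with the START bit set. */
static void build_start_unit_cdb(uint8_t cdb[6])
{
	memset(cdb, 0, 6);
	cdb[0] = START_STOP_UNIT_OPCODE;
	cdb[4] = 0x01;	/* START = 1 */
}
```

A checker along these lines would call needs_start_unit() on the sense buffer of a failed TUR and only then send the CDB from build_start_unit_cdb(), leaving every other Not Ready case to be reported as before.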
Or we could fix up the check introduced by the commit, with something ala:
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a2c3d3d..7228d9e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -459,13 +459,18 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
-	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+	if (scmd->cmnd[0] == TEST_UNIT_READY &&
+	    scmd->request->cmd_type == REQ_TYPE_FS &&
+	    scmd->scsi_done != scsi_eh_done) {
 		/*
 		 * nasty: for mid-layer issued TURs, we need to return the
 		 * actual sense data without any recovery attempt. For eh
-		 * issued ones, we need to try to recover and interpret
+		 * issued ones, we need to try to recover and interpret,
+		 * and for pass through TURs we just need to stay out of the
+		 * way, so that the device handlers can do the right thing.
 		 */
 		return SUCCESS;
+	}
 	scsi_report_sense(sdev, &sshdr);
Thanks,
Brian
--
Brian King
Power Linux I/O
IBM Linux Technology Center
--
dm-devel mailing list
https://www.redhat.com/mailman/listinfo/dm-devel
---end quoted text---