Discussion:
[PATCH] megaraid_sas: Enable shared tag map
Hannes Reinecke
2014-09-29 11:47:52 UTC
Permalink
Megaraid_sas uses a shared pool of commands per HBA, so we
should be enabling a shared tag map.
This will allow the I/O scheduler to make better scheduling
decisions and will avoid BUSY states in the driver.

Signed-off-by: Hannes Reinecke <***@suse.de>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f6a69a3..996fa9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1659,7 +1659,7 @@ static int megasas_slave_configure(struct scsi_device *sdev)
*/
blk_queue_rq_timeout(sdev->request_queue,
MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
-
+ sdev->tagged_supported = 1;
return 0;
}

@@ -1667,6 +1667,10 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
{
u16 pd_index = 0;
struct megasas_instance *instance ;
+
+ /* We have a shared tag map, so TCQ is always supported */
+ sdev->tagged_supported = 1;
+
instance = megasas_lookup_instance(sdev->host->host_no);
if (sdev->channel < MEGASAS_MAX_PD_CHANNELS) {
/*
@@ -1677,13 +1681,20 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
sdev->id;
if (instance->pd_list[pd_index].driveState ==
MR_PD_STATE_SYSTEM) {
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
return -ENXIO;
}
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}

+static void megasas_slave_destroy(struct scsi_device *sdev)
+{
+ scsi_deactivate_tcq(sdev, 1);
+}
+
void megaraid_sas_kill_hba(struct megasas_instance *instance)
{
if ((instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0073SKINNY) ||
@@ -2746,6 +2757,21 @@ struct device_attribute *megaraid_host_attrs[] = {
NULL,
};

+static int megasas_change_queue_type(struct scsi_device *sdev, int tag_type)
+{
+ if (sdev->host->bqt && sdev->tagged_supported) {
+ scsi_set_tag_type(sdev, tag_type);
+
+ if (tag_type)
+ scsi_activate_tcq(sdev, sdev->queue_depth);
+ else
+ scsi_deactivate_tcq(sdev, sdev->queue_depth);
+ } else
+ tag_type = 0;
+
+ return tag_type;
+}
+
/*
* Scsi host template for megaraid_sas driver
*/
@@ -2756,6 +2782,7 @@ static struct scsi_host_template megasas_template = {
.proc_name = "megaraid_sas",
.slave_configure = megasas_slave_configure,
.slave_alloc = megasas_slave_alloc,
+ .slave_destroy = megasas_slave_destroy,
.queuecommand = megasas_queue_command,
.eh_device_reset_handler = megasas_reset_device,
.eh_bus_reset_handler = megasas_reset_bus_host,
@@ -2765,6 +2792,7 @@ static struct scsi_host_template megasas_template = {
.bios_param = megasas_bios_param,
.use_clustering = ENABLE_CLUSTERING,
.change_queue_depth = megasas_change_queue_depth,
+ .change_queue_type = megasas_change_queue_type,
.no_write_same = 1,
};

@@ -4909,6 +4937,10 @@ static int megasas_io_attach(struct megasas_instance *instance)
host->this_id = instance->init_id;
host->sg_tablesize = instance->max_num_sge;

+ if (!scsi_init_shared_tag_map(host, host->can_queue))
+ printk(KERN_WARNING "megasas: enable TCQ support, depth %d",
+ host->can_queue);
+
if (instance->fw_support_ieee)
instance->max_sectors_per_req = MEGASAS_MAX_SECTORS_IEEE;
--
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Webb Scales
2014-09-29 16:52:14 UTC
Permalink
[Hannes, James, Christoph: sorry for the extra copies -- my mail client
didn't automatically convert to plain-text for the list, so it bounced.]

Could someone enlighten me on when and where a driver might want to or
be required to set the "tagged_supported" flag?

A quick grep yields a number of places where the flag value is set or
cleared.

This looks like the mechanism which is supposed to set the flag (e.g.,
if the device/LUN characteristics suggest that it's appropriate), and so
I'm not sure when/why the driver would want to do it explicitly:

./drivers/scsi/scsi_scan.c: sdev->tagged_supported = 1; [in scsi_add_lun()]


These look like attempts to override the SCSI mid-layer:

./drivers/scsi/fnic/fnic_main.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/stex.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/stex.c: sdev->tagged_supported = 1; [in '_slave_configure()]
./drivers/scsi/qla4xxx/ql4_os.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/qla4xxx/ql4_os.c: sdev->tagged_supported = 1; [in '_slave_configure()]
./drivers/scsi/ufs/ufshcd.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/scsi_debug.c: sdp->tagged_supported = 1; [in '_slave_configure()]


However, it strikes me that it's unnecessary to set the flag in /both/
'_slave_alloc() and '_slave_configure() -- I think that either one
should work, and '_slave_configure() seems like the appropriate choice.
(But, I'm left wondering why these drivers need to override the SCSI
ML's choice....)

These are interesting, but not pertinent to the discussion at hand (they
seem to be fall-backs triggered during error recovery):

./drivers/scsi/atari_NCR5380.c: cmd->device->tagged_supported = 0; [error recovery?]
./drivers/scsi/53c700.c: SCp->device->tagged_supported = 0; [error recovery?]
./drivers/scsi/sun3_NCR5380.c: cmd->device->tagged_supported = 0; [error recovery?]


And, this one looks spurious (that is, I believe it's clearing the flag
when it's already clear):

./drivers/scsi/libsas/sas_scsi_host.c: scsi_dev->tagged_supported = 0; [looks spurious]


So, taking the patch below, for example, why is tagged_supported being
set in /both/ the '_slave_alloc() and the '_slave_configure() routines,
and, in fact, why is it being set at all -- why doesn't the SCSI
mid-layer set it correctly in this case?


Thanks,

Webb
Post by Hannes Reinecke
Megaraid_sas uses a shared pool of commands per HBA, so we
should be enabling a shared tag map.
This will allow the I/O scheduler to make better scheduling
decisions and will avoid BUSY states in the driver.
---
drivers/scsi/megaraid/megaraid_sas_base.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f6a69a3..996fa9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1659,7 +1659,7 @@ static int megasas_slave_configure(struct scsi_device *sdev)
*/
blk_queue_rq_timeout(sdev->request_queue,
MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
-
+ sdev->tagged_supported = 1;
return 0;
}
@@ -1667,6 +1667,10 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
{
u16 pd_index = 0;
struct megasas_instance *instance ;
+
+ /* We have a shared tag map, so TCQ is always supported */
+ sdev->tagged_supported = 1;
+
instance = megasas_lookup_instance(sdev->host->host_no);
if (sdev->channel < MEGASAS_MAX_PD_CHANNELS) {
/*
@@ -1677,13 +1681,20 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
sdev->id;
if (instance->pd_list[pd_index].driveState ==
MR_PD_STATE_SYSTEM) {
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
return -ENXIO;
}
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
+static void megasas_slave_destroy(struct scsi_device *sdev)
+{
+ scsi_deactivate_tcq(sdev, 1);
+}
+
void megaraid_sas_kill_hba(struct megasas_instance *instance)
{
if ((instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0073SKINNY) ||
@@ -2746,6 +2757,21 @@ struct device_attribute *megaraid_host_attrs[] = {
NULL,
};
+static int megasas_change_queue_type(struct scsi_device *sdev, int tag_type)
+{
+ if (sdev->host->bqt && sdev->tagged_supported) {
+ scsi_set_tag_type(sdev, tag_type);
+
+ if (tag_type)
+ scsi_activate_tcq(sdev, sdev->queue_depth);
+ else
+ scsi_deactivate_tcq(sdev, sdev->queue_depth);
+ } else
+ tag_type = 0;
+
+ return tag_type;
+}
+
/*
* Scsi host template for megaraid_sas driver
*/
@@ -2756,6 +2782,7 @@ static struct scsi_host_template megasas_template = {
.proc_name = "megaraid_sas",
.slave_configure = megasas_slave_configure,
.slave_alloc = megasas_slave_alloc,
+ .slave_destroy = megasas_slave_destroy,
.queuecommand = megasas_queue_command,
.eh_device_reset_handler = megasas_reset_device,
.eh_bus_reset_handler = megasas_reset_bus_host,
@@ -2765,6 +2792,7 @@ static struct scsi_host_template megasas_template = {
.bios_param = megasas_bios_param,
.use_clustering = ENABLE_CLUSTERING,
.change_queue_depth = megasas_change_queue_depth,
+ .change_queue_type = megasas_change_queue_type,
.no_write_same = 1,
};
@@ -4909,6 +4937,10 @@ static int megasas_io_attach(struct megasas_instance *instance)
host->this_id = instance->init_id;
host->sg_tablesize = instance->max_num_sge;
+ if (!scsi_init_shared_tag_map(host, host->can_queue))
+ printk(KERN_WARNING "megasas: enable TCQ support, depth %d",
+ host->can_queue);
+
if (instance->fw_support_ieee)
instance->max_sectors_per_req = MEGASAS_MAX_SECTORS_IEEE;
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
James Bottomley
2014-09-29 17:08:59 UTC
Permalink
Post by Webb Scales
[Hannes, James, Christoph: sorry for the extra copies -- my mail client
didn't automatically convert to plain-text for the list, so it bounced.]
Could someone enlighten me on when and where a driver might want to or
be required to set the "tagged_supported" flag?
A quick grep yields a number of places where the flag value is set or
cleared.
This looks like the mechanism which is supposed to set the flag (e.g.,
if the device/LUN characteristics suggest that it's appropriate), and so
./drivers/scsi/scsi_scan.c: sdev->tagged_supported = 1; [in scsi_add_lun()]
That's set based on device capabilities.
Post by Webb Scales
./drivers/scsi/fnic/fnic_main.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/stex.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/stex.c: sdev->tagged_supported = 1; [in '_slave_configure()]
./drivers/scsi/qla4xxx/ql4_os.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/qla4xxx/ql4_os.c: sdev->tagged_supported = 1; [in '_slave_configure()]
./drivers/scsi/ufs/ufshcd.c: sdev->tagged_supported = 1; [in '_slave_alloc()]
./drivers/scsi/scsi_debug.c: sdp->tagged_supported = 1; [in '_slave_configure()]
Slave configure is allowed to override this, so they all have reasons I
would suspect (qla is because tape is untagged, but qla still needs a
tag for the internal queue, for instance).
Post by Webb Scales
However, it strikes me that it's unnecessary to set the flag in /both/
'_slave_alloc() and '_slave_configure() -- I think that either one
should work, and '_slave_configure() seems like the appropriate choice.
(But, I'm left wondering why these drivers need to override the SCSI
ML's choice....)
Stex is the only one that does this; it's probably just a make sure
thing. You might want to set in slave alloc if you need a tagged
Inquiry command.
Post by Webb Scales
These are interesting, but not pertinent to the discussion at hand (they
./drivers/scsi/atari_NCR5380.c: cmd->device->tagged_supported = 0; [error recovery?]
./drivers/scsi/53c700.c: SCp->device->tagged_supported = 0; [error recovery?]
./drivers/scsi/sun3_NCR5380.c: cmd->device->tagged_supported = 0; [error recovery?]
SPI allows the device to reject the tag enabling messages ... that's
what this lot are all doing.
Post by Webb Scales
And, this one looks spurious (that is, I believe it's clearing the flag
./drivers/scsi/libsas/sas_scsi_host.c: scsi_dev->tagged_supported = 0; [looks spurious]
Pointless but harmless.
Post by Webb Scales
So, taking the patch below, for example, why is tagged_supported being
set in /both/ the '_slave_alloc() and the '_slave_configure() routines,
and, in fact, why is it being set at all -- why doesn't the SCSI
mid-layer set it correctly in this case?
Because things like the qla have to have tagged inquiry because it's a
shared host tag map used to label the issue queue. Setting it twice is
overkill because we don't actually turn it off, but it could be
construed as prudent in case we did (we could make tagged_supported
exactly reflect the state of the CmdQue bit for instance).

James

��칻�&�~�&���+-��ݶ��w��˛���m�b��lrȢ��^n�r���z���h�����&���G���h�
Christoph Hellwig
2014-09-30 07:43:29 UTC
Permalink
Post by Hannes Reinecke
Megaraid_sas uses a shared pool of commands per HBA, so we
should be enabling a shared tag map.
This will allow the I/O scheduler to make better scheduling
decisions and will avoid BUSY states in the driver.
What exact problem did you see? Do you have a link to a bugzilla entry
or similar? Was this observed on real hardware or your qemu emulation?
Post by Hannes Reinecke
---
drivers/scsi/megaraid/megaraid_sas_base.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f6a69a3..996fa9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1659,7 +1659,7 @@ static int megasas_slave_configure(struct scsi_device *sdev)
*/
blk_queue_rq_timeout(sdev->request_queue,
MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
-
+ sdev->tagged_supported = 1;
+ /* We have a shared tag map, so TCQ is always supported */
+ sdev->tagged_supported = 1;
+
Why doesn't the device return the proper data in the INQUIRY response?

And more importantly why do you want to do this twice?
Post by Hannes Reinecke
instance = megasas_lookup_instance(sdev->host->host_no);
if (sdev->channel < MEGASAS_MAX_PD_CHANNELS) {
/*
@@ -1677,13 +1681,20 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
sdev->id;
if (instance->pd_list[pd_index].driveState ==
MR_PD_STATE_SYSTEM) {
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
return -ENXIO;
}
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
Please refactor the code so that the first case falls through to the
second, something like:


if (instance->pd_list[pd_index].driveState !=
MR_PD_STATE_SYSTEM)
return -ENXIO;
}

scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hannes Reinecke
2014-09-30 09:08:17 UTC
Permalink
Post by Hannes Reinecke
Megaraid_sas uses a shared pool of commands per HBA, so we
should be enabling a shared tag map.
This will allow the I/O scheduler to make better scheduling
decisions and will avoid BUSY states in the driver.
=20
What exact problem did you see? Do you have a link to a bugzilla ent=
ry
or similar? Was this observed on real hardware or your qemu emulatio=
n?
=20
Well, _actually_ I just did it so that I could get the tag number
for my printk patchset :-)

But there is an underlying reason:
megaraid_sas uses an internal frame pool of a fixed size.
Once that's exhausted it cannot queue any more commands, and need to
return busy.
But as this is a per-host command pool all LUNs have to shared the
same frame pool, and the driver uses heuristics (ie 1/4 of the pool
size) to set cmd_per_lun. So if you have more than 4 LUNs you'll run
into issues as the pool can become exhausted.
Hence I thought it would be good to expose this to the block layer
so that we can avoid BUSY states in the driver.
Post by Hannes Reinecke
---
drivers/scsi/megaraid/megaraid_sas_base.c | 34 ++++++++++++++++++++=
++++++++++-
Post by Hannes Reinecke
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scs=
i/megaraid/megaraid_sas_base.c
Post by Hannes Reinecke
index f6a69a3..996fa9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1659,7 +1659,7 @@ static int megasas_slave_configure(struct scsi=
_device *sdev)
Post by Hannes Reinecke
*/
blk_queue_rq_timeout(sdev->request_queue,
MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
-
+ sdev->tagged_supported =3D 1;
=20
Post by Hannes Reinecke
+ /* We have a shared tag map, so TCQ is always supported */
+ sdev->tagged_supported =3D 1;
+
=20
Why doesn't the device return the proper data in the INQUIRY response=
?
=20
And more importantly why do you want to do this twice?
=20
Bah, Typo.
Post by Hannes Reinecke
instance =3D megasas_lookup_instance(sdev->host->host_no);
if (sdev->channel < MEGASAS_MAX_PD_CHANNELS) {
/*
@@ -1677,13 +1681,20 @@ static int megasas_slave_alloc(struct scsi_d=
evice *sdev)
Post by Hannes Reinecke
sdev->id;
if (instance->pd_list[pd_index].driveState =3D=3D
MR_PD_STATE_SYSTEM) {
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
return -ENXIO;
}
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
=20
Please refactor the code so that the first case falls through to the
=20
=20
if (instance->pd_list[pd_index].driveState !=3D
MR_PD_STATE_SYSTEM)
return -ENXIO;
}
=20
scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
=20
Okay, will be doing it.

Cheers,

Hannes
--=20
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg
GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Webb Scales
2014-10-01 18:51:31 UTC
Permalink
Hannes,

In megasas_change_queue_type(), is it possible for sdev->queue_depth to
be greater than 256?

Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 may
overflow the field when high-numbered tags are used...do I have that right?


Thanks,

Webb
Post by Hannes Reinecke
Megaraid_sas uses a shared pool of commands per HBA, so we
should be enabling a shared tag map.
This will allow the I/O scheduler to make better scheduling
decisions and will avoid BUSY states in the driver.
---
drivers/scsi/megaraid/megaraid_sas_base.c | 34 ++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f6a69a3..996fa9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1659,7 +1659,7 @@ static int megasas_slave_configure(struct scsi_device *sdev)
*/
blk_queue_rq_timeout(sdev->request_queue,
MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
-
+ sdev->tagged_supported = 1;
return 0;
}
@@ -1667,6 +1667,10 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
{
u16 pd_index = 0;
struct megasas_instance *instance ;
+
+ /* We have a shared tag map, so TCQ is always supported */
+ sdev->tagged_supported = 1;
+
instance = megasas_lookup_instance(sdev->host->host_no);
if (sdev->channel < MEGASAS_MAX_PD_CHANNELS) {
/*
@@ -1677,13 +1681,20 @@ static int megasas_slave_alloc(struct scsi_device *sdev)
sdev->id;
if (instance->pd_list[pd_index].driveState ==
MR_PD_STATE_SYSTEM) {
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
return -ENXIO;
}
+ scsi_activate_tcq(sdev, MEGASAS_DEFAULT_CMD_PER_LUN);
return 0;
}
+static void megasas_slave_destroy(struct scsi_device *sdev)
+{
+ scsi_deactivate_tcq(sdev, 1);
+}
+
void megaraid_sas_kill_hba(struct megasas_instance *instance)
{
if ((instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0073SKINNY) ||
@@ -2746,6 +2757,21 @@ struct device_attribute *megaraid_host_attrs[] = {
NULL,
};
+static int megasas_change_queue_type(struct scsi_device *sdev, int tag_type)
+{
+ if (sdev->host->bqt && sdev->tagged_supported) {
+ scsi_set_tag_type(sdev, tag_type);
+
+ if (tag_type)
+ scsi_activate_tcq(sdev, sdev->queue_depth);
+ else
+ scsi_deactivate_tcq(sdev, sdev->queue_depth);
+ } else
+ tag_type = 0;
+
+ return tag_type;
+}
+
/*
* Scsi host template for megaraid_sas driver
*/
@@ -2756,6 +2782,7 @@ static struct scsi_host_template megasas_template = {
.proc_name = "megaraid_sas",
.slave_configure = megasas_slave_configure,
.slave_alloc = megasas_slave_alloc,
+ .slave_destroy = megasas_slave_destroy,
.queuecommand = megasas_queue_command,
.eh_device_reset_handler = megasas_reset_device,
.eh_bus_reset_handler = megasas_reset_bus_host,
@@ -2765,6 +2792,7 @@ static struct scsi_host_template megasas_template = {
.bios_param = megasas_bios_param,
.use_clustering = ENABLE_CLUSTERING,
.change_queue_depth = megasas_change_queue_depth,
+ .change_queue_type = megasas_change_queue_type,
.no_write_same = 1,
};
@@ -4909,6 +4937,10 @@ static int megasas_io_attach(struct megasas_instance *instance)
host->this_id = instance->init_id;
host->sg_tablesize = instance->max_num_sge;
+ if (!scsi_init_shared_tag_map(host, host->can_queue))
+ printk(KERN_WARNING "megasas: enable TCQ support, depth %d",
+ host->can_queue);
+
if (instance->fw_support_ieee)
instance->max_sectors_per_req = MEGASAS_MAX_SECTORS_IEEE;
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2014-10-01 21:10:03 UTC
Permalink
Post by Webb Scales
Hannes,
In megasas_change_queue_type(), is it possible for sdev->queue_depth to be
greater than 256?
Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 may
overflow the field when high-numbered tags are used...do I have that right?
Yes, we really need to increase the size of the tag field. SAM allows
a transport specific limit of up to 64 _bytes_ for it, although I don't
know implementation that large. Given that the block layer can generate
up to 32-bit tags both for the old blk-tag.c code and the new
blk-mq-tag.c version it would be good to use a u32 there. Can you send
me a patch?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hannes Reinecke
2014-10-02 06:51:20 UTC
Permalink
Post by Webb Scales
Hannes,
In megasas_change_queue_type(), is it possible for sdev->queue_depth=
to be
Post by Webb Scales
greater than 256?
Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 ma=
y
Post by Webb Scales
overflow the field when high-numbered tags are used...do I have that=
right?
=20
Yes, we really need to increase the size of the tag field. SAM allow=
s
a transport specific limit of up to 64 _bytes_ for it, although I don=
't
know implementation that large. Given that the block layer can gener=
ate
up to 32-bit tags both for the old blk-tag.c code and the new
blk-mq-tag.c version it would be good to use a u32 there. Can you se=
nd
me a patch?
=20
Weeelll ...

I'm afraid it's not _that_ easy.
SCSI-II tagged queueing has some specific tag values:

#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22

drivers/scsi/vmw_pvscsi.c:
e->tag =3D SIMPLE_QUEUE_TAG;
if (sdev->tagged_supported &&
(cmd->tag =3D=3D HEAD_OF_QUEUE_TAG ||
cmd->tag =3D=3D ORDERED_QUEUE_TAG))
e->tag =3D cmd->tag;

So we cannot translate between them easily.
Unless we move the SCSI-II values to negative and use the positive
values for 'real' tag numbers.

The recommendation here is to use 'scmd->request->tag' whenever
you want to get to the tag number, and 'scmd->tag' if you have to
play around with SCSI-II TCQ.
But if not I would strongly advise to leave 'scmd->tag' alone.

Speaking of which:
We have

drivers/scsi/scsi_lib.c:scsi_get_cmd_from_request()

/* pull a tag out of the request if we have one */
cmd->tag =3D req->tag;
cmd->request =3D req;

What would happen if the block layer decides to use tag number 0x22?
Wouldn't that completely bugger up the TCQ logic?
Why do we need to pull the request tag into the scmd tag, anyway?
I'd rather leave that alone, so that the SCSI midlayer can insert
whatever magic it'll decide to do here ...

James?

Cheers,

Hannes
--=20
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg
GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2014-10-02 09:19:16 UTC
Permalink
Post by Hannes Reinecke
I'm afraid it's not _that_ easy.
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
These are not tag values. These are message codes set in the first byte
of the Queue tag message, the second byte is the actual tag.
Post by Hannes Reinecke
The recommendation here is to use 'scmd->request->tag' whenever
you want to get to the tag number, and 'scmd->tag' if you have to
play around with SCSI-II TCQ.
But if not I would strongly advise to leave 'scmd->tag' alone.
Or kill off scmd->tag.. Let's see how feasible that is.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hannes Reinecke
2014-10-02 09:50:42 UTC
Permalink
Post by Hannes Reinecke
I'm afraid it's not _that_ easy.
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
=20
These are not tag values. These are message codes set in the first b=
yte
of the Queue tag message, the second byte is the actual tag.
=20
Post by Hannes Reinecke
The recommendation here is to use 'scmd->request->tag' whenever
you want to get to the tag number, and 'scmd->tag' if you have to
play around with SCSI-II TCQ.
But if not I would strongly advise to leave 'scmd->tag' alone.
=20
Or kill off scmd->tag.. Let's see how feasible that is.
=20
I'm about to.
(See my last two patches).
There are now two instances left:
NCR5380 (and derived LLDDs) and fnic.
=46or some weird reasons fnic decided to duplicate blk-tag
functionality. Looking into it.

And NCR5380 tag support can be safely ignored, it never worked anyway.

Cheers,

Hannes
--=20
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg
GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2014-10-02 11:36:52 UTC
Permalink
Post by Hannes Reinecke
NCR5380 (and derived LLDDs) and fnic.
fnic needs a tag for a device reset, which we don't provide if it
is issue by ioctl. This is also showed up during the blk-mq work.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hannes Reinecke
2014-10-02 12:00:32 UTC
Permalink
Post by Christoph Hellwig
Post by Hannes Reinecke
NCR5380 (and derived LLDDs) and fnic.
=20
fnic needs a tag for a device reset, which we don't provide if it
is issue by ioctl. This is also showed up during the blk-mq work.
=20
Actually, I've hit a similar issue for megaraid_sas (and I guess
others which implement a host-wide tag map might suffer here, too):
tags need to be used even for internal commands.
But tag allocation is done exclusively in the block layer, so one
cannot readily influence this.
(Or, if you do, you'll end up with code duplication like the fnic
driver).
Thing is, for internal commands we typically do not _need_ a fully
formed request, we just need to tag number.

So we could try to implement a function like 'blk_tag_reserve'
to return the next free tag.

But even with that we'd be hitting a tag starvation issue; when
all command tags are in use we cannot send internal commands.
And if the command abort happens to be modelled with an internal
command, we cannot even abort commands anymore.
Which is probably _not_ what we want.
(And I think fnic will suffer from this, too ...)

So I guess we need to set aside a reserved tag pool from which
the internal commands will be allocated.
(Bit like the emergency network skb pool).
With that the tag starvation issue will be resolved, and
we should be able to remove the custom code in fnic.

Cheers,

Hannes
--=20
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg
GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2014-10-02 14:32:56 UTC
Permalink
Post by Hannes Reinecke
Actually, I've hit a similar issue for megaraid_sas (and I guess
tags need to be used even for internal commands.
But tag allocation is done exclusively in the block layer, so one
cannot readily influence this.
(Or, if you do, you'll end up with code duplication like the fnic
driver).
Thing is, for internal commands we typically do not _need_ a fully
formed request, we just need to tag number.
blk-mq allows a driver to reserve tags, and allocate them using
blk_mq_alloc_request with the reserved flag. Take a look at the
mtip32xx driver, which is using that feature.

The lockless hpsa series has a patch to expose the nr of reserved
tags to scsi, but there is no equivalent in the old blk code yet.

Another issue is that we only setup the tag allocator at scsi_add_host
time, and to fully support drivers that needs tags during initialization
we need move it to scsi_host_alloc time, which doesn't sound like a
major problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
James Bottomley
2014-10-02 12:05:43 UTC
Permalink
Post by Hannes Reinecke
Post by Christoph Hellwig
Post by Webb Scales
Hannes,
In megasas_change_queue_type(), is it possible for sdev->queue_depth to be
greater than 256?
Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 may
overflow the field when high-numbered tags are used...do I have that right?
Yes, we really need to increase the size of the tag field. SAM allows
a transport specific limit of up to 64 _bytes_ for it, although I don't
know implementation that large. Given that the block layer can generate
up to 32-bit tags both for the old blk-tag.c code and the new
blk-mq-tag.c version it would be good to use a u32 there. Can you send
me a patch?
Weeelll ...
I'm afraid it's not _that_ easy.
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
e->tag = SIMPLE_QUEUE_TAG;
if (sdev->tagged_supported &&
(cmd->tag == HEAD_OF_QUEUE_TAG ||
cmd->tag == ORDERED_QUEUE_TAG))
e->tag = cmd->tag;
A SCSI-2 tag is a SPI two byte message. The first byte is the message
type. The values you have above identify the message type for simple,
ordered and head of queue tags. The *second* byte is the tag value.

See page 55 and 56 of the SCSI-2 standard. There's no connection (or
shouldn't be) between the message type and the tag value, so it looks
like a bug in the pvscsi driver.

James

��칻�&�~�&���+-��ݶ��w��˛���m�b��lrȢ��^n�r���z���h�����&���G���h�
Hannes Reinecke
2014-10-02 12:14:19 UTC
Permalink
Post by Hannes Reinecke
Post by Webb Scales
Hannes,
In megasas_change_queue_type(), is it possible for sdev->queue_dep=
th to be
Post by Hannes Reinecke
Post by Webb Scales
greater than 256?
Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 =
may
Post by Hannes Reinecke
Post by Webb Scales
overflow the field when high-numbered tags are used...do I have th=
at right?
Post by Hannes Reinecke
Yes, we really need to increase the size of the tag field. SAM all=
ows
Post by Hannes Reinecke
a transport specific limit of up to 64 _bytes_ for it, although I d=
on't
Post by Hannes Reinecke
know implementation that large. Given that the block layer can gen=
erate
Post by Hannes Reinecke
up to 32-bit tags both for the old blk-tag.c code and the new
blk-mq-tag.c version it would be good to use a u32 there. Can you =
send
Post by Hannes Reinecke
me a patch?
Weeelll ...
I'm afraid it's not _that_ easy.
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
e->tag =3D SIMPLE_QUEUE_TAG;
if (sdev->tagged_supported &&
(cmd->tag =3D=3D HEAD_OF_QUEUE_TAG ||
cmd->tag =3D=3D ORDERED_QUEUE_TAG))
e->tag =3D cmd->tag;
=20
A SCSI-2 tag is a SPI two byte message. The first byte is the messag=
e
type. The values you have above identify the message type for simple=
,
ordered and head of queue tags. The *second* byte is the tag value.
=20
See page 55 and 56 of the SCSI-2 standard. There's no connection (or
shouldn't be) between the message type and the tag value, so it looks
like a bug in the pvscsi driver.
=20
It is. I've already sent a patch.

But that just proves the scmd->tag is essentially a duplicate
and we should be removing it.

Cheers,

Hannes
--=20
Dr. Hannes Reinecke zSeries & Storage
***@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=C3=BCrnberg
GF: J. Hawn, J. Guild, F. Imend=C3=B6rffer, HRB 16746 (AG N=C3=BCrnberg=
)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" i=
n
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
James Bottomley
2014-10-02 13:06:50 UTC
Permalink
Post by Hannes Reinecke
Post by James Bottomley
Post by Hannes Reinecke
Post by Christoph Hellwig
Post by Webb Scales
Hannes,
In megasas_change_queue_type(), is it possible for sdev->queue_depth to be
greater than 256?
Unless I'm misunderstanding the SCSI code, we can request a
queue-depth/tag-map-size which is greater than 256, but, since the
scsi_cmnd::tag field is an unsigned char, depths greater than 256 may
overflow the field when high-numbered tags are used...do I have that right?
Yes, we really need to increase the size of the tag field. SAM allows
a transport specific limit of up to 64 _bytes_ for it, although I don't
know implementation that large. Given that the block layer can generate
up to 32-bit tags both for the old blk-tag.c code and the new
blk-mq-tag.c version it would be good to use a u32 there. Can you send
me a patch?
Weeelll ...
I'm afraid it's not _that_ easy.
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
e->tag = SIMPLE_QUEUE_TAG;
if (sdev->tagged_supported &&
(cmd->tag == HEAD_OF_QUEUE_TAG ||
cmd->tag == ORDERED_QUEUE_TAG))
e->tag = cmd->tag;
A SCSI-2 tag is a SPI two byte message. The first byte is the message
type. The values you have above identify the message type for simple,
ordered and head of queue tags. The *second* byte is the tag value.
See page 55 and 56 of the SCSI-2 standard. There's no connection (or
shouldn't be) between the message type and the tag value, so it looks
like a bug in the pvscsi driver.
It is. I've already sent a patch.
But that just proves the scmd->tag is essentially a duplicate
and we should be removing it.
Yes, it was the original tag field, which then got replaced by the block
one and the accessor commands. scsi_populate_tag_msg (which is really
an SPI construct) meant that drivers fully adopting the model didn't
even need to look at cmd->tag any more. I'm fine with dropping it in
favour of req->tag.

James

N�����r��y����b�X��ǧv�^�)޺{.n�+����{���"�{ay�ʇڙ�,j��f���h���z��w���
Loading...