Discussion:
[PATCH] block: remove artifical max_hw_sectors cap
Christoph Hellwig
2014-09-06 23:08:05 UTC
Permalink
Set max_sectors to the value the drivers provides as hardware limit by
default. Linux had proper I/O throttling for a long time and doesn't
rely on a artifically small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installation.

Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.

Signed-off-by: Christoph Hellwig <***@lst.de>
---
block/blk-settings.c | 4 +---
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index f1a1795..f52c223 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
__func__, max_hw_sectors);
}

- limits->max_hw_sectors = max_hw_sectors;
- limits->max_sectors = min_t(unsigned int, max_hw_sectors,
- BLK_DEF_MAX_SECTORS);
+ limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index dd73e1f..46c282f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->flags & DEVFL_TKILL);
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
- blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
+ blk_queue_max_hw_sectors(q, 1024);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d->bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 518b465..7e3c172 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1192,7 +1192,6 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
- BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Hellwig
2014-09-23 05:59:47 UTC
Permalink
ping?
Post by Christoph Hellwig
Set max_sectors to the value the drivers provides as hardware limit by
default. Linux had proper I/O throttling for a long time and doesn't
rely on a artifically small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installation.
Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
---
block/blk-settings.c | 4 +---
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f1a1795..f52c223 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
__func__, max_hw_sectors);
}
- limits->max_hw_sectors = max_hw_sectors;
- limits->max_sectors = min_t(unsigned int, max_hw_sectors,
- BLK_DEF_MAX_SECTORS);
+ limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index dd73e1f..46c282f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->flags & DEVFL_TKILL);
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
- blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
+ blk_queue_max_hw_sectors(q, 1024);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d->bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 518b465..7e3c172 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1192,7 +1192,6 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
- BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---
Christoph Hellwig
2014-10-01 13:08:22 UTC
Permalink
As we still haven't made any progress on this let me explain why
the limit does not make sense: It only applies to _FS request,
which basically have three use cases:

- metadata I/O. Generally small enough that the limit does not
matter at all.
- buffered reads/writes. We already have a self-tuning algorithm
that limits writeback size, and a readahead size tunable that
caps read sizes. Imposing another confusing limit that does
not interact with the visible tunables here is not helpful
- direct I/O. Users should get something resembling their request
as closely as possible on the write, and this is where our
stupid limitation causes the most problems.
Post by Christoph Hellwig
Set max_sectors to the value the drivers provides as hardware limit by
default. Linux had proper I/O throttling for a long time and doesn't
rely on a artifically small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installation.
Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
---
block/blk-settings.c | 4 +---
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f1a1795..f52c223 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_
__func__, max_hw_sectors);
}
- limits->max_hw_sectors = max_hw_sectors;
- limits->max_sectors = min_t(unsigned int, max_hw_sectors,
- BLK_DEF_MAX_SECTORS);
+ limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index dd73e1f..46c282f 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -395,7 +395,7 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->flags & DEVFL_TKILL);
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
- blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
+ blk_queue_max_hw_sectors(q, 1024);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
d->bufpool = mp;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 518b465..7e3c172 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1192,7 +1192,6 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
- BLK_DEF_MAX_SECTORS = 1024,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---
Elliott, Robert (Server Storage)
2014-10-01 18:59:34 UTC
Permalink
-----Original Message-----
Sent: Wednesday, 01 October, 2014 8:08 AM
Fengguang
Subject: Re: [PATCH] block: remove artifical max_hw_sectors cap
As we still haven't made any progress on this let me explain why
the limit does not make sense: It only applies to _FS request,
- metadata I/O. Generally small enough that the limit does not
matter at all.
- buffered reads/writes. We already have a self-tuning algorithm
that limits writeback size, and a readahead size tunable that
caps read sizes. Imposing another confusing limit that does
not interact with the visible tunables here is not helpful
- direct I/O. Users should get something resembling their request
as closely as possible on the write, and this is where our
stupid limitation causes the most problems.
One supporting example: A low limit interferes with creation of
full stripe writes for RAID controllers.
Post by Christoph Hellwig
Set max_sectors to the value the drivers provides as hardware limit by
default. Linux had proper I/O throttling for a long time and doesn't
rely on a artifically small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installation.
Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
---
block/blk-settings.c | 4 +---
drivers/block/aoe/aoeblk.c | 2 +-
include/linux/blkdev.h | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f1a1795..f52c223 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -257,9 +257,7 @@ void blk_limits_max_hw_sectors(struct queue_limits
*limits, unsigned int max_hw_
Post by Christoph Hellwig
__func__, max_hw_sectors);
}
- limits->max_hw_sectors = max_hw_sectors;
- limits->max_sectors = min_t(unsigned int, max_hw_sectors,
- BLK_DEF_MAX_SECTORS);
+ limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);
1. Documentation/block/biodoc.txt needs some updates:

blk_queue_max_sectors(q, max_sectors)
Sets two variables that limit the size of the request.

- The request queue's max_sectors, which is a soft size in
units of 512 byte sectors, and could be dynamically varied
by the core kernel.

- The request queue's max_hw_sectors, which is a hard limit
and reflects the maximum size request a driver can handle
in units of 512 byte sectors.

The default for both max_sectors and max_hw_sectors is
255. The upper limit of max_sectors is 1024.

There is no function with that name (it is now called
blk_queue_max_hw_sectors), the upper limit of max_sectors
is max_hw_sectors, and the default is misleading (255
is the default if the LLD doesn't provide max_hw_sectors).

2. Testing with hpsa and mpt3sas, this patch works as expected
for this setting. I/O sizes are still limited by max_segments,
which is expected. Something else is still limiting I/O sizes
to 1 MiB, though; probably bio_get_nr_vecs enforcing a maximum
size per bio of BIO_MAX_PAGES 256 (which is 1 MiB with 4 KiB
pages).


Otherwise,
Reviewed-by: Robert Elliott <***@hp.com>
Tested-by: Robert Elliott <***@hp.com>

---
Rob Elliott HP Server Storage
Christoph Hellwig
2014-10-01 21:12:58 UTC
Permalink
Post by Elliott, Robert (Server Storage)
One supporting example: A low limit interferes with creation of
full stripe writes for RAID controllers.
Yes, both Linux mdraid and parity raids are often crippled by the
default. The mdadm default of 512kb for each chunk combined with
the default limit very much guarantees a horrible out of the box
experience.
Christoph Hellwig
2014-10-17 13:12:47 UTC
Permalink
Post by Christoph Hellwig
Post by Elliott, Robert (Server Storage)
One supporting example: A low limit interferes with creation of
full stripe writes for RAID controllers.
Yes, both Linux mdraid and parity raids are often crippled by the
default. The mdadm default of 512kb for each chunk combined with
the default limit very much guarantees a horrible out of the box
experience.
Jens, can you apply this patch once the 3.19 queue opens?
Jens Axboe
2014-10-21 20:02:26 UTC
Permalink
Post by Christoph Hellwig
Post by Christoph Hellwig
Post by Elliott, Robert (Server Storage)
One supporting example: A low limit interferes with creation of
full stripe writes for RAID controllers.
Yes, both Linux mdraid and parity raids are often crippled by the
default. The mdadm default of 512kb for each chunk combined with
the default limit very much guarantees a horrible out of the box
experience.
Jens, can you apply this patch once the 3.19 queue opens?
Yup, lets get it applied.
--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...