Christoph Hellwig
2009-08-20 22:12:21 UTC
Btw, something semi-related I've been looking at recently:
Currently O_DIRECT writes bypass all kernel caches, but they do
use the disk caches. We currently don't have any barrier support for
them at all, which is really bad for data integrity in virtualized
environments. I've started thinking about how to implement this.
The simplest scheme would be to mark the last request of each
O_DIRECT write as a barrier request. This works nicely from the FS
perspective and works with all hardware supporting barriers. It's
massive overkill, though: we really only need to flush the cache
after our request, not before it. And for SCSI we would be much
better off just setting the FUA bit on the commands and not requiring
a full cache flush at all.
The next scheme would be to simply always do a cache flush after
the direct I/O write has completed, but given that blkdev_issue_flush
blocks until the command is done, that would a) require everyone to
use the end_io callback and b) spend a lot of time in that workqueue.
This only requires one full cache flush, but it's still suboptimal.
I have prototyped this for XFS, but I don't really like it.
The best scheme would be to get some high-level FUA request in the
block layer which, on hardware without native FUA, gets emulated by
a post-command cache flush.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html