Section Ref 1 Pfn Ref 46 Mapped Views 4
User Ref 0 WaitForDel 0 Flush Count 0
File Object 86960228 ModWriteCount 0 System Views 0
Flags (8008080) File WasPurged Accessed
File: \Program Files\Debugging Tools for Windows (x86)\debugger.chw
Next look at the file object referenced by the control area with this command:
lkd> dt nt!_FILE_OBJECT 0x86960228
   +0x000 Type : 5
   +0x002 Size : 128
   +0x004 DeviceObject : 0x84a69a18 _DEVICE_OBJECT
   +0x008 Vpb : 0x84a63278 _VPB
   +0x00c FsContext : 0x9ae3e768
   +0x010 FsContext2 : 0xad4a0c78
   +0x014 SectionObjectPointer : 0x86724504 _SECTION_OBJECT_POINTERS
   +0x018 PrivateCacheMap : 0x86b48460
   +0x01c FinalStatus : 0
   +0x020 RelatedFileObject : (null)
   +0x024 LockOperation : 0 ''
   ...
The private cache map is at offset 0x18:
lkd> dt nt!_PRIVATE_CACHE_MAP 0x86b48460
   +0x000 NodeTypeCode : 766
   +0x000 Flags : _PRIVATE_CACHE_MAP_FLAGS
   +0x000 UlongFlags : 0x1402fe
   +0x004 ReadAheadMask : 0xffff
   +0x008 FileObject : 0x86960228 _FILE_OBJECT
   +0x010 FileOffset1 : _LARGE_INTEGER 0x146
   +0x018 BeyondLastByte1 : _LARGE_INTEGER 0x14a
   +0x020 FileOffset2 : _LARGE_INTEGER 0x14a
   +0x028 BeyondLastByte2 : _LARGE_INTEGER 0x156
lkd> !fileobj 0x86960228
\Program Files\Debugging Tools for Windows (x86)\debugger.chw
Device Object: 0x84a69a18 \Driver\volmgr
Vpb: 0x84a63278
Event signalled
Access: Read SharedRead SharedWrite
Flags: 0xc0042
Synchronous IO
Cache Supported
Handle Created
Fast IO Read
FsContext: 0x9ae3e768 FsContext2: 0xad4a0c78
Private Cache Map: 0x86b48460
CurrentByteOffset: 156
Cache Data:
Section Object Pointers: 86724504
Shared Cache Map: 86b48388 File Offset: 156 in VACB number 0
Vacb: 84738b30
Your data is at: b1e00156
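The last lines of the !fileobj output can be cross-checked with simple arithmetic. The sketch below assumes that each VACB maps a 256-KB view of the file and infers the view's base address (0xb1e00000) from the output above; the function name and the dictionary of base addresses are illustrative only, not part of any debugger or kernel interface.

```python
VACB_VIEW_SIZE = 256 * 1024  # assumed: each VACB maps a 256-KB view of the file


def locate_in_cache(file_offset, view_base_addresses):
    """Return (vacb_index, virtual_address) for a byte offset in a cached file.

    view_base_addresses maps a VACB index to the system-space base address of
    the view it describes (hypothetical input; the debugger reads the real
    value from the VACB itself).
    """
    vacb_index = file_offset // VACB_VIEW_SIZE
    offset_in_view = file_offset % VACB_VIEW_SIZE
    return vacb_index, view_base_addresses[vacb_index] + offset_in_view


# Values from the output above: file offset 0x156, view assumed at 0xb1e00000.
index, address = locate_in_cache(0x156, {0: 0xB1E00000})
print(index, hex(address))
```

With these inputs the offset falls in VACB number 0, and the computed address matches the "Your data is at" line.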
10.5 File System interfaces
The first time a file’s data is accessed for a read or write operation, the file system driver is
responsible for determining whether some part of the file is mapped in the system cache. If it’s not,
the file system driver must call the CcInitializeCacheMap function to set up the per-file data
structures described in the preceding section.
Once a file is set up for cached access, the file system driver calls one of several functions to
access the data in the file. There are three primary methods for accessing cached data, each
intended for a specific situation:
■ The copy method copies user data between the cache’s buffers in system space and the
application’s buffers residing in the process address space. The functions that file system
drivers can use to perform this operation are listed in Table 10-2.
You can examine read activity from the cache via the performance counters or the
per-processor system variables stored in the processor’s control block (KPRCB) listed in Table 10-3.
10.5.2 Caching with the Mapping and Pinning Interfaces
Just as user applications read and write data in files on a disk, file system drivers need to read
and write the data that describes the files themselves (the metadata, or volume structure data).
Because the file system drivers run in kernel mode, however, they could, if the cache manager
were properly informed, modify data directly in the system cache. To permit this optimization, the
cache manager provides the functions shown in Table 10-4. These functions permit the file system
drivers to find where in virtual memory the file system metadata resides, thus allowing direct
modification without the use of intermediary buffers.
If a file system driver needs to read file system metadata in the cache, it calls the cache
manager’s mapping interface to obtain the virtual address of the desired data. The cache manager
touches all the requested pages to bring them into memory and then returns control to the file
system driver. The file system driver can then access the data directly.
If the file system driver needs to modify cache pages, it calls the cache manager’s pinning
services, which keep the pages active in virtual memory so that they cannot be reclaimed.
The pages aren’t actually locked into memory (such as when a device driver locks pages for
direct memory access transfers). Most of the time, a file system driver will mark its metadata
stream “no write”, which instructs the memory manager’s mapped page writer (explained in
Chapter 9) to not write the pages to disk until explicitly told to do so. When the file system driver
unpins (releases) them, the cache manager releases its resources so that it can lazily flush any
changes to disk and release the cache view that the metadata occupied. The mapping and pinning
10.6 Fast I/O
Whenever possible, reads and writes to cached files are handled by a high-speed mechanism
named fast I/O. Fast I/O is a means of reading or writing a cached file without going through the
work of generating an IRP, as described in Chapter 7. With fast I/O, the I/O manager calls the file
system driver’s fast I/O routine to see whether I/O can be satisfied directly from the cache
manager without generating an IRP.
Because the cache manager is architected on top of the virtual memory subsystem, file
system drivers can use the cache manager to access file data simply by copying to or from pages
mapped to the actual file being referenced without going through the overhead of generating an
IRP.
Fast I/O doesn’t always occur. For example, the first read or write to a file requires setting up
the file for caching (mapping the file into the cache and setting up the cache data structures, as
explained earlier in the section “Cache Data Structures”). Also, if the caller specified an
asynchronous read or write, fast I/O isn’t used because the caller might be stalled during the paging
I/O operations required to satisfy the buffer copy to or from the system cache, in which case the
request would not really be asynchronous. But even on a synchronous I/O, the file
system driver might decide that it can’t process the I/O operation by using the fast I/O mechanism,
say, for example, if the file in question has a locked range of bytes (as a result of calls to the
Windows LockFile and UnlockFile functions). Because the cache manager doesn’t know what
parts of which files are locked, the file system driver must check the validity of the read or write,
which requires generating an IRP. The decision tree for fast I/O is shown in Figure 10-11.
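The conditions described in this paragraph can be summarized as a small decision function. This is a simplified sketch of the checks named in the text only; the IoRequest type and its field names are hypothetical, and a real file system driver evaluates further conditions before choosing the fast I/O path.

```python
from dataclasses import dataclass


@dataclass
class IoRequest:
    file_is_cached: bool        # caching has already been set up for the file
    synchronous: bool           # caller did not request asynchronous I/O
    has_byte_range_locks: bool  # file has ranges locked through LockFile


def can_use_fast_io(request):
    """Return True if the request qualifies for the fast I/O path."""
    if not request.file_is_cached:
        return False  # first access: the file must be set up for caching
    if not request.synchronous:
        return False  # paging I/O could stall an asynchronous caller
    if request.has_byte_range_locks:
        return False  # the file system must validate the range with an IRP
    return True


print(can_use_fast_io(IoRequest(True, True, False)))
print(can_use_fast_io(IoRequest(True, False, False)))
```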
These steps are involved in servicing a read or a write with fast I/O:
1. A thread performs a read or write operation.
2. If the file is cached and the I/O is synchronous, the request passes to the fast I/O entry
point of the file system driver stack. If the file isn’t cached, the file system driver sets up the file
for caching so that the next time, fast I/O can be used to satisfy a read or write request.
3. If the file system driver’s fast I/O routine determines that fast I/O is possible, it calls the
One of the benefits of the cache manager, apart from the actual caching performance, is the fact
that it performs intermediate buffering to allow arbitrarily aligned and sized I/O.
10.7.1 Intelligent Read-Ahead
The cache manager uses the principle of spatial locality to perform intelligent read-ahead by
predicting what data the calling process is likely to read next based on the data that it is reading
currently. Because the system cache is based on virtual addresses, which are contiguous for a
particular file, it doesn’t matter whether they’re juxtaposed in physical memory. File read-ahead
for logical block caching is more complex and requires tight cooperation between file system
drivers and the block cache because that cache system is based on the relative positions of the
accessed data on the disk, and, of course, files aren’t necessarily stored contiguously on disk. You
can examine read-ahead activity by using the Cache: Read Aheads/sec performance counter or the
CcReadAheadIos system variable.
Reading the next block of a file that is being accessed sequentially provides an obvious
performance improvement, with the disadvantage that it will cause head seeks. To extend
read-ahead benefits to cases of strided data accesses (both forward and backward through a file),
the cache manager maintains a history of the last two read requests in the private cache map for
the file handle being accessed, a method known as asynchronous read-ahead with history. If a
pattern can be determined from the caller’s apparently random reads, the cache manager
extrapolates it. For example, if the caller reads page 4000 and then page 3000, the cache manager
assumes that the next page the caller will require is page 2000 and prereads it.
Note Although a caller must issue a minimum of three read operations to establish a
predictable sequence, only two are stored in the private cache map.
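The stride extrapolation can be sketched as follows. This is an illustration of the idea under the Note's three-read rule, not the cache manager's actual algorithm: the two stored reads stand in for the history kept in the private cache map, the current read is the third operation, and the function name is hypothetical.

```python
def predict_next_read(history, current_page):
    """Predict the next page from the two stored reads plus the current one.

    history holds the two most recent page numbers (oldest first), mirroring
    the two entries kept in the private cache map. Returns None when the
    three reads do not share a constant, nonzero stride.
    """
    previous, latest = history
    stride = current_page - latest
    if stride == 0 or latest - previous != stride:
        return None  # no predictable pattern
    return current_page + stride


# Backward stride of 1000 pages: reads at 4000 and 3000, then 2000.
print(predict_next_read((4000, 3000), 2000))
# Irregular reads give no prediction.
print(predict_next_read((10, 20), 25))
```

Because the stride is signed, the same comparison covers both forward and backward access patterns.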
To make read-ahead even more efficient, the Win32 CreateFile function provides a flag
indicating forward sequential file access: FILE_FLAG_SEQUENTIAL_SCAN. If this flag is set,
the cache manager doesn’t keep a read history for the caller for prediction but instead performs
sequential read-ahead. However, as the file is read into the cache’s working set, the cache manager
unmaps views of the file that are no longer active and, if they are unmodified, directs the memory
manager to place the pages belonging to the unmapped views at the front of the standby list so that
rarely, you risk losing modified file data in the case of a system failure (a loss especially irritating
to users who know that they asked the application to save the changes) and running out of physical
memory (because it’s being used by an excess of modified pages).
To balance these concerns, once per second the cache manager’s lazy writer function
executes on a system worker thread and queues one-eighth of the dirty pages in the system cache
to be written to disk. If the rate at which dirty pages are being produced is greater than the amount
the lazy writer had determined it should write, the lazy writer writes an additional number of dirty
pages that it calculates are necessary to match that rate. System worker threads from the
systemwide critical worker thread pool actually perform the I/O operations.
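The per-pass quota can be illustrated with a short calculation. The text does not give the exact formula for the additional pages, so the sketch below is one plausible reading: it starts from one-eighth of the dirty pages and, if more pages were dirtied since the last pass than that quota covers, adds the shortfall. The function and parameter names are hypothetical.

```python
def pages_to_write_this_pass(dirty_pages, pages_dirtied_since_last_pass):
    """Estimate how many dirty pages one lazy writer pass queues for writing."""
    base_quota = dirty_pages // 8  # one-eighth of the dirty pages
    # If pages are being dirtied faster than the quota drains them,
    # add enough extra pages to match the production rate (assumed rule).
    shortfall = max(0, pages_dirtied_since_last_pass - base_quota)
    # A pass can never write more pages than are currently dirty.
    return min(dirty_pages, base_quota + shortfall)


print(pages_to_write_this_pass(8000, 500))
print(pages_to_write_this_pass(8000, 1500))
```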
Note The cache manager provides a means for file system drivers to track when and how
much data has been written to a file. After the lazy writer flushes dirty pages to the disk, the cache
manager notifies the file system, instructing it to update its view of the valid data length for the
file. (The cache manager and file systems separately track the valid data length for a file in
memory.)
You can examine the activity of the lazy writer by examining the cache performance counters
or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-9.
EXPERIMENT: Watching the Cache Manager in Action
In this experiment, we’ll use Process Monitor to view the underlying file system activity,
including cache manager read-ahead and write-behind, when Windows Explorer copies a large file
(in this example, a CD-ROM image) from one local directory to another. First, configure Process
Monitor’s filter to include the source and destination file paths, the Explorer.exe and System
processes, and the ReadFile and WriteFile operations. In this example, the c:\source.iso file was
copied to c:\programming\source.iso, so the filter is configured as follows:
You should see a Process Monitor trace like the one shown here after you copy the file:
The first few entries show the initial I/O processing performed by the copy engine and the
This occurs because for the first couple of megabytes of data, the cache manager hadn’t
started performing write-behind, so the memory manager’s mapped page writer began flushing the
modified destination file data (see Chapter 9 for more information on the mapped page writer).
To get a clearer view of the cache manager operations, remove Explorer from the Process
Monitor’s filter so that only the System process operations are visible, as shown next.
With this view, it’s much easier to see the cache manager’s 16-MB write-behind operations
(the maximum write sizes are 1 MB on client versions of Windows and 32 MB on server versions;
this experiment was performed on a server system). The Time Of Day column shows that these
operations occur almost exactly 1 second apart. The stack trace for one of the write-behind
operations, shown here, verifies that a cache manager worker thread is performing write-behind:
As an added experiment, try repeating this process with a remote copy instead (from one
Windows system to another) and by copying files of varying sizes. You’ll notice some different
behaviors by the copy engine and the cache manager, both on the receiving and sending sides.
Disabling Lazy Writing for a File
If you create a temporary file by specifying the flag FILE_ATTRIBUTE_TEMPORARY in a
call to the Windows CreateFile function, the lazy writer won’t write dirty pages to the disk unless
there is a severe shortage of physical memory or the file is explicitly flushed. This characteristic of
the lazy writer improves system performance—the lazy writer doesn’t immediately write data to a
disk that might ultimately be discarded. Applications usually delete temporary files soon after
closing them.
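As a small illustration, the caching hints discussed so far are plain bits in the flags-and-attributes argument that an application passes to CreateFile. The constant values below are the ones defined in the Windows SDK headers; the helper function and its descriptive strings are hypothetical and only restate the behavior described in the text. The sketch does not call CreateFile itself.

```python
# Values as defined in the Windows SDK headers.
FILE_ATTRIBUTE_TEMPORARY = 0x00000100
FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000


def cache_hints(flags_and_attributes):
    """List the cache manager behaviors requested by a CreateFile flag word."""
    hints = []
    if flags_and_attributes & FILE_FLAG_SEQUENTIAL_SCAN:
        hints.append("sequential read-ahead, no read history kept")
    if flags_and_attributes & FILE_ATTRIBUTE_TEMPORARY:
        hints.append("lazy writer defers writing dirty pages")
    return hints


flags = FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_SEQUENTIAL_SCAN
print(hex(flags), cache_hints(flags))
```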
Forcing the Cache to Write Through to Disk
Because some applications can’t tolerate even momentary delays between writing a file and
seeing the updates on disk, the cache manager also supports write-through caching on a per–file
object basis; changes are written to disk as soon as they’re made. To turn on write-through
caching, set the FILE_FLAG_WRITE_THROUGH flag in the call to the CreateFile function.
Alternatively, a thread can explicitly flush an open file, by using the Windows FlushFileBuffers
10.7.3 Write Throttling
The file system and cache manager must determine whether a cached write request will affect
system performance and then schedule any delayed writes. First the file system asks the cache
manager whether a certain number of bytes can be written right now without hurting performance
by using the CcCanIWrite function and blocking that write if necessary. For asynchronous I/O, the
file system sets up a callback with the cache manager for automatically writing the bytes when
writes are again permitted by calling CcDeferWrite. Otherwise, it just blocks and waits on
CcCanIWrite to continue. Once it’s notified of an impending write operation, the cache manager
determines how many dirty pages are in the cache and how much physical memory is available. If
few physical pages are free, the cache manager momentarily blocks the file system thread that’s
requesting to write data to the cache. The cache manager’s lazy writer flushes some of the dirty
pages to disk and then allows the blocked file system thread to continue. This write throttling
prevents system performance from degrading because of a lack of memory when a file system or
network server issues a large write operation.
Note The effects of write throttling are global to the system because the resource it is based
on, available physical memory, is global to the system. This means that if heavy write activity to a
slow device triggers write throttling, writes to other devices will also be throttled.
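The interplay between CcCanIWrite, CcDeferWrite, and the lazy writer can be modeled with a toy class. This is a sketch of the control flow described above, not the kernel's logic: the class, its method names, and the two limits (dirty_limit and min_free_pages) are hypothetical stand-ins for the cache manager's internal state.

```python
from collections import deque


class WriteThrottle:
    """Toy model of the CcCanIWrite / CcDeferWrite decision."""

    def __init__(self, dirty_limit, min_free_pages):
        self.dirty_limit = dirty_limit        # hypothetical dirty-page limit
        self.min_free_pages = min_free_pages  # hypothetical free-memory floor
        self.dirty_pages = 0
        self.deferred = deque()               # writes queued for later

    def can_i_write(self, pages, free_pages):
        """Return True if writing `pages` now would not hurt performance."""
        return (self.dirty_pages + pages <= self.dirty_limit
                and free_pages >= self.min_free_pages)

    def write(self, pages, free_pages, callback=None):
        """Write now if allowed; otherwise defer (async) or report a block."""
        if self.can_i_write(pages, free_pages):
            self.dirty_pages += pages
            return "written"
        if callback is not None:              # asynchronous caller
            self.deferred.append((pages, callback))
            return "deferred"
        return "blocked"                      # synchronous caller must wait

    def lazy_writer_flushed(self, pages, free_pages):
        """Account for flushed pages, then run deferred writes that now fit."""
        self.dirty_pages = max(0, self.dirty_pages - pages)
        while self.deferred and self.can_i_write(self.deferred[0][0], free_pages):
            queued_pages, callback = self.deferred.popleft()
            self.dirty_pages += queued_pages
            callback(queued_pages)
```

A synchronous caller that receives "blocked" would wait and retry, whereas an asynchronous caller supplies a callback that runs once the lazy writer has flushed enough pages.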
The dirty page threshold is the number of pages that the system cache will allow to be dirty
before throttling cached writers. This value is computed at system initialization time and depends
on the product type (client or server). Two other values are also computed—the top dirty page
threshold and the bottom dirty page threshold. Depending on memory consumption and the rate at
which dirty pages are being processed, the lazy writer calls the internal function CcAdjustThrottle,
which, on server systems, performs dynamic adjustment of the current threshold based on the
calculated top and bottom values. This adjustment is made to preserve the read cache in cases of a
heavy write load that will inevitably overrun the cache and become throttled. Table 10-11 lists the
algorithms used to calculate the dirty page thresholds.
MmThrottleBottom: 80 ( 320 Kb)
MmModifiedPageListHead.Total: 10477 ( 41908 Kb)
Write throttles not engaged
This output shows that the number of dirty pages is far from the number that triggers write
throttling (CcDirtyPageThreshold), so the system has not engaged in any write throttling.
10.7.4 System Threads
As mentioned earlier, the cache manager performs lazy write and read-ahead I/O operations
by submitting requests to the common critical system worker thread pool. However, it does limit
the use of these threads to one less than the total number of critical system worker threads for
small and medium memory systems (two less than the total for large memory systems).
Internally, the cache manager organizes its work requests into four lists (though these are
serviced by the same set of executive worker threads):
■ The express queue is used for read-ahead operations.
■ The regular queue is used for lazy write scans (for dirty data to flush), write-behinds, and
lazy closes.
■ The fast teardown queue is used when the memory manager is waiting for the data section
owned by the cache manager to be freed so that the file can be opened with an image section
instead, which causes CcWriteBehind to flush the entire file and tear down the shared cache map.
■ The post tick queue is used for the cache manager to internally register for a notification
after each “tick” of the lazy writer thread—in other words, at the end of each pass.
To keep track of the work items the worker threads need to perform, the cache manager
creates its own internal per-processor look-aside list, a fixed-length list—one for each
processor—of worker queue item structures. (Look-aside lists are discussed in Chapter 9.) The
number of worker queue items depends on system size: 32 for small-memory systems, 64 for
medium-memory systems, 128 for large-memory client systems, and 256 for large-memory server
systems. For cross-processor performance, the cache manager also allocates a global look-aside
list at the same sizes as just described.
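The sizing rules in this section reduce to two small lookups. The sketch below only restates the numbers given in the text; the function names and the "small", "medium", and "large" labels are illustrative, and the way Windows classifies system memory size is not modeled here.

```python
def usable_worker_threads(total_critical_threads, memory_size):
    """Critical worker threads the cache manager allows itself to use."""
    reserved = 2 if memory_size == "large" else 1
    return max(0, total_critical_threads - reserved)


def worker_queue_items(memory_size, is_server):
    """Number of worker queue items in each look-aside list."""
    if memory_size == "small":
        return 32
    if memory_size == "medium":
        return 64
    if memory_size == "large":
        return 256 if is_server else 128
    raise ValueError(f"unknown memory size: {memory_size!r}")


print(usable_worker_threads(5, "medium"), worker_queue_items("large", True))
```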
10.8 Conclusion
a file system’s features. For example, a format that doesn’t allow user permissions to be associated
with files and directories can’t support security. A file system format can also impose limits on the
sizes of files and storage devices that the file system supports. Finally, some file system formats
efficiently implement support for either large or small files or for large or small disks. NTFS and
exFAT are examples of file system formats that offer a different set of features and usage
scenarios.
■ Clusters are the addressable blocks that many file system formats use. Cluster size is
always a multiple of the sector size, as shown in Figure 11-1. File system formats use clusters to
manage disk space more efficiently; a cluster size that is larger than the sector size divides a disk
into more manageable blocks. The potential trade-off of a larger cluster size is wasted disk space,
or internal fragmentation, that results when file sizes aren’t perfect multiples of cluster sizes.
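The trade-off can be made concrete with a short calculation. This is a minimal sketch assuming that every file occupies a whole number of clusters; it ignores any format-specific handling of very small files, and the function name and sample sizes are illustrative.

```python
def cluster_usage(file_size, sector_size, sectors_per_cluster):
    """Return (clusters_allocated, wasted_bytes) for a file of file_size bytes."""
    cluster_size = sector_size * sectors_per_cluster  # always a sector multiple
    clusters = -(-file_size // cluster_size)          # ceiling division
    wasted = clusters * cluster_size - file_size      # internal fragmentation
    return clusters, wasted


# A 5,000-byte file on 512-byte sectors, with 1 and then 8 sectors per cluster.
print(cluster_usage(5000, 512, 1))
print(cluster_usage(5000, 512, 8))
```

The larger cluster size needs far fewer allocation units to track the file but leaves more unused space in the last cluster.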