
The Memory Hierarchy (Part 2)

Locality:

Locality comes in two forms, temporal and spatial: Locality is typically described as having two distinct forms: temporal locality and spatial locality. In a program with good temporal locality, a memory location that is referenced once is likely to be referenced again multiple times in the near future. In a program with good spatial locality, if a memory location is referenced once, then the program is likely to reference a nearby memory location in the near future.

An example of locality being exploited:

At the operating system level, the principle of locality allows the system to use the main memory as a cache of the most recently referenced chunks of the virtual address space. Similarly, the operating system uses main memory to cache the most recently used disk blocks in the disk file system.

CSAPP discusses locality from two angles:

Locality of References to Program Data and Locality of Instruction Fetches

For the former, CSAPP illustrates locality with an example:

/* N is a compile-time constant defined elsewhere. */
int sumvec(int v[N])
{
    int i, sum = 0;

    for (i = 0; i < N; i++)
        sum += v[i];    /* v is read sequentially; sum is reused every iteration */
    return sum;
}

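In this example:

sum is referenced once in every loop iteration, so it has good temporal locality.

The elements of v are read one after another, so v has good spatial locality but poor temporal locality, because each element is accessed exactly once.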

Stride-1 reference patterns are a common and important source of spatial locality in programs. In general, as the stride increases, the spatial locality decreases.
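To make the notion of stride concrete, here is a minimal sketch. The helper names sum_stride and stride_bytes are hypothetical and do not come from CSAPP; the sketch only assumes that a stride-k pattern touches every k-th element of a contiguous array.

```c
#include <assert.h>
#include <stddef.h>

/* Sum every k-th element of v[0..n-1]: a stride-k reference pattern.
 * With k == 1 this is the same sequential scan as sumvec. */
long sum_stride(const int *v, size_t n, size_t k)
{
    assert(k > 0);              /* a stride of 0 would never advance */
    long sum = 0;
    for (size_t i = 0; i < n; i += k)
        sum += v[i];
    return sum;
}

/* Distance in bytes between two consecutive references made by sum_stride. */
size_t stride_bytes(size_t k)
{
    return k * sizeof(int);
}
```

As k grows, consecutive references land further apart in memory, so fewer of them fall near a recently referenced location, which is the sense in which spatial locality decreases.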

For the locality of instruction fetches, consider this example:

int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}

From the point of view of instruction fetches, this function has good temporal and good spatial locality. The explanation:

The instructions in the body of the for loop are executed in sequential memory order, and thus the loop enjoys good spatial locality. Since the loop body is executed multiple times, it also enjoys good temporal locality.
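The quoted explanation concerns instruction fetches only. On the data side the picture differs: C stores two-dimensional arrays in row-major order, so the column-wise scan in sumarraycols references a with stride N, while a row-wise scan references it with stride 1. The sketch below contrasts the two. The dimensions M = 2 and N = 3 and the helper col_order_offsets are assumptions made purely for illustration, and sumarrayrows is a row-wise counterpart that does not appear in these notes.

```c
#include <stddef.h>

enum { M = 2, N = 3 };   /* small dimensions, assumed for this sketch */

/* Row-wise scan: the inner loop walks along a row, so consecutive
 * references are adjacent in memory (stride 1). */
int sumarrayrows(int a[M][N])
{
    int sum = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-wise scan, same loop order as in the text: consecutive
 * references are N elements apart (stride N). */
int sumarraycols(int a[M][N])
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}

/* Record, in visiting order, the linear element offset i*N + j of each
 * reference made by the column-wise scan. */
void col_order_offsets(size_t out[M * N])
{
    size_t t = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            out[t++] = (size_t)i * N + (size_t)j;
}
```

Both functions return the same sum. Only the order in which memory is touched changes: for these dimensions the column-wise scan visits element offsets 0, 3, 1, 4, 2, 5 instead of 0 through 5 in sequence.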

The book also explains, in passing, how instructions differ from data:

An important property of code that distinguishes it from program data is that it is rarely modified at run time. While a program is executing, the CPU reads its instructions from memory. The CPU rarely overwrites or modifies these instructions.

Summary of locality:

Programs that repeatedly reference the same variables enjoy good temporal locality.

For programs with stride-k reference patterns, the smaller the stride the better the spatial locality. Programs with stride-1 reference patterns have good spatial locality. Programs that hop around memory with large strides have poor spatial locality.

Loops have good temporal and spatial locality with respect to instruction fetches. The smaller the loop body and the greater the number of loop iterations, the better the locality.

Diagram of the memory hierarchy: [figure omitted]

The position of solid state disks in the hierarchy is worth noting:

As another example, solid state disks are playing an increasingly important role in the memory hierarchy, bridging the gulf between DRAM and rotating disk.

On cold misses (any access to an empty cache misses): An empty cache is sometimes referred to as a cold cache, and misses of this kind are called compulsory misses or cold misses. Cold misses are important because they are often transient events that might not occur in steady state, after the cache has been warmed up by repeated memory accesses.

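One way to design a cache is to use hashing, so that each block at level k+1 is mapped, based on its address, to one particular location at level k.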
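A minimal sketch of such an address-based mapping follows, assuming the simplest possible hash: block i at level k+1 may only be placed in slot i mod 4 at level k. The cache size of four slots and all names (dm_cache, slot_of, dm_access, count_misses) are hypothetical and chosen for illustration; real hardware caches are considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>

enum { NSLOTS = 4 };            /* level-k cache size in blocks (assumed) */

struct dm_cache {
    int valid[NSLOTS];          /* 0 means the slot is still empty (cold) */
    int block[NSLOTS];          /* level-(k+1) block number held in the slot */
};

/* Placement policy: block i at level k+1 may only live in slot i mod NSLOTS. */
int slot_of(int block)
{
    assert(block >= 0);
    return block % NSLOTS;
}

/* Reference one block. Returns 1 on a hit and 0 on a miss; on a miss the
 * block is fetched and replaces whatever the slot held before. */
int dm_access(struct dm_cache *c, int block)
{
    int s = slot_of(block);
    if (c->valid[s] && c->block[s] == block)
        return 1;
    c->valid[s] = 1;
    c->block[s] = block;
    return 0;
}

/* Count the misses over a reference trace, starting from an empty cache. */
size_t count_misses(const int *trace, size_t n)
{
    struct dm_cache c = {{0}, {0}};
    size_t misses = 0;
    for (size_t t = 0; t < n; t++)
        if (!dm_access(&c, trace[t]))
            misses++;
    return misses;
}
```

With this policy the trace 0, 1, 0, 1 incurs only the two cold misses, while 0, 8, 0, 8 misses on every reference because blocks 0 and 8 compete for slot 0 even though the other three slots stay empty. CSAPP calls misses of the second kind conflict misses.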

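The working set is the reasonably constant set of cache blocks that a program accesses during one phase of its execution (a loop, for example).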

capacity misses: When the size of the working set exceeds the size of the cache, the cache will experience what are known as capacity misses. In other words, the cache is just too small to handle this particular working set.
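The effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not a model of any real cache: the cache is fully associative with room for CAP = 4 blocks, it uses LRU replacement, and the program's working set is the blocks 0..ws-1 referenced cyclically. The function name cyclic_misses and all parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { CAP = 4 };   /* cache capacity in blocks (assumed) */

/* Count misses when blocks 0..ws-1 are referenced cyclically for `rounds`
 * passes through a fully associative cache of CAP blocks with LRU
 * replacement, starting from an empty cache. */
size_t cyclic_misses(int ws, int rounds)
{
    assert(ws > 0 && rounds >= 0);
    int block[CAP];         /* block number stored in each slot */
    long last_used[CAP];    /* time of the most recent reference per slot */
    int used = 0;           /* number of slots filled so far */
    long tick = 0;          /* logical time, advanced on every reference */
    size_t misses = 0;

    for (int r = 0; r < rounds; r++) {
        for (int b = 0; b < ws; b++) {
            int hit = -1;
            for (int s = 0; s < used; s++)
                if (block[s] == b) { hit = s; break; }
            tick++;
            if (hit >= 0) {             /* hit: just refresh the LRU stamp */
                last_used[hit] = tick;
                continue;
            }
            misses++;
            int victim;
            if (used < CAP) {           /* cold miss: take a free slot */
                victim = used++;
            } else {                    /* cache full: evict the LRU block */
                victim = 0;
                for (int s = 1; s < CAP; s++)
                    if (last_used[s] < last_used[victim])
                        victim = s;
            }
            block[victim] = b;
            last_used[victim] = tick;
        }
    }
    return misses;
}
```

While the working set fits (ws of 4 or less), only the initial cold misses occur, however many passes are made. With ws equal to 5, LRU evicts each block just before it is needed again, so every reference misses.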

So who manages the caches at the different levels?

The compiler manages the register file, the highest level of the cache hierarchy. It decides when to issue loads when there are misses, and determines which register to store the data in. The caches at levels L1, L2, and L3 are managed entirely by hardware logic built into the caches. In a system with virtual memory, the DRAM main memory serves as a cache for data blocks stored on disk, and is managed by a combination of operating system software and address translation hardware on the CPU. For a machine with a distributed file system such as AFS, the local disk serves as a cache that is managed by the AFS client process running on the local machine. In most cases, caches operate automatically and do not require any specific or explicit actions from the program.

 

Reposted from: https://www.cnblogs.com/geeklove01/p/9069296.html
