age|1
ENGR 325
HOMEWORK #7 KEY
FALL 2015
1. (5) In this exercise we look at memory locality properties of matrix computation. The following code
is written in C, where elements within the same row are stored contiguously. Assume each word is
a 32-bit integer. (P&H 5.1, §5.1)
for (I=0; I<8; I++)
for (J=0; J<8000; J++) A[I]
[J]=B[I][0]+A[J][I];
a. How many 32-bit integers can be stored in a 16-byte cache block?
b. References to which variables exhibit temporal locality?
c. References to which variables exhibit spatial locality?
Locality is affected by both the reference order and data layout. The same computation can also be
written below in Matlab, which differs from C by storing matrix elements within the same column
contiguously in memory.
for I=1:8
for J=1:8000
A(I,J)=B(I,0)+A(J,I);
en
d end
d. How many 16-byte cache blocks are needed to store all 32-bit matrix elements being
referenced?
e. References to which variables exhibit temporal locality?
f. References to which variables exhibit spatial locality?
SOLUTION:
a. Each 32-bit integer requires 4 bytes to store, therefore FOUR 32-bit integers can be stored in
the 16-byte block.
b. Consider the access sequence:
A[0][0] = B[0][0] + A[0][0]
A[1][0] = B[0][0] + A[0][1]
A[2][0] = B[0][0] + A[0][2]
…
A[7999] = B[0][0] + A[0]
[0] [7999]
A[0][1] = B[1][0] + A[1][0]
…
⇒ Variables I, J, and B[I][0] exhibit temporal locality.
, Page|2
c. Variables A[I][J] and B[I][0] exhibit spatial locality.
d. Consider the access sequence:
A(1,1) = B(1,0 + A(1,1)
)
A(1,2) = B(1,0 + A(2,1)
)
A(1,3) = B(1,0 + A(3,1)
)
…
A(1,8000) = B(1,0 + A(8000,1
) )
A(2,1) = B(2,0 + A(1,2)
)
A(2,2) = B(2,0 + A(2,2)
)
…
A(2,8000) = B(2,0 + B(8000,2
) )
…
A(8,1) = B(8,0 + A(1,8)
)
A(8,2) = B(8,0 + A(2,8)
)
…
A(8,8000) = B(8,0 + A(8000,8
) )
Matrix A begins as an 8-row by 8000-column matrix and is transformed into an 8000-row by
8- column matrix. So for the A matrices:
Elements = 8 x 8000 = 64,000
Total elements in the beginning and ending matrix: 64000 x 2 =
128,000 Matrix elements per cache block: 4, as per 5.1.1.
Total cache blocks: 128,000 ÷ 4 = 32,000
Note, however, that the “upper left” eight by eight elements overlap, so cache is not
needed for those. Therefore we can subtract 8 x 8 ÷ 4 = 16 cache blocks.
For the B matrix:
Elements = 8 x 1 = 8
Matrix Elements per cache block: 4, as
above Total cache blocks: 8 ÷ 4 = 2
Total 16-byte cache blocks = 32000 – 16 + 2 ⇒ 31,986 cache blocks are required.
e. As in 5.1.2, variables I, J, and B(I,0) exhibit temporal locality.
f. Unlike in C, in MATLAB matrix elements in the same column are stored contiguously. Therefore
the variables A(J,I) and B(I,0) exhibit spatial locality.
2. (10) Caches are important to providing a high-performance memory hierarchy to processors. Below
is a list of 32-bit memory address references, given as word addresses: