Exam (worked solutions)

Final Exam CDA4101 Review 2022/2023

Pages: 8
Grade: A+
Uploaded on: 04-08-2022
Written in: 2022/2023
Contains: Questions and answers

5.4 Recall that we have two write policies and two write-allocation policies, and their combinations can be implemented in either the L1 or the L2 cache. Assume the following choices for the L1 and L2 caches:
5.4.1 [5] COD §§5.3, 5.8 Buffers are employed between different levels of the memory hierarchy to reduce access latency. For this given configuration, list the possible buffers needed between the L1 and L2 caches, as well as between the L2 cache and memory.
5.4.2 [20] COD §§5.3, 5.8 Describe the procedure of handling an L1 write-miss, considering the components involved and the possibility of replacing a dirty block.
5.4.3 [20] COD §§5.3, 5.8 For a multilevel exclusive cache configuration (a block can reside in only one of the L1 and L2 caches), describe the procedure of handling an L1 write-miss, considering the components involved and the possibility of replacing a dirty block.
Consider the following program and cache behaviors.
5.4.4 [5] COD §§5.3, 5.8 For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured in bytes per cycle) needed to achieve a CPI of 2?
5.4.5 [5] COD §§5.3, 5.8 For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimum read and write bandwidths needed for a CPI of 2?
5.4.6 [5] COD §§5.3, 5.8 What are the minimum bandwidths needed to achieve the performance of CPI = 1.5? - ANSWER Cache Memory (Computer Architecture "A") - ANSWER Computer Structure - 4.2 Cache Memory - José Luis Abellán Miguel - ANSWER
4) Which block in the cache is replaced by memory block 29?
Cache configuration: 4-way set-associative cache with 8 one-word blocks
Replacement scheme: LRU
Sequence of previously accessed block addresses: 5, 13, 21, 13, 5
(Note: All memory block addresses map to cache set 1) - ANSWER None. An element in set 1 is unused, so Mem[29] is placed in the fourth element of set 1. The cache has two sets (0 and 1) and 4 blocks per set.
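The answer to question 4 can be checked with a small simulator. The sketch below is a hypothetical C model (the names `lru_cache`, `lru_init`, and `lru_access` are illustrative, not from the course material) of a set-associative cache with LRU replacement. It takes block addresses and assumes the set index is the block address modulo the number of sets, which agrees with the note that blocks 5, 13, 21, and 29 all map to set 1 of a two-set cache.

```c
#include <assert.h>

#define MAX_SETS 16
#define MAX_WAYS 8

/* Set-associative cache model with LRU replacement. Inputs are block
   addresses (word address divided by words per block). */
typedef struct {
    int sets, ways;
    long tag[MAX_SETS][MAX_WAYS];
    long last_use[MAX_SETS][MAX_WAYS]; /* time of most recent access; -1 marks an empty way */
    long clock;                        /* increments on every access */
} lru_cache;

/* Reset the cache to empty. */
void lru_init(lru_cache *c, int sets, int ways) {
    assert(sets > 0 && sets <= MAX_SETS && ways > 0 && ways <= MAX_WAYS);
    c->sets = sets;
    c->ways = ways;
    c->clock = 0;
    for (int s = 0; s < sets; s++)
        for (int w = 0; w < ways; w++) {
            c->tag[s][w] = -1;
            c->last_use[s][w] = -1;
        }
}

/* Access one block. Returns the way (0-based) that holds it afterwards and
   sets *hit to 1 on a hit, 0 on a miss. On a miss an empty way is used if one
   exists; only a full set evicts its least recently used block. */
int lru_access(lru_cache *c, long block, int *hit) {
    int set = (int)(block % c->sets); /* index: low part of the block address */
    long tag = block / c->sets;       /* tag: the remaining high part */
    int victim = 0;

    for (int w = 0; w < c->ways; w++) {
        if (c->last_use[set][w] >= 0 && c->tag[set][w] == tag) {
            c->last_use[set][w] = c->clock++; /* refresh recency */
            *hit = 1;
            return w;
        }
    }
    for (int w = 0; w < c->ways; w++) {
        if (c->last_use[set][w] < 0) {        /* empty way: no replacement needed */
            victim = w;
            break;
        }
        if (c->last_use[set][w] < c->last_use[set][victim])
            victim = w;                       /* older access time = less recently used */
    }
    c->tag[set][victim] = tag;
    c->last_use[set][victim] = c->clock++;
    *hit = 0;
    return victim;
}
```

Replaying 5, 13, 21, 13, 5 on a two-set, four-way instance and then accessing 29 returns way 3 (the fourth way) as a miss, with no block evicted. With a single set, the same model covers the fully associative LRU configurations in 5.7.2 and 5.7.3 (pass word address / 2 for two-word blocks); it does not model MRU replacement.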
The fourth block of set 1 is unoccupied, so Mem[29] is placed there; the replacement scheme has not yet been used.
Least recently used (LRU): - ANSWER Least recently used (LRU): A replacement scheme in which the block replaced is the one that has been unused for the longest time.
5.7 This exercise examines the impact of different cache designs, specifically comparing associative caches to the direct-mapped caches from COD Section 5.4 (Measuring and improving cache performance). For these exercises, refer to the address stream shown in Exercise 5.2.
5.7.1 [10] COD §5.4 Using the sequence of references from Exercise 5.2, show the final cache contents for a three-way set-associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference, identify the index bits, the tag bits, the block offset bits, and whether it is a hit or a miss.
5.7.2 [10] COD §5.4 Using the references from Exercise 5.2, show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference, identify the index bits, the tag bits, and whether it is a hit or a miss.
5.7.3 [15] COD §5.4 Using the references from Exercise 5.2, what is the miss rate for a fully associative cache with two-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU (most recently used) replacement? Finally, what is the best possible miss rate for this cache, given any replacement policy?
Multilevel caching is an important technique to overcome the limited amount of space that a first-level cache can provide while still maintaining its speed. Consider a processor with the following parameters:
5.7.4 [10] COD §5.4 Calculate the CPI for the processor in the table using: 1) only a first-level cache, 2) a second-level direct-mapped cache, and 3) a second-level eight-way set-associative cache. How do these numbers change if main memory access time is doubled?
If it is cut in half?
5.7.5 [10] COD §5.4 It is possible to have an even deeper cache hierarchy than two levels. Given the processor above with a second-level, direct-mapped cache, a designer wants to add a third-level cache that takes 50 cycles to access and will reduce the global miss rate to 1.3%. Would this provide better performance? In general, what are the advantages and disadvantages of adding a third-level cache?
5.7.6 [20] COD §5.4 In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external (located on a different chip) from the main processor and the first-level cache. While this allowed for large second-level caches, the latency to access the cache was much higher, and the bandwidth was typically lower because the second-level cache ran at a lower frequency. Assume a 512 KiB off-chip second-level cache has a global miss rate of 4%. If each additional 512 KiB of cache lowered global miss rates by 0.7%, and the cache had a total access time of 50 cycles, how big would the cache have to be to match the performance of the second-level direct-mapped cache listed above? Of the eight-way set-associative cache? - ANSWER (Note: see the solutions.)
6.1 First, write down a list of the daily activities you typically do on a weekday. For instance, you might get out of bed, take a shower, get dressed, eat breakfast, dry your hair, and brush your teeth. Make sure to break down your list so you have a minimum of 10 activities.
6.1.1 [5] COD §6.2 Now consider which of these activities already exploit some form of parallelism (e.g., brushing multiple teeth at the same time versus one at a time, or carrying one book at a time to school versus loading them all into your backpack and then carrying them "in parallel"). For each of your activities, discuss whether it is already performed in parallel, and if not, why not.
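Because the processor table for 5.7.4 is not reproduced here, the sketch below only encodes the usual stall-cycle accounting for a multilevel hierarchy: CPI equals the base CPI plus, for each level, its global misses per instruction times the access time of the level that services them. All function names are hypothetical, and the "0.7%" drop in 5.7.6 is interpreted as 0.7 percentage points per extra 512 KiB; plug in the table's values to get actual answers.

```c
#include <assert.h>

/* Hypothetical helper: CPI of a processor with a multilevel cache hierarchy.
   miss_per_instr[k] is the global miss rate per instruction out of level k + 1
   (k = 0 is L1), and next_cycles[k] is the access time in cycles of whatever
   services those misses (L2, L3, ..., with main memory last). */
double hierarchy_cpi(double base_cpi, const double *miss_per_instr,
                     const double *next_cycles, int levels) {
    double cpi = base_cpi;
    assert(levels > 0);
    for (int k = 0; k < levels; k++)
        cpi += miss_per_instr[k] * next_cycles[k]; /* stall cycles added per instruction */
    return cpi;
}

/* Convert an access time in nanoseconds to processor cycles. */
double ns_to_cycles(double ns, double clock_ghz) {
    return ns * clock_ghz;
}

/* 5.7.6-style search: smallest off-chip L2 (in 512 KiB steps) whose CPI is at
   most target_cpi. Per the exercise text, 512 KiB gives a 4% global miss rate
   and each extra 512 KiB lowers it by 0.7 percentage points (an interpretation).
   Returns -1 if the miss rate would fall below zero first. */
int offchip_l2_kib(double base_cpi, double l1_miss, double l2_cycles,
                   double mem_cycles, double target_cpi) {
    for (int chunks = 1;; chunks++) {
        double global_miss = 0.04 - 0.007 * (chunks - 1);
        double misses[2] = {l1_miss, global_miss};
        double cycles[2] = {l2_cycles, mem_cycles};
        if (global_miss < 0)
            return -1;
        if (hierarchy_cpi(base_cpi, misses, cycles, 2) <= target_cpi)
            return 512 * chunks;
    }
}
```

For 5.7.5, the three-level case is a call with three entries: the L1 miss rate with the L2 access time, the global L2 miss rate with the 50-cycle L3 access time, and the 1.3% global L3 miss rate with the main-memory access time.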
6.1.2 [5] COD §6.2 Next, consider which of the activities could be carried out concurrently (e.g., eating breakfast and listening to the news). For each of your activities, describe which other activity could be paired with it.
6.1.3 [5] COD §6.2 For 6.1.2, what could we change about current systems (e.g., showers, clothes, TVs, cars) so that we could perform more tasks in parallel?
6.1.4 [5] COD §6.2 Estimate how much less time it would take to carry out these activities if you tried to carry out as many tasks in parallel as possible. - ANSWER
6.2 You are trying to bake 3 blueberry pound cakes. Cake ingredients are as follows:
1 cup butter, softened
1 cup sugar
4 large eggs
1 teaspoon vanilla extract
1/2 teaspoon salt
1/4 teaspoon nutmeg
1 1/2 cups flour
1 cup blueberries
The recipe for a single cake is as follows:
Step 1: Preheat oven to 325°F (160°C). Grease and flour your cake pan.
Step 2: In a large bowl, beat together butter and sugar with a mixer at medium speed until light and fluffy. Add eggs, vanilla, salt and nutmeg. Beat until thoroughly blended. Reduce mixer speed to low and add flour, 1/2 cup at a time, beating just until blended.
Step 3: Gently fold in blueberries. Spread evenly in prepared baking pan. Bake for 60 minutes.
6.2.1 [5] COD §6.2 Your job is to bake 3 cakes as efficiently as possible. Assuming that you only have one oven large enough to hold one cake, one large bowl, one cake pan, and one mixer, come up with a schedule to make three cakes as quickly as possible. Identify the bottlenecks in completing this task.
6.2.2 [5] COD §6.2 Assume now that you have three bowls, three cake pans, and three mixers. How much faster is the process now that you have additional resources?
6.2.3 [5] COD §6.2 Assume now that you have two friends that will help you cook, and that you have a large oven that can accommodate all three cakes. How will this change the schedule you arrived at in Exercise 6.2.1 above?
6.2.4 [5] COD §6.2 Compare the cake-making task to computing 3 iterations of a loop on a parallel computer. Identify data-level parallelism and task-level parallelism in the cake-making loop. - ANSWER
6.2.1 For this set of resources, we can pipeline the preparation. We assume that we do not have to reheat the oven for each cake.
Preheat oven.
Mix ingredients in bowl for Cake 1.
Fill cake pan with contents of bowl and bake Cake 1. Mix ingredients for Cake 2 in bowl.
Finish baking Cake 1. Empty cake pan. Fill cake pan with bowl contents for Cake 2 and bake Cake 2. Mix ingredients in bowl for Cake 3.
Finish baking Cake 2. Empty cake pan. Fill cake pan with bowl contents for Cake 3 and bake Cake 3.
Finish baking Cake 3. Empty cake pan.
6.2.2 Now we have 3 bowls, 3 cake pans and 3 mixers. We will name them A, B, and C.
Preheat oven.
Mix ingredients in bowl A for Cake 1.
Fill cake pan A with contents of bowl A and bake Cake 1. Mix ingredients for Cake 2 in bowl A.
Finish baking Cake 1. Empty cake pan A. Fill cake pan A with contents of bowl A for Cake 2. Mix ingredients in bowl A for Cake 3.
Finish baking Cake 2. Empty cake pan A. Fill cake pan A with contents of bowl A for Cake 3.
6.2.3 Each step can be done in parallel for each cake. The time to bake 1 cake, 2 cakes or 3 cakes is exactly the same.
6.2.4 The loop computation is equivalent to the steps involved in making one cake. Given that we have multiple processors (or ovens and cooks), we can execute instructions (or cook multiple cakes) in parallel. The instructions in the loop (or cooking steps) may have some dependencies on prior instructions (or cooking steps) in the loop body (cooking a single cake). Data-level parallelism occurs when loop iterations are independent (i.e., there are no loop-carried dependencies). Task-level parallelism includes any instructions that can be computed on parallel execution units, similar to the independent operations involved in making multiple cakes.
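The schedules in 6.2.1 through 6.2.3 can be compared with a small greedy model of a two-stage pipeline (mix, then bake). The recipe only fixes the 60-minute bake, so the 20-minute mixing time used in the examples is a hypothetical value; treating the large oven in 6.2.3 as three oven slots, ignoring preheating, and tying one pan to each oven slot are further assumptions. The function name `cake_makespan` is illustrative.

```c
#include <assert.h>

#define MAX_RES 8

/* Index of the resource that becomes free first. */
static int earliest(const int *free_at, int count) {
    int best = 0;
    for (int k = 1; k < count; k++)
        if (free_at[k] < free_at[best])
            best = k;
    return best;
}

/* Greedy schedule for n cakes made in order. Each cake is mixed by a cook in a
   bowl (mix_min minutes), then baked in an oven slot (bake_min minutes). A bowl
   stays busy until its batter goes into the oven; a cook is free once mixing
   ends. Returns the minute at which the last cake comes out. */
int cake_makespan(int n, int mix_min, int bake_min, int bowls, int ovens, int cooks) {
    int bowl_free[MAX_RES] = {0}, oven_free[MAX_RES] = {0}, cook_free[MAX_RES] = {0};
    int last = 0;

    assert(bowls >= 1 && bowls <= MAX_RES && ovens >= 1 && ovens <= MAX_RES &&
           cooks >= 1 && cooks <= MAX_RES);
    for (int i = 0; i < n; i++) {
        int b = earliest(bowl_free, bowls);
        int c = earliest(cook_free, cooks);
        int o = earliest(oven_free, ovens);
        int mix_start = bowl_free[b] > cook_free[c] ? bowl_free[b] : cook_free[c];
        int mix_end = mix_start + mix_min;
        int bake_start = mix_end > oven_free[o] ? mix_end : oven_free[o]; /* wait for the oven */

        cook_free[c] = mix_end;
        bowl_free[b] = bake_start;          /* batter leaves the bowl when baking starts */
        oven_free[o] = bake_start + bake_min;
        if (oven_free[o] > last)
            last = oven_free[o];
    }
    return last;
}
```

With one bowl, one oven, and one cook, `cake_makespan(3, 20, 60, 1, 1, 1)` returns 200 minutes, compared with 3 × (20 + 60) = 240 if each cake were finished before the next was started. Three bowls alone still give 200 because the single oven is the bottleneck, while three cooks and three oven slots give 80, the time for a single cake.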

5.1 In this exercise we look at memory locality properties of matrix computation. The
following code is written in C, where elements within the same row are stored
contiguously. Assume each word is a 32-bit integer.

for (I = 0; I < 8; I++)
    for (J = 0; J < 8000; J++)
        A[I][J] = B[I][0] + A[J][I];

5.1.1 [5] <COD §5.1> How many 32-bit integers can be stored in a 16-byte cache
block?
5.1.2 [5] <COD §5.1> References to which variables exhibit temporal locality?
5.1.3 [5] <COD §5.1> References to which variables exhibit spatial locality?

Locality is affected by both the reference order and data layout. The same computation
can also be written below in MATLAB, which differs from C by storing matrix elements
within the same column contiguously in memory.

for I = 1:8
    for J = 1:8000
        A(I,J) = B(I,1) + A(J,I);  % MATLAB indices start at 1, so B(I,1) is C's B[I][0]
    end
end

5.1.4 [5] <COD §5.1> How many 16-byte cache blocks are needed to store all 32-bit
matrix elements being referenced?
5.1.5 [5] <COD §5.1> References to which variables exhibit temporal locality?
5.1.6 [5] <COD §5.1> References to which variables exhibit spatial locality? - ANSWER
5.1.1 [5] <COD §5.1> How many 32-bit integers can be stored in a 16-byte cache
block?

https://drive.google.com/file/d/1LhsMKJsc48EbXZqL7HsSIbF23FRAkfQR/view?usp=sharing
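For 5.1.1 the arithmetic is a single division of the block size by the size of a 32-bit (4-byte) integer:

```latex
\frac{16\ \text{bytes per block}}{4\ \text{bytes per 32-bit integer}} = 4\ \text{integers per block}
```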


5.1.2 [5] <COD §5.1> References to which variables exhibit temporal locality?
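One way to see which references in the loop have temporal or spatial locality is to compute how far apart the addresses touched by successive inner-loop iterations are. The sketch below assumes a row length (C) or column height (MATLAB) of N = 8000 for both A and B, since the exercise does not give B's dimensions, and measures byte offsets from each array's base; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N 8000 /* assumed row length / column height, so every index in the loops is valid */
#define WORD 4 /* bytes per 32-bit integer */

typedef size_t (*addr_fn)(size_t i, size_t j);

/* C (row-major): element [r][c] lies (r * N + c) words from the array base. */
static size_t c_A_write(size_t i, size_t j) { return (i * N + j) * WORD; }     /* A[I][J] */
static size_t c_A_read(size_t i, size_t j) { return (j * N + i) * WORD; }      /* A[J][I] */
static size_t c_B_read(size_t i, size_t j) { (void)j; return (i * N) * WORD; } /* B[I][0] */

/* MATLAB (column-major, shown 0-based): element (r, c) lies (c * N + r) words from the base. */
static size_t m_A_write(size_t i, size_t j) { return (j * N + i) * WORD; } /* A(I,J) */
static size_t m_A_read(size_t i, size_t j) { return (i * N + j) * WORD; }  /* A(J,I) */

/* Byte distance between the addresses one reference touches in inner-loop
   iterations j and j + 1 with the outer index i fixed. 0 means the same word is
   reused (temporal locality); 4 means the next word (spatial locality). */
static long inner_stride(addr_fn f, size_t i, size_t j) {
    assert(f != NULL);
    return (long)f(i, j + 1) - (long)f(i, j);
}
```

In the C version, A[I][J] advances 4 bytes per iteration (spatial locality), B[I][0] stays on the same word for all 8000 iterations of J (temporal locality), and A[J][I] jumps 32,000 bytes each time. In the MATLAB layout, the roles of A(I,J) and A(J,I) swap.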



5.1.4 [5] <COD §5.1> How many 16-byte cache blocks are needed to store all 32-bit
matrix elements being referenced?

https://docs.google.com/document/d/1jx4qQAGnxk_OoQo7sunqA3q52AUvcMSEIy4OLyBIKFE/edit?usp=sharing
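The count asked for in 5.1.4 can be checked by brute force: walk the MATLAB loop, map each referenced element to its 16-byte block, and count distinct blocks. This sketch assumes A is 8000 × 8000 (large enough for both A(J,I) and A(I,J)), that B has 8 rows, and that both arrays start on block boundaries; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ROWS 8000      /* assumed: A is 8000 x 8000 */
#define COLS 8000
#define WORD_BYTES 4   /* 32-bit integers */
#define BLOCK_BYTES 16 /* 16-byte cache blocks, each holding 4 integers */

/* Byte offset of element (r, c), 0-based, in a column-major matrix with `rows` rows. */
static size_t col_major(size_t r, size_t c, size_t rows) {
    return (c * rows + r) * WORD_BYTES;
}

/* Mark the block holding byte offset `off` in bitmap `seen` (nblocks bits).
   Returns 1 the first time a block is touched, 0 afterwards. */
static int touch(unsigned char *seen, size_t nblocks, size_t off) {
    size_t blk = off / BLOCK_BYTES;
    unsigned char mask = (unsigned char)(1u << (blk % 8));
    assert(blk < nblocks);
    if (seen[blk / 8] & mask)
        return 0;
    seen[blk / 8] |= mask;
    return 1;
}

/* Count distinct 16-byte blocks referenced by
   for I = 1:8, for J = 1:8000, A(I,J) = B(I,1) + A(J,I).
   Returns -1 if the bitmap cannot be allocated. */
long count_blocks_matlab_loop(void) {
    size_t a_blocks = (size_t)ROWS * COLS * WORD_BYTES / BLOCK_BYTES;
    size_t b_blocks = 8; /* bitmap capacity; B(1..8,1) only spans blocks 0 and 1 */
    unsigned char *a_seen = calloc(a_blocks / 8 + 1, 1);
    unsigned char b_seen[1] = {0};
    long blocks = 0;

    if (a_seen == NULL)
        return -1;
    for (size_t i = 0; i < 8; i++) {        /* I = 1:8, shifted to 0-based */
        for (size_t j = 0; j < 8000; j++) { /* J = 1:8000, shifted to 0-based */
            blocks += touch(b_seen, b_blocks, col_major(i, 0, 8));    /* B(I,1) */
            blocks += touch(a_seen, a_blocks, col_major(j, i, ROWS)); /* A(J,I) */
            blocks += touch(a_seen, a_blocks, col_major(i, j, ROWS)); /* A(I,J) */
        }
    }
    free(a_seen);
    return blocks;
}
```

Under these assumptions the function returns 31,986: 16,000 blocks for A(J,I) (8 columns of 2,000 blocks), 16,000 for A(I,J) (2 blocks in each of 8,000 columns), minus the 16 blocks both share in columns 1 to 8, plus 2 blocks for B.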

5.1.5 [5] <COD §5.1> References to which variables exhibit temporal locality?

Locality Quiz - Georgia Tech - HPCA: Part 3 - ANSWER
https://www.youtube.com/watch?v=z9LHetPW0Vs&list=PLn4mZps3Wx0_6thXcBr99-Y4n49t_lIb0&index=2&t=0s

5.2 Caches are important to providing a high-performance memory hierarchy to
processors. Below is a list of 32-bit memory address references, given as word
addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
5.2.1 [10] <COD §5.3> For each of these references, identify the binary address, the
tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if
each reference is a hit or a miss, assuming the cache is initially empty.
5.2.2 [10] <COD §5.3> For each of these references, identify the binary address, the
tag, and the index given a direct-mapped cache with two-word blocks and a total size of
8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially
empty. - ANSWER 5.2.1 [10] <COD §5.3> For each of these references, identify the
binary address, the tag, and the index given a direct-mapped cache with 16 one-word
blocks. Also list if each reference is a hit or a miss, assuming the cache is initially
empty.

https://drive.google.com/file/d/1584dgRGktv0oeJysRWcdYaJwcx37RepX/view?usp=sharing

5.2.2 [10] <COD §5.3> For each of these references, identify the binary address, the
tag, and the index given a direct-mapped cache with two-word blocks and a total size of
8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially
empty.

https://drive.google.com/file/d/15Th2yjeNU_zyNCQva_dsk35arcHaR0lJ/view?usp=sharing
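Both answers above are links, so the hit/miss pattern is worth reproducing directly. The sketch below (hypothetical names `dm_result`, `simulate_direct_mapped`, and `print_trace`) splits each word address into block offset, index, and tag for a direct-mapped cache that starts empty, assuming the block count and block size are powers of two.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINES 64

/* Breakdown of one reference in a direct-mapped cache. */
typedef struct {
    unsigned index;
    unsigned tag;
    int hit;
} dm_result;

/* Simulate an initially empty direct-mapped cache with `num_blocks` blocks of
   `block_words` words each over the word addresses addr[0..n-1]. Writes one
   dm_result per reference into out[] and returns the number of hits. */
int simulate_direct_mapped(const unsigned *addr, int n, unsigned num_blocks,
                           unsigned block_words, dm_result *out) {
    int valid[MAX_LINES] = {0};
    unsigned tags[MAX_LINES] = {0};
    int hits = 0;

    assert(num_blocks > 0 && num_blocks <= MAX_LINES && block_words > 0);
    for (int k = 0; k < n; k++) {
        unsigned block = addr[k] / block_words; /* drop the block-offset bits */
        unsigned index = block % num_blocks;    /* next log2(num_blocks) bits */
        unsigned tag = block / num_blocks;      /* everything above the index */
        int hit = valid[index] && tags[index] == tag;

        if (!hit) { /* miss: fetch the block, replacing whatever was there */
            valid[index] = 1;
            tags[index] = tag;
        }
        out[k] = (dm_result){index, tag, hit};
        hits += hit;
    }
    return hits;
}

/* Print each reference as an 8-bit binary address with its tag, index, and
   outcome (8 bits suffice for the addresses in 5.2; real addresses are 32 bits). */
void print_trace(const unsigned *addr, int n, const dm_result *r) {
    for (int k = 0; k < n; k++) {
        printf("%3u  ", addr[k]);
        for (int b = 7; b >= 0; b--)
            putchar((addr[k] >> b) & 1u ? '1' : '0');
        printf("  tag=%-2u index=%-2u %s\n", r[k].tag, r[k].index, r[k].hit ? "hit" : "miss");
    }
}
```

For the 5.2 stream, 16 one-word blocks (5.2.1) give 12 misses and no hits, while 8 two-word blocks (5.2.2) hit on 2, 190, and 181. Calling it with 8 words of data and 1-, 2-, or 4-word blocks models the C1, C2, and C3 designs in 5.2.3.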

5.2 Caches are important to providing a high-performance memory hierarchy to
processors. Below is a list of 32-bit memory address references, given as word
addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253

5.2.3 You are asked to optimize a cache design for the given references. There are
three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has
1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate,

Seller: millyphilip, West Virginia University