Basics & Instruction
• CA: Instruction, modes, data format, CPU design
• Opcode necessary in each instruction (instruction: binary combination)
• No PC in 4-address instructions
• Variable length Instruction: Fixed Length Opcode
• Fixed length Instruction: Variable Length Opcode
• Autoinc: Postinc, AutoDec: Predec
• PC relative & base register modes support relocation without any change in code
• PC relative mode offset: negative for backward jumps
• In the execution phase of a branch instruction, the PC is updated with the branch target address
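The PC-relative points above can be sketched in code; the 8-bit offset width and the sample PC values are assumptions for illustration only:

```python
# Sketch: computing a PC-relative branch target (offset width assumed 8 bits).
def branch_target(pc: int, offset_field: int, width: int = 8) -> int:
    """Sign-extend the raw offset field, then add it to the updated PC."""
    sign_bit = 1 << (width - 1)
    offset = (offset_field & (sign_bit - 1)) - (offset_field & sign_bit)
    return pc + offset

# A negative offset jumps backward, a positive one forward:
print(branch_target(100, 0xF8))  # 0xF8 is -8 as signed 8-bit -> 92
print(branch_target(100, 0x08))  # +8 -> 108
```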
CPU & Control Unit
• Cycle time = 1/ clock rate
• 1 instruction execution time = CPI * cycle time
• Program execution time = n * CPI * cycle time
• MIPS = Number of instructions / (Execution time × 10^6) = Clock rate / (CPI × 10^6)
• 2 CPUs with the same instruction set can have different CPI and clock rates
• In vertical microprogramming, at most one signal can be enabled from each group
• In vertical microprogramming, maximum signals enabled at once = number of groups
• Speed up = Slower technique time / Faster technique time
• Throughput: Number of operations per unit time
• Bandwidth: Data transferred per unit time
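A minimal sketch of the timing formulas above; the instruction count, CPI, and clock rate are made-up example values:

```python
# Sketch of the CPU performance formulas above (all times in seconds).
def cycle_time(clock_rate_hz: float) -> float:
    return 1.0 / clock_rate_hz                  # cycle time = 1 / clock rate

def program_time(n: int, cpi: float, clock_rate_hz: float) -> float:
    return n * cpi * cycle_time(clock_rate_hz)  # n * CPI * cycle time

def mips(n: int, exec_time_s: float) -> float:
    return n / (exec_time_s * 1e6)              # instructions / (time * 10^6)

def speedup(slower: float, faster: float) -> float:
    return slower / faster

# Example: 10^6 instructions, CPI = 2, 100 MHz clock:
t = program_time(1_000_000, 2, 100e6)
print(t)                   # 0.02 s
print(mips(1_000_000, t))  # 50.0, equal to clock rate / (CPI * 10^6)
print(speedup(t, t / 2))   # 2.0
```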
RISC vs CISC
Seq. No. | RISC | CISC
1. | Simple instructions | Complex instructions
2. | Fixed-length instructions | Variable-length instructions
3. | Simple and limited addressing modes | Complex addressing modes
4. | Fewer instructions | More instructions
5. | Easy to implement using a hardwired control unit | Difficult to implement using a hardwired control unit
6. | One cycle per instruction | Multiple cycles per instruction
7. | Register-to-register arithmetic operations only | Register-to-memory & memory-to-register arithmetic operations possible
8. | More registers | Fewer registers
IO Organization
• Efficiency of asynchronous line = Char bits / Total bits sent per char
• Time in programmed IO = time to check status + time to transfer data
• Time in Interrupt IO = interrupt overhead + time to service interrupt
• CPU sends 2 pieces of information to the DMAC before a transfer: starting address & data count
• DMAC can generate address and can send control signals for memory
• CPU waits longer in burst mode than in cycle stealing mode
• No CPU waiting or blocking in interleaving mode of DMA
• % of time CPU blocked (burst mode) = transfer time to memory / (preparation time + transfer time to memory) × 100%
• % of time CPU blocked (cycle stealing) = transfer time / preparation time × 100%
• DMA is faster mode for transferring data between IO & memory
• Max data transferred using DMA without CPU's intervention = 2^x − 1, x = bits in data count register
• At a time only one of DMAC and CPU can use the system buses
• During instruction execution DMA transfer can be done but not the interrupt service
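The IO formulas above can be sketched as follows; the bit counts and times used are illustrative assumptions, not values from the notes:

```python
# Sketch of the asynchronous-line and DMA formulas above.
def async_efficiency(char_bits: int, total_bits_per_char: int) -> float:
    return char_bits / total_bits_per_char

def blocked_pct_burst(transfer: float, preparation: float) -> float:
    return transfer / (preparation + transfer) * 100

def blocked_pct_cycle_stealing(transfer: float, preparation: float) -> float:
    return transfer / preparation * 100

def max_dma_count(x: int) -> int:
    return 2 ** x - 1   # x = bits in the data count register

# 8 data bits framed by 1 start + 1 stop bit -> 10 bits on the line:
print(async_efficiency(8, 10))            # 0.8
print(blocked_pct_burst(20, 80))          # 20.0 (%)
print(blocked_pct_cycle_stealing(1, 100)) # 1.0 (%)
print(max_dma_count(16))                  # 65535
```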
Memory Mapped IO | IO Mapped IO
1. Memory wastage | 1. No memory wastage
2. All memory access instructions can be used for IO access | 2. IO access and memory access instructions are different
3. No separate address space for IO | 3. IO has its own separate address space
4. More instructions for IO access | 4. Fewer instructions for IO access
5. More addressing modes for IO access | 5. Fewer addressing modes for IO access
6. More IO devices can be connected | 6. Fewer IO devices can be connected
Memory Organization:
• Byte addressable memory = 128KB = 128𝐾 × 1 B = 128𝐾 × 8 bits
• Memory access rate = 1 / memory access (cycle) time
• Memory address decoder = a × b, where a = address bits and b = number of cells (b = 2^a)
• Multiplication table for 2 n-bit unsigned numbers = 2^(2n) × 2n bits
• Addition table for 2 n-bit unsigned numbers = 2^(2n) × (n + 1) bits
• In multiple-chip memory, one decoder output selects one entire horizontal arrangement
• If required addresses more => Vertical Arrangement
• If required data more => Horizontal Arrangement
• If required addresses & data more => Hybrid Arrangement
• Default storage unit: bits
• CPU can initiate the memory request only when memory is ready
• Associative memory is faster than SRAM (costlier too)
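The lookup-table sizes above can be checked with a short sketch (n = 4 is just an example operand width):

```python
# Sketch: ROM lookup-table sizes for two n-bit unsigned operands.
def mult_table_bits(n: int) -> int:
    # 2^(2n) operand pairs, each product 2n bits wide
    return 2 ** (2 * n) * (2 * n)

def add_table_bits(n: int) -> int:
    # 2^(2n) operand pairs, each sum (n + 1) bits wide (carry included)
    return 2 ** (2 * n) * (n + 1)

print(mult_table_bits(4))  # 256 * 8 = 2048 bits
print(add_table_bits(4))   # 256 * 5 = 1280 bits
```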
Static RAM | Dynamic RAM
1. Implemented using flip-flops | 1. Implemented using capacitors
2. No refresh required | 2. Periodic refresh required
3. Faster read/write | 3. Slower read/write
4. Used for cache | 4. Used for main memory
5. Low idle power consumption | 5. High idle power consumption
6. High operational power consumption | 6. Low operational power consumption
Cache Organization:
• Cache is implemented based on locality of reference
• Every time there is a read miss, a block is brought from mm to cm
• Every time there is a write miss, a block does not come from mm to cm (no write allocate)
• Simultaneous access 𝑇𝑎𝑣𝑔 = 𝐻 ∗ 𝑡𝑐𝑚 + (1 – 𝐻) ∗ 𝑡𝑚𝑚
• Hierarchical access 𝑇𝑎𝑣𝑔 = 𝑡𝑐𝑚 + (1 – 𝐻) ∗ 𝑡𝑚𝑚
• 𝑇𝑏𝑙𝑜𝑐𝑘= Block si𝑧𝑒 ∗ 𝑇𝑚𝑚
• 𝑡𝑎𝑣𝑔= 𝑡𝑐𝑚 if H =100%
• Write Through:
𝑇𝑎𝑣𝑔 𝑤𝑟𝑖𝑡𝑒 = 𝑇𝑚𝑚
𝑇𝑎𝑣𝑔 = % of read ∗ 𝑇𝑟𝑒𝑎𝑑 + % 𝑜𝑓 𝑤𝑟𝑖𝑡𝑒 ∗ 𝑇𝑎𝑣𝑔 𝑤𝑟𝑖𝑡𝑒
𝐸𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 ℎ𝑖𝑡 𝑟𝑎𝑡𝑒 = 𝑟𝑒𝑎𝑑 ℎ𝑖𝑡 𝑟𝑎𝑡𝑖𝑜 ∗ % 𝑜𝑓 𝑟𝑒𝑎𝑑
• Write Back:
Simultaneous 𝑇𝑎𝑣𝑔 = 𝐻 ∗ 𝑡𝑐𝑚 + (1 – 𝐻) ∗ (𝑡𝑏𝑙𝑜𝑐𝑘 + 𝑥 ∗ 𝑡𝑏𝑙𝑜𝑐𝑘)
Hierarchical 𝑇𝑎𝑣𝑔 = 𝐻 ∗ 𝑡𝑐𝑚 + (1 – 𝐻) ∗ (𝑡𝑐𝑚 + 𝑡𝑏𝑙𝑜𝑐𝑘 + 𝑥 ∗ 𝑡𝑏𝑙𝑜𝑐𝑘)
x = % of dirty blocks
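A sketch of the average-access-time formulas above; the hit ratio, times (in ns), and dirty fraction are made-up example values:

```python
# Sketch of the cache average-access-time formulas above.
def tavg_simultaneous(h: float, t_cm: float, t_mm: float) -> float:
    return h * t_cm + (1 - h) * t_mm

def tavg_hierarchical(h: float, t_cm: float, t_mm: float) -> float:
    return t_cm + (1 - h) * t_mm

def tavg_write_back_simultaneous(h: float, t_cm: float,
                                 t_block: float, dirty_frac: float) -> float:
    # On a miss: fetch the block, plus write back a dirty victim dirty_frac of the time
    return h * t_cm + (1 - h) * (t_block + dirty_frac * t_block)

print(tavg_simultaneous(0.9, 10, 100))                  # 19.0 ns
print(tavg_hierarchical(0.9, 10, 100))                  # 20.0 ns
print(tavg_write_back_simultaneous(0.9, 10, 400, 0.5))  # 9 + 0.1 * 600 = 69.0 ns
```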
• Only 1 data sent to mm for write in write through cache
• In write through cache the block is replaced from cache directly (no write-back needed)
• In write back cache, only the dirty blocks are written back to mm
• CPU always generates the mm address (even when accessing the cache)
• Tag identifies, among all mm blocks that map to one index, which one is present in cache
• Cm block number = (𝑚𝑚 𝑏𝑙𝑜𝑐𝑘 𝑛𝑢𝑚𝑏𝑒𝑟 )% 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑙𝑜𝑐𝑘𝑠 𝑖𝑛 𝑐𝑎𝑐ℎ𝑒
• MM address in direct mapping:
Tag | Cm block number | Byte offset
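The direct-mapping split above can be sketched with toy sizes; the 16-byte block and 64-block cache are assumptions for illustration:

```python
# Sketch: splitting a main-memory address into tag / cm block number / byte offset.
BLOCK_SIZE = 16    # bytes per block (assumed)
CACHE_BLOCKS = 64  # blocks in the cache (assumed)

def split_address(addr: int):
    offset = addr % BLOCK_SIZE
    mm_block = addr // BLOCK_SIZE
    index = mm_block % CACHE_BLOCKS  # cm block number = mm block number mod cache blocks
    tag = mm_block // CACHE_BLOCKS   # identifies which mm block occupies that cache line
    return tag, index, offset

print(split_address(0x1234))  # (4, 35, 4)
```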