Problem 1: Excited Lions Pummel Ravens, Upset Patriots

Write valid MIPS64 code fragments that satisfy the following requirements, for 4 points each. If no such fragment can be constructed, explain why. Assume that the MIPS64 code runs on a Big Endian machine with byte-addressable memory.

1.1 Write a valid MIPS64 code fragment that will initialize register F27 with the 8 bytes located in memory at a properly aligned address that is 8200 bytes before the value contained in register R27.

1.2 Write a valid MIPS64 code fragment with the post condition that the memory address in register R13 contains the double precision floating point quotient when the value in register F12 is divided by the value in register F22.

1.3 Write a valid MIPS64 code fragment with an operand that will be zero-extended prior to its use in computations. At most half credit if your answer is an ALU instruction with an immediate operand that is positive or zero.

1.4 Write a MIPS64 code fragment that would be used to return control to a program after a page fault has been cleared.

1.5 Using only floating point ALU instructions, write a MIPS64 code fragment that puts a floating point zero (0) into register F1. At most half credit if you use anything except floating point ALU instructions.

1.6 Write a MIPS64 code fragment that transfers a 4-byte word from memory address 0xFFFF FFFF FFFF FFF8 to register R19.
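When deciding whether an address like the one in 1.6 can be reached with a single load, it helps to see how a 16-bit displacement is sign-extended to 64 bits before it is added to the base register. The following is a minimal sketch in Python, not MIPS64; the helper name `sign_extend_16` is hypothetical, and the base register is assumed to hold 0.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def sign_extend_16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a 64-bit two's-complement bit pattern."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:        # top bit set, so the value is negative
        imm16 -= 0x10000
    return imm16 & MASK64

# Assumption: the base register holds 0 and the displacement is -8.
base = 0
address = (base + sign_extend_16(-8)) & MASK64
print(hex(address))
```

The same check shows which displacements fit in the 16-bit field: anything from -32768 to 32767 does.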

1.7 Write a valid MIPS64 code fragment with the post condition that F25 contains the double precision floating point representation of 2.5. Assume that you know that register F2 contains a bit pattern equivalent to some normalized double precision floating point number. At most half credit if any instructions other than FP ALU instructions are used.
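For reference, the double precision bit pattern of 2.5 can be inspected with a short Python sketch. This only illustrates the IEEE 754 encoding that F25 should end up holding; it is not a MIPS64 answer, and the helper name `double_bits` is hypothetical.

```python
import struct

def double_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x, read as a Big Endian integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

bits = double_bits(2.5)
sign = bits >> 63                      # 1 sign bit
exponent = (bits >> 52) & 0x7FF        # 11 biased exponent bits
fraction = bits & ((1 << 52) - 1)      # 52 fraction bits
print(f"{bits:016x}", sign, exponent - 1023, hex(fraction))
```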

1.8 Write a MIPS64 code fragment that will initialize F7 with the value in register F17. Include any assumptions required by your solution. At most 1 point if you assume that any register or memory location contains a specific numeric value or number.

1.9 Write a valid MIPS64 code fragment that will swap the 8-byte double precision floating point number stored at an address that is 256 bytes past the contents of register R55 with the 8-byte double precision floating point number at address 9192. Include any assumptions required by your answer. Extra points will be given if your code fragment uses only integer registers.

1.10 Write a code fragment consisting of a single valid MIPS64 instruction with the post condition that the two single precision floating point numbers packed into register F22 will be at addresses 4096 and 4050.

1.11 Write a valid MIPS64 code fragment with the post condition that register F18 contains the product of each single precision floating point number packed in register F1 and the single precision floating point number in the corresponding position in register F8.

1.12 Write a valid MIPS64 code fragment with address 8088 that transfers control to an instruction with label jaguars when the value in register F4 is less than or equal to the value in register F2. Otherwise, it transfers control to the instruction at address 8096, which jumps to the instruction with label panthers. Please include any assumptions required to make your answer correct.

1.13 Write a valid MIPS64 code fragment that will switch the positions of the two single precision floating point numbers packed in register F16.

Problem 2: Normally, Eagles Emerge–Raiders Grumble

This problem examines both CPU and memory performance.

Consider the following 5-stage sequential computer that supports the MIPS64 integer instructions. The lengths of the stages are given by:

IF 32ns

ID 8ns

EX 16ns

MEM 32ns

WB 4ns

2.1 How long will it take to execute a single integer ALU instruction on this system? Explain your answer for full credit. Your answer should be in nanoseconds. Feel free to draw a tidy diagram to help the grader.

2.2 Suppose that this machine is transformed into a 5-stage pipeline that supports the same MIPS64 ISA. If the overhead required to support pipelining is 1ns, explain why the clock cycle time of the 5-stage integer pipeline is 33ns.

2.3 Using the information from the previous questions in this problem, what is the expected speedup when the sequential machine is replaced by the five-stage pipeline with the 33ns clock cycle time? Justify your answer for full credit.
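The timing relationships behind 2.1 through 2.3 can be sketched in a few lines of Python. This is one possible reading, not a graded answer: it assumes every instruction on the sequential machine passes through all five stages, and that the pipeline runs at an ideal CPI of 1 with no stalls. The variable names are illustrative.

```python
# Stage lengths in nanoseconds, taken from the problem statement.
stage_ns = {"IF": 32, "ID": 8, "EX": 16, "MEM": 32, "WB": 4}
overhead_ns = 1  # pipelining overhead given in 2.2

# Assumption: a sequential instruction uses every stage once.
sequential_ns = sum(stage_ns.values())

# The pipeline clock must accommodate the slowest stage plus overhead.
pipelined_clock_ns = max(stage_ns.values()) + overhead_ns

# Assumption: ideal steady state, one instruction completes per clock.
speedup = sequential_ns / pipelined_clock_ns
print(sequential_ns, pipelined_clock_ns, round(speedup, 2))
```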

2.4–2.5 Your local hero, “Soup-Or-Pie P-Liner,” laughs when you boast about your 33ns clock cycle time for the pipelined machine. That old stage splitter says, “I can use what I know about memory access to cut the clock cycle time to 17ns (16ns plus pipelining overhead).”

What does Supe...err...PiPe Lyin’R remember about memory interactions? There are two independent steps: getting an address, and locating the correct cache block. That solves the problem. So, both the IF and MEM stages can be separated into 2 independent stages of 16ns each.
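A small Python sketch can confirm that splitting IF and MEM evenly yields the 17ns clock quoted above. The helper `split_stage` and the stage names it generates (`IF1`, `MEM1`, and so on) are hypothetical; the sketch assumes each split is perfectly even and that the 1ns overhead is unchanged.

```python
def split_stage(stages, name, parts=2):
    """Return a new stage list with `name` divided into `parts` equal stages."""
    out = []
    for stage, ns in stages:
        if stage == name:
            out.extend((f"{stage}{i + 1}", ns / parts) for i in range(parts))
        else:
            out.append((stage, ns))
    return out

stages = [("IF", 32), ("ID", 8), ("EX", 16), ("MEM", 32), ("WB", 4)]
for name in ("IF", "MEM"):
    stages = split_stage(stages, name)

# Assumption: pipelining overhead stays at 1ns after the split.
clock_ns = max(ns for _, ns in stages) + 1
print(len(stages), clock_ns)
```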

Now, for the questions.

2.4 Write an expression for the speedup that Sue Perp Eyeliner would achieve if, indeed, the new clock cycle time is 17ns, including overhead. Label your work so that the grader can detect that you actually understand your answer.

2.5 What is the name of the technique used to decrease the clock cycle time? Indicate any assumptions which must be made to achieve correctness in your answer.

2.6–2.8 The following questions will use the diagram of this system shown below.

F1	First half of Instruction Fetch
F2	Second half of Instruction Fetch
ID	Decode instruction and fetch register values
EX	Execute; branch condition tested, branch target computed
MEM1/ALUWB	First half of memory cycle plus WB of ALU instructions
MEM2	Memory access completes
LWB	Load write back

2.6 Identify potential structural hazards, if any, in the new machine, and indicate how to fix them.

2.7 What is the branch penalty for this machine? Give an example to illustrate your answer.

2.8 If static branch prediction were used for this architecture, which branch direction would be more appropriate: predict taken or predict not taken? Why?

2.9 A RAW hazard results when a pipeline stage using or consuming a value must be prevented from getting a stale value before the correct value is available from the pipeline stage that produces it. The consuming instruction is prevented from progressing further through the pipeline, and forwarding is used to minimize the latency.

Your job is to determine what forwarding paths are needed for the 7-stage variant above by identifying potential situations where stalls must occur to ensure that the code executes correctly. The following procedure is sufficient to do exactly that:

Identify/List all possible combinations of operand-producer/operand-consumer stages.

For each producer/consumer stage pair that you identify, determine the number of stalls, if any, required between the start of the instruction producing the result and the time at which the result is available to the consuming instruction.

For each producer/consumer pair above, give an instruction sequence illustrating the stalls, and show the pipeline diagram executing that instruction sequence.

Problem 3: Dolphins Eject Rams

In this problem, you must answer questions regarding the performance of MIPS code executed on the MMM-MIPS architecture, a variant of the floating point pipe-like pipe (FPLP), which satisfies the following additional assumptions. Unless otherwise stated, each clearly labeled correct answer is worth 4 points.

MMM-MIPS Specifications

You must assume that the MMM-MIPS implementation includes the following features when answering the questions in this section. And, please be sure to write down any additional assumptions you make in the exam book.

Unit	# of Clock Cycles	Stages
INT ALU	1	EX
FP ADD	6	A₁,A₂,A₃,...,A₆
FP MULT	13	M₁,M₂,M₃,...,M₁₃
FP DIV	58	One Big Stage

Table FU: X-Stage Functional Units

Separate instruction and data memories are used.
Register reads and writes are split across a single clock cycle.
Forwarding, bypassing, short circuiting, and load interlocks are implemented.
In the EX stage, branches are resolved, and the correct next PC is computed.
X-Stage functional units satisfy Table FU (above).
X-Stage functional units are fully pipelined wherever possible. Each functional unit pipeline stage takes one clock cycle.
Memory is Big Endian, byte-addressable, with k-byte elements aligned on k-byte boundaries.
Contention for the MEM stage is resolved by allowing the instruction that started earlier to access the stage first.
Caching is perfect, meaning that delays associated with memory access are ignored.

3.1–3.2 (8) Fill in the following table with the latencies and initiation intervals of each functional unit in the MMM-MIPS. By latency, we mean the number of independent instructions that must be started to prevent a stall from occurring between dependent instructions. Each correct value is worth one point.

Unit	Latency	Initiation Interval
INT ALU
FP ADD/SUB
FP/INT MULT
FP/INT DIV
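Under one common textbook convention, the two columns follow mechanically from Table FU. The Python sketch below assumes that convention: latency is the number of X-stage cycles minus one (results are forwarded from the end of the last X-stage cycle), and the initiation interval is 1 for a fully pipelined unit and the full cycle count otherwise. Treating FP DIV as the only non-pipelined unit is an assumption drawn from its "One Big Stage" entry, and the helper name `timing` is hypothetical.

```python
# (unit, X-stage cycles, fully pipelined?) taken from Table FU.
units = [
    ("INT ALU", 1, True),
    ("FP ADD", 6, True),
    ("FP MULT", 13, True),
    ("FP DIV", 58, False),   # assumption: "One Big Stage" means not pipelined
]

def timing(ex_cycles: int, pipelined: bool) -> tuple[int, int]:
    """Return (latency, initiation interval) under the assumed convention."""
    latency = ex_cycles - 1
    initiation_interval = 1 if pipelined else ex_cycles
    return latency, initiation_interval

table = {name: timing(cycles, pipelined) for name, cycles, pipelined in units}
for name, (latency, interval) in table.items():
    print(f"{name}\t{latency}\t{interval}")
```

A different definition of latency (for example, one that counts the producing instruction's own cycle) would shift every latency value by one.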

3.3 How many instructions can be executing simultaneously in the MMM-MIPS? Please justify your answer for full credit.

Answer the remaining questions regarding the MMM-MIPS machine. Remember to refer to the MMM-MIPS machine on the previous page when you answer these questions.

The next few problems refer to the code fragment below that will be run on an MMM-MIPS machine.

(1)	L.D	F7, 2400(R0)
(2)	ADD.D	F6, F9, F7
(3)	DADDI	R1, R0, #56
(4)	DIV.D	F24, F6, F7
(5)	MUL.D	F28, F6, F7
(6)	MUL.D	F9, F6, F24
(7)	MUL.D	F24, F6, F7
(8)	S.D	4(R1),F28
(9)	L.D	F28, 64(R1)
(10)	ADD.D	F6, F28, F9

3.4 Compute the average CPI when executing the code fragment on a sequential version of the MMM-MIPS pipeline.

Hint: Sequential execution is equivalent to permitting exactly one instruction to be in the pipeline at any given time. So, if you think that you need a table to track execution, then you are, indeed, confused.
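The hint suggests a simple tally rather than a pipeline table. The Python sketch below shows one way to set that tally up, assuming each instruction spends one cycle in each of IF, ID, MEM, and WB plus the X-stage cycle count from Table FU, and that loads and stores use the integer ALU for their address calculation. Other accounting conventions would give a different total.

```python
# X-stage cycles per opcode, from Table FU.
# Assumption: L.D and S.D compute their address in the 1-cycle INT ALU.
EX_CYCLES = {"L.D": 1, "S.D": 1, "DADDI": 1,
             "ADD.D": 6, "MUL.D": 13, "DIV.D": 58}

# Opcodes of instructions (1) through (10) in the code fragment.
FRAGMENT = ["L.D", "ADD.D", "DADDI", "DIV.D", "MUL.D",
            "MUL.D", "MUL.D", "S.D", "L.D", "ADD.D"]

OTHER_STAGES = 4  # assumption: one cycle each for IF, ID, MEM, WB

total_cycles = sum(OTHER_STAGES + EX_CYCLES[op] for op in FRAGMENT)
cpi = total_cycles / len(FRAGMENT)
print(total_cycles, cpi)
```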

3.5 Compute the average CPI when executing the same code fragment on the MMM-MIPS pipeline.

Note: You are welcome to use the table on the next page to organize your work. However, no such table is required.

Note: The code table on the next page has been provided for your convenience should you need it to organize your work. The code fragment is there too. However, you are not required to use it to receive full credit. But, if the table is not neat, don’t expect us to spend a lot of time deciphering it to give you partial credit. And this is only 4 points; so you may want to move on.

Instruction	Fetch	Decode	Ex	Mem	WB	Comment
(1) L.D F7, 2400(R0)
(2) ADD.D F6, F9, F7
(3) DADDI R1, R0, #56
(4) DIV.D F24, F6, F7
(5) MUL.D F28, F6, F7
(6) MUL.D F9, F6, F24
(7) MUL.D F24, F6, F7
(8) S.D 4(R1),F28
(9) L.D F28, 64(R1)
(10) ADD.D F6, F28, F9

3.6 What is the ideal CPI for the MMM-MIPS machine? Explain your answer for full credit.
3.7 What is meant by the term effective address when referring to instruction (9) above: L.D F28, 64(R1)? Explain; don’t just give me the effective address.
3.8 What assumptions, if any, are required to support the MMM-MIPS machine memory access in instruction (9) above: L.D F28, 64(R1)? Explain your answer for full credit.
3.9 Is there a RAW hazard in the code fragment when it is executed on the MMM-MIPS? If so, indicate the instruction(s) and register(s) involved in at most one RAW hazard. If none exists, write none and explain why not.
3.10 Is there a WAW hazard in the code fragment when it is executed on the MMM-MIPS? If so, indicate exactly one instance, including the instruction(s) and register(s) involved in the hazard. If none exists, write none and explain why not.
3.11 Is there a WAR hazard in the code fragment when it is executed on the MMM-MIPS? If so, indicate exactly one instance, including the instruction(s) and register(s) involved in the hazard. If none exists, write none and explain why not.
3.12 Is there a control hazard in the code fragment? If so, indicate the exact register and instruction involved. If not, write a MIPS code fragment which contains a control hazard, again identifying register(s) and instruction(s).
3.13 True or False: In the MMM-MIPS architecture, the forwarding of values directly from the value’s source instruction to the correct stage of the value’s destination instruction will prevent data hazards from occurring. Explain your response, as usual.
3.14 How many unique instruction encodings are supported in the MMM-MIPS ISA? Explain your answer for full credit.
3.15 What is the branch penalty for the MMM-MIPS architecture? Explain your answer.
3.16 True or False: The target instruction of a branch instruction can be any instruction in the program. Explain your answer, assuming that the infinite loop resulting from a self-loop is explicitly excluded from consideration.
3.17 Other than a power outage or a page fault, is there an exception or a potential exception present in the code? Explain your answer for full credit.
3.18 Give one well-explained reason why pipelining makes precise exception handling difficult.

Bibliography

List your sources and number them sequentially so that you can use the number of the resource in the problem. If you used only resources provided with the course, please use this space to inform us.