Title: Processor Microarchitecture: An Implementation Perspective
Publisher: Morgan & Claypool Publishers
ISBN-13: 9781608454525
Author: Antonio González, Fernando Latorre, and Grigorios Magklis
Language: English
File Size: 2.2 MB
Total Pages: 116
Table of Contents
Processor Microarchitecture: An Implementation Perspective
Synthesis Lectures on Computer Architecture
ABSTRACT
Keywords
Contents
chapter 1: Introduction
	1.1 CLASSIFICATION OF MICROARCHITECTURES
		1.1.1 Pipelined/Nonpipelined Processors
		1.1.2 In-Order/Out-of-Order Processors
		1.1.3 Scalar/Superscalar Processors
		1.1.4 Vector Processors
		1.1.5 Multicore Processors
		1.1.6 Multithreaded Processors
	1.2 CLASSIFICATION OF MARKET SEGMENTS
	1.3 OVERVIEW OF A PROCESSOR
		1.3.1 Overview of the Pipeline
chapter 2: Caches
	2.1 ADDRESS TRANSLATION
	2.2 CACHE STRUCTURE ORGANIZATION
		2.2.1 Parallel Tag and Data Array Access
		2.2.2 Serial Tag and Data Array Access
		2.2.3 Associativity Considerations
	2.3 LOCKUP-FREE CACHES
		2.3.1 Implicitly Addressed MSHRs
		2.3.2 Explicitly Addressed MSHRs
		2.3.3 In-Cache MSHRs
	2.4 MULTIPORTED CACHES
		2.4.1 True Multiported Cache Design
		2.4.2 Array Replication
		2.4.3 Virtual Multiporting
		2.4.4 Multibanking
	2.5 INSTRUCTION CACHES
		2.5.1 Multiported vs. Single Ported
		2.5.2 Lockup Free vs. Blocking
		2.5.3 Other Considerations
chapter 3: The Instruction Fetch Unit
	3.1 INSTRUCTION CACHE
		3.1.1 Trace Cache
	3.2 BRANCH TARGET BUFFER
	3.3 RETURN ADDRESS STACK
	3.4 CONDITIONAL BRANCH PREDICTION
		3.4.1 Static Prediction
		3.4.2 Dynamic Prediction
chapter 4: Decode
	4.1 RISC DECODING
	4.2 THE x86 ISA
	4.3 DYNAMIC TRANSLATION
	4.4 HIGH-PERFORMANCE x86 DECODING
		4.4.1 The Instruction Length Decoder
		4.4.2 The Dynamic Translation Unit
chapter 5: Allocation
	5.1 RENAMING THROUGH THE REORDER BUFFER
	5.2 RENAMING THROUGH A RENAME BUFFER
	5.3 MERGED REGISTER FILE
	5.4 REGISTER FILE READ
	5.5 RECOVERY IN CASE OF MISSPECULATION
	5.6 COMPARISON OF THE THREE SCHEMES
chapter 6: The Issue Stage
	6.1 INTRODUCTION
	6.2 IN-ORDER ISSUE LOGIC
	6.3 OUT-OF-ORDER ISSUE LOGIC
		6.3.1 Issue Process when Source Operands Are Read before Issue
			6.3.1.1 Issue Queue Allocation.
			6.3.1.2 Instruction Wakeup.
			6.3.1.3 Instruction Selection.
			6.3.1.4 Entry Reclamation.
		6.3.2 Issue Process when Source Operands Are Read after Issue
			6.3.2.1 Read Port Reduction.
		6.3.3 Other Implementations for Out-of-Order Issue
			6.3.3.1 Distributed Issue Queue.
			6.3.3.2 Reservation Stations.
	6.4 ISSUE LOGIC FOR MEMORY OPERATIONS
		6.4.1 Nonspeculative Memory Disambiguation
			6.4.1.1 Case Study 1: Load Ordering and Store Ordering on an AMD K6 Processor.
			6.4.1.2 Case Study 2: Partial Ordering on a MIPS R10000 Processor.
		6.4.2 Speculative Memory Disambiguation
			6.4.2.1 Case Study: Alpha 21264.
	6.5 SPECULATIVE WAKEUP OF LOAD CONSUMERS
chapter 7: Execute
	7.1 FUNCTIONAL UNITS
		7.1.1 The Integer Arithmetic and Logical Unit
		7.1.2 Integer Multiplication and Division
		7.1.3 The Address Generation Unit
		7.1.4 The Branch Unit
		7.1.5 The Floating-Point Unit
		7.1.6 The SIMD Unit
	7.2 RESULT BYPASSING
		7.2.1 Bypass in a Small Out-of-Order Machine
		7.2.2 Multilevel Bypass for Wide Out-of-Order Machines
		7.2.3 Bypass for In-Order Machines
		7.2.4 Organization of Functional Units
	7.3 CLUSTERING
		7.3.1 Clustering the Bypass Network
		7.3.2 Clustering with Replicated Register Files
		7.3.3 Clustering with Distributed Issue Queue and Register Files
chapter 8: The Commit Stage
	8.1 INTRODUCTION
	8.2 ARCHITECTURAL STATE MANAGEMENT
		8.2.1 Architectural State Based on a Retire Register File
		8.2.2 Architectural State Based on a Merged Register File
	8.3 RECOVERY OF THE SPECULATIVE STATE
		8.3.1 Recovery from a Branch Misprediction
			8.3.1.1 Handling Branch Mispredictions on an ROB-Based Architecture with RRF.
			8.3.1.2 Handling Branch Mispredictions on a Merged Register File.
		8.3.2 Recovery from an Exception
References
Author Biographies
                        
Document Text Contents
Page 1

Processor Microarchitecture
An Implementation Perspective

Page 2

Synthesis Lectures on Computer
Architecture

Editor
Mark D. Hill, University of Wisconsin
Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics
pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. The scope will
largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA,
MICRO, and ASPLOS.

Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2011

Transactional Memory, 2nd edition
Tim Harris, James Larus, and Ravi Rajwar
2010

Computer Architecture Performance Evaluation Models
Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

Page 58

The resource table keeps track of the availability of execution resources such as functional units.
Some functional units, like dividers, cannot accept a new operation request every cycle. In that
case, the processor cannot schedule an instruction that uses the divider if it scheduled another
instruction that used it one cycle before. The issue logic therefore consults this table to check
whether a given execution resource is available in the current cycle.
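
As an illustration only, a resource table can be modeled in software as a map from each functional unit to the first cycle in which it can accept a new operation. The sketch below is not the hardware described in the text; the class name, the unit names, and the 4-cycle divider interval are assumptions chosen for the example.

```python
class ResourceTable:
    """Tracks the first cycle in which each functional unit can accept work."""

    def __init__(self, initiation_intervals):
        # initiation_intervals: unit name -> cycles between accepted operations.
        # 1 models a fully pipelined unit; larger values model units such as
        # a divider that cannot accept a new request every cycle.
        self.interval = dict(initiation_intervals)
        self.next_free = {unit: 0 for unit in self.interval}

    def is_available(self, unit, cycle):
        # The issue logic would perform this check before issuing.
        return cycle >= self.next_free[unit]

    def reserve(self, unit, cycle):
        # Mark the unit as busy until its initiation interval has elapsed.
        if not self.is_available(unit, cycle):
            raise ValueError(f"{unit} is busy in cycle {cycle}")
        self.next_free[unit] = cycle + self.interval[unit]


# Hypothetical configuration: a pipelined ALU and a divider that accepts
# one operation every 4 cycles.
table = ResourceTable({"alu": 1, "div": 4})
table.reserve("div", 0)
print(table.is_available("div", 1))  # False: the divider is still busy
print(table.is_available("alu", 1))  # True
print(table.is_available("div", 4))  # True
```

A real implementation would keep this state in a few bits per resource and check it in the same cycle as the dependence check; the dictionary here only mirrors the bookkeeping.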

Very long instruction word (VLIW) processors implement a simplified in-order issue logic.
These processors do not implement any kind of scoreboarding, since it is the responsibility of the
software that generates the code to schedule every instruction far enough from its producer to have
its inputs available when it is issued for execution. This software is usually a static compiler or a
codesigned virtual machine, as in the Transmeta Efficeon [47].
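
To make the software's responsibility concrete, the following sketch assigns issue cycles so that no consumer is issued before its producer's result is available. It is a deliberately simplified model, not the algorithm of any particular compiler: it assumes one instruction per cycle in program order (a real VLIW packs several operations per long instruction), and the function name, tuple layout, and latencies are hypothetical.

```python
def schedule(instrs):
    """Assign an issue cycle to each instruction, in program order.

    instrs: list of (dest, sources, latency) tuples.
    Returns the list of issue cycles. Gaps between consecutive cycles are
    slots the software must fill with independent work or no-ops.
    """
    ready_at = {}    # register name -> first cycle its value can be consumed
    cycles = []
    next_cycle = 0
    for dest, sources, latency in instrs:
        # Wait for program order and for every source operand.
        earliest = max([next_cycle] + [ready_at.get(s, 0) for s in sources])
        cycles.append(earliest)
        ready_at[dest] = earliest + latency
        next_cycle = earliest + 1    # assumption: one instruction per cycle
    return cycles


program = [
    ("r1", [], 3),        # e.g., a load with an assumed 3-cycle latency
    ("r2", ["r1"], 1),    # consumer of r1
    ("r3", [], 1),        # independent instruction
]
print(schedule(program))  # [0, 3, 4]
```

Because the hardware performs no dependence check, a schedule that violates these distances would simply read stale values, which is why the latencies must be known to the code generator.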

6.3 OUT-OF-ORDER ISSUE LOGIC
The issue logic is a key component that determines the amount of instruction-level parallelism that
out-of-order processors are able to exploit. It allows out-of-order execution by issuing instructions
to the functional units as soon as their source operands become available. However, the hardware
components involved in the issue process sit in the critical path of the processor pipeline [1].
Therefore, it is very important to implement a complexity-effective issue logic that can exploit
instruction-level parallelism without compromising the cycle time.

There are many alternatives for each of the design decisions involved in implementing the issue
logic. The goal of this chapter is not to describe every possible implementation but to present
the most common ones, in order to give an idea of the characteristics of the hardware.

In this chapter, we cover two main scenarios assuming a unified issue queue. Processors that
use a unified issue queue implement a single queue where all renamed instructions are stored,
waiting to be executed. This is different from other schemes, like reservation stations or distributed
issue queues, where instructions are allocated in separate buffers depending on the type of resources
they need for their execution.

The first scenario represents an implementation of the issue logic for processors where
instructions read their source operands before entering the issue queue, as in P6-like architectures.
Then, as a second scenario, we describe the main changes required to implement the issue logic
where source operands are read after instructions are issued for execution, as in the MIPS R10000 or
Alpha 21264. These two scenarios are suitable for any of the existing schemes for holding the values
produced by the instructions (merged register file, rename buffers, reorder buffer, etc.).

Nevertheless, since this is an orthogonal design decision, for the sake of clarity we will assume
a merged register file for both implementations. Note that by merged register file we mean a register
file that stores both the architectural state and the speculative values, as described in detail in
Chapter 5. However, the described hardware can easily be adapted to any other register file scheme.

This chapter also covers other alternatives, like distributed issue queues and reservation
stations. These alternatives will be explained in less detail, since most of the tradeoffs that need
to be considered in the implementation have already been covered by the aforementioned scenarios.

Finally, we pay special attention to the implementation of the issue logic for memory
operations. Unlike other operations, whose data dependences are checked at the renaming
stage, memory dependences cannot be identified until the memory operations compute their
addresses. This characteristic has significant implications for the management of these
instructions, as we will describe later.
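
The difficulty can be illustrated with a small check that a load would have to pass against the older stores in flight. This is a sketch of one conservative (nonspeculative) policy only, under the simplifying assumption that two accesses conflict exactly when their addresses are equal (access sizes and partial overlaps are ignored); the function name and return values are hypothetical.

```python
def check_load(load_addr, older_store_addrs):
    """Classify a load against the stores that precede it in program order.

    older_store_addrs holds one entry per older store; None means the
    store has not computed its address yet.
    """
    if any(addr is None for addr in older_store_addrs):
        # The dependence cannot be determined yet.
        return "unknown"
    if load_addr in older_store_addrs:
        # An older store writes the location the load reads.
        return "dependent"
    return "independent"


print(check_load(0x100, [0x200, None]))   # unknown
print(check_load(0x100, [0x200, 0x100]))  # dependent
print(check_load(0x100, [0x200]))         # independent
```

A register dependence, by contrast, is visible from the instruction encoding alone, which is why it can be resolved as early as renaming.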

6.3.1 Issue Process when Source Operands Are Read before Issue
The main characteristic of an issue queue where operands are read before the issue stage is that it
needs to hold the information from the instruction to perform the issue and the values from the
source operands that have already been produced. Figure 6.1 shows a general overview of the
typical components used to store this information. Every block in Figure 6.1 represents a table with
as many entries as the number of instructions that can be held by the issue queue. Moreover, for the
sake of simplicity, we assume a processor with an ISA similar to a simplified MIPS [32], where
instructions can have up to two source operands or one source operand and an immediate value coded
as part of the instruction.

[Figure 6.1: diagram not reproduced. Labels recovered from the figure: Src1 data; Src2 data or Imm;
Ctrl info; V1, V2; R1, R2; SrcId 1; SrcId 2; CAM; Dests; Destination Id of produced value;
Produced value; Select Logic; To Functional Units.]

FIGURE 6.1: Hardware components of a typical issue queue where source operands are read before
issue.
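
The fields named in Figure 6.1 can be mirrored in a small software model to show how an entry captures operand values and becomes eligible for selection. This is a sketch, not the hardware: it assumes that V1/V2 mark a source field as in use and R1/R2 mark its value as already captured, that matching a broadcast destination id against SrcId 1 and SrcId 2 plays the role of the CAM, and that selection picks the oldest ready entry. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    ctrl: str                      # Ctrl info (here just an opcode name)
    src_id: List[Optional[int]]    # SrcId 1 / SrcId 2: ids of the producers
    valid: List[bool]              # V1 / V2 (assumed: source field in use)
    ready: List[bool]              # R1 / R2 (assumed: value captured)
    data: List[Optional[int]]      # Src1 data, Src2 data or immediate

    def is_ready(self):
        # An entry can be selected once every used source has its value.
        return all(r or not v for v, r in zip(self.valid, self.ready))


class IssueQueue:
    def __init__(self):
        self.entries = []          # kept in allocation (age) order

    def allocate(self, entry):
        self.entries.append(entry)

    def wakeup(self, dest_id, value):
        # Compare the broadcast destination id with every waiting source
        # and capture the produced value on a match.
        for e in self.entries:
            for i in range(2):
                if e.valid[i] and not e.ready[i] and e.src_id[i] == dest_id:
                    e.data[i] = value
                    e.ready[i] = True

    def select(self):
        # Oldest-first choice of one ready entry; the entry is reclaimed.
        for i, e in enumerate(self.entries):
            if e.is_ready():
                return self.entries.pop(i)
        return None


iq = IssueQueue()
# An add whose first operand is still being produced (id 7) and whose
# second operand is an immediate that is available from the start.
iq.allocate(Entry("add", [7, None], [True, True], [False, True], [None, 4]))
print(iq.select())                 # None: first operand not yet produced
iq.wakeup(7, 10)                   # producer broadcasts id 7, value 10
issued = iq.select()
print(issued.ctrl, issued.data)    # add [10, 4]
```

Storing the operand values in the queue is what distinguishes this organization from one where operands are read after issue, in which an entry would only need the ready bits and the source ids.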

Page 115

Author Biographies

Antonio González received his Ph.D. degree from the Universitat Politècnica de Catalunya (UPC),
Barcelona, Spain, in 1989. He is the founding director of the Intel Barcelona Research Center,
started in 2002, whose research focuses on computer architecture. He has been a faculty member of
the Computer Architecture Department of UPC since 1986 and became a full professor in 2002.

Antonio has filed over 40 patents, has published over 300 research papers, and has given over
80 invited talks in the areas of computer architecture and compilers. He has served as an Associate
Editor of several IEEE and ACM journals, has been a member of the program committee of
numerous symposia, the program chair for some of them, including ISCA, MICRO, HPCA, ICS,
and ISPASS, and the general chair for MICRO.

Antonio's awards include the award to the best student in computer engineering in Spain
graduating in 1986, the 2001 Rosina Ribalta Award as the advisor of the best PhD project in
Information Technology and Communications, the 2008 Duran Farrell Award for research in
technology, and the 2009 Aritmel National Award of Informatics to the Computer Engineer of the Year.

Fernando Latorre received his M.S. degree from the University of Zaragoza, Spain, in 2001 and his
Ph.D. degree from the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, in 2009. His
thesis focused on efficiently exploiting instruction level parallelism and thread level parallelism
using adaptive multithreaded/multicore architectures. Fernando joined the Intel Barcelona Research
Center in 2003 where he is a Senior Research Scientist, and he is also a member of the Architectures
and Compilers research group of the UPC. His research interests range from power-efficient
architectures to co-designed virtual machines and parallel processors.

Fernando holds 2 patents, has filed several more, and has published more than 10 research
papers in the area of computer architecture. He has served as a reviewer for numerous ACM and
IEEE conferences and symposia, and was also a program committee member for WEED 2010 and
ISCA 2011. In 2008, he received the Duran Farrell Award for research in technology.

Grigorios Magklis received his B.Sc. degree in Computer Science from the University of Crete,
Greece, in 1998. He received his M.Sc. and Ph.D. degrees from the Computer Science Department
of the University of Rochester, NY, in 2000 and 2005, respectively. His thesis focused on
increasing the energy efficiency of adaptive architectures. In 2003, Grigorios joined the Intel
Barcelona Research Center as a Senior Research Scientist. He is also a member of the Architectures
and Compilers research group of the Universitat Politècnica de Catalunya, Barcelona, Spain, where he
remains until today. His research interests include power-efficient architectures, parallel
architectures, dynamic optimization, and operating systems, among others.

Grigorios holds 6 patents and has published more than 20 research papers in the area of
computer architecture and distributed systems. He has served as a reviewer for numerous ACM
and IEEE conferences and symposia, and was also the architecture track program chair for IPDPS
2009. In 2008, he received the Duran Farrell Award for research in technology.
