Monday, March 24, 2025

parallel computing landscape

 
    • One notable challenge for the hardware tower is that it takes four to five years to design and build chips and to port software to evaluate them. {“A view of the parallel computing landscape” by Krste Asanovic, Rastislav Bodík, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Katherine Yelick in the Communications of the ACM, Volume 52, Issue 10, pages 56-67, October 2009.}

    • A second challenge is that two critical pieces of system software—compilers and operating systems—have grown large and unwieldy and hence resistant to change. One estimate is that it takes a decade for a new compiler optimization to become part of production compilers. {ibid.}

    • The Landscape of Parallel Computing Research: A View from Berkeley
       • December 18, 2006 

       • Power is expensive.  We can put more transistors on a chip than we have the power to turn on. 

       • The doubling of uniprocessor performance may now take 5 years. 

       • As chips drop below 65 nm feature sizes, they will have high soft and hard error rates. [Borkar 2005][Mukherjee et al 2005]


source:
       A View of the Parallel Computing Landscape

Krste Asanovic, Rastislav Bodík, James Demmel, Tony Keaveny, Kurt Keutzer,
John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen,
John Wawrzynek, David Wessel, and Katherine Yelick

© 2010 

The Landscape of Parallel Computing Research: A View from Berkeley

Krste Asanovic
Ras Bodik
Bryan Christopher Catanzaro
Joseph James Gebis
Parry Husbands
Kurt Keutzer
David A. Patterson
William Lester Plishker
John Shalf
Samuel Webb Williams
Katherine A. Yelick


Technical report no. UCB/EECS-2006-183
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

December 18, 2006 
   ____________________________________

 •─• Our main point is to raise their priority: do not include features that significantly affect performance or energy if ... cannot accurately measure their impact. 
 •─• Raise their priority: do not include features that significantly affect performance, energy, or sustainability if ... cannot accurately measure their impact. 



The switch to parallel programming, where the compiler and the programmer are explicitly responsible for performance, means that performance counters must become first-class citizens. In addition to monitoring traditional sequential processor performance features, new counters must help with the challenge of efficient parallel programming.
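As a concrete illustration of why such counters matter: without them, programmers fall back on coarse wall-clock timing wrapped around a parallel region. The sketch below (plain Python, illustrative names) shows that stand-in; real hardware counters for cache misses, stalls, or energy need OS support such as Linux perf_event, which this sketch does not use.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure(fn, *args):
    """Coarse stand-in for first-class counters: only wall time is
    visible, so the impact of individual hardware features cannot be
    attributed -- exactly the gap the paper's recommendation targets."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def parallel_sum(chunks):
    # A small parallel region whose behavior we would like to measure.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(sum, chunks))

chunks = [list(range(i, i + 1000)) for i in range(0, 4000, 1000)]
total, secs = measure(parallel_sum, chunks)
```

The wall-clock number tells us the region ran, but nothing about *why* it was fast or slow; per-feature counters would.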

performance counters

performance and energy counters

Our main point is to raise their priority: do not include features that significantly affect performance or energy if programmers cannot accurately measure their impact. 

implementation efficiency
competing goals

Figure 7 shows the current lack of agreement on the opacity/visibility tradeoff. 

The struggle is delivering performance while raising the level of abstraction. Going too low may achieve performance, but at the cost of exacerbating the software productivity problem, which is already a major hurdle for the information technology industry. 

transactional memory
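The transactional-memory idea noted above can be sketched as an optimistic read/validate/commit cycle. The toy `TVar` class below is an illustrative stand-in, not a real STM: it versions a single variable and retries on conflict instead of holding a lock across the whole update.

```python
import threading

class TVar:
    """Toy transactional variable: read a snapshot optimistically,
    then commit only if no one else has committed in between."""
    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()   # guards only the brief commit step

    def read(self):
        return self._value, self._version

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self._version != seen_version:
                return False            # conflicting commit happened: retry
            self._value = new_value
            self._version += 1
            return True

def atomically(tvar, update):
    # Retry loop instead of a held lock: the hallmark of optimistic TM.
    while True:
        value, version = tvar.read()
        if tvar.try_commit(version, update(value)):
            return

counter = TVar(0)
threads = [
    threading.Thread(
        target=lambda: [atomically(counter, lambda v: v + 1) for _ in range(1000)]
    )
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads of 1000 increments each leave the counter at 4000; conflicting updates are retried rather than lost.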

Subjecting our assumptions about the process of programming to formal testing often yields unexpected results that challenge our intuition. [Mattson 1999]

shared memory vs. message passing

..., forces coders to be aware of the exact mapping of computational tasks to processors. This style has been recognized for years to increase the cognitive load on programmers, and has persisted primarily because it is expressive and delivers the best performance. [Snir et al 1998] [Gursoy and Kale 2004]
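A minimal sketch of that message-passing style, using Python threads and queues as a portable stand-in for MPI ranks (the names `worker` and `rank` and the chunking scheme are illustrative): the coder explicitly decides which chunk of the computation goes to which rank, which is exactly the mapping burden the note describes.

```python
import threading
import queue

def worker(rank, in_q, out_q):
    # Each "rank" owns its data privately; all sharing is an explicit message.
    chunk = in_q.get()
    out_q.put((rank, sum(chunk)))

nranks = 4
data = list(range(1000))
in_qs = [queue.Queue() for _ in range(nranks)]
out_q = queue.Queue()
workers = [threading.Thread(target=worker, args=(r, in_qs[r], out_q))
           for r in range(nranks)]
for t in workers:
    t.start()
# The programmer chooses the exact mapping: strided chunk r goes to rank r.
for r in range(nranks):
    in_qs[r].put(data[r::nranks])
total = sum(out_q.get()[1] for _ in range(nranks))
for t in workers:
    t.join()
```

Every send and receive is visible in the code, which is both the style's expressiveness and its cognitive load.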

...because the program is not over-specified, the system has quite a bit of freedom in mapping and scheduling that in theory can be used to optimize performance. 

The languages proposed in the DARPA High Productivity Computing Systems program are currently attempting to address this issue, with a major concern being support for user-specified distributions.

Now that we have a few decades of such experiments, we think that the conclusion is clear: some styles of parallelism have proven successful for some applications, and no style has proven best for all. 

3.  Bit-level parallelism may be exploited within a processor more efficiently in power, area, and time than between processors. For example, the Secure Hash Algorithm (SHA) for cryptography has significant parallelism, but in a form that requires very low latency communication between operations on small fields. 
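A small example of bit-level parallelism exploited within a processor word: the classic SWAR population count performs many narrow additions in parallel inside a single 64-bit integer, where "communication" between adjacent fields costs only shifts and masks rather than inter-processor messages.

```python
def popcount64(x):
    """Count set bits via SWAR: 32 two-bit adds, then 16 four-bit adds,
    and so on, all proceeding in parallel within one machine word."""
    x &= 0xFFFFFFFFFFFFFFFF
    x = x - ((x >> 1) & 0x5555555555555555)                          # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit sums
    # Multiply-and-shift adds all eight byte counts into the top byte.
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56
```

Spreading these tiny, latency-sensitive additions across processors would cost far more in communication than the work itself, which is the note's point about SHA-style workloads.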

memory model
Because parallel systems usually contain memory distributed throughout the machine, the question arises of the programmer's view of this memory. 

Explicitly partitioned systems (such as MPI) ..., but programmers must deal with the low-level details of performing remote updates themselves. 

we recommend relying more on autotuners that search to yield efficient parallel code (Section 6.1).
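A minimal sketch of that autotuning idea (illustrative names; real autotuners such as ATLAS or FFTW search far larger parameter spaces): time each candidate parameter for a tunable kernel on the actual machine and keep the fastest, rather than trusting a static compiler model.

```python
import time

def blocked_sum(data, block):
    """A stand-in tunable kernel: sum the data in blocks of a given size."""
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def autotune(kernel, data, candidates):
    """Empirical search: run every candidate on this machine, keep the best."""
    best, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        kernel(data, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = block, elapsed
    return best

data = list(range(100_000))
best_block = autotune(blocked_sum, data, [64, 256, 1024, 4096])
```

The winning block size depends on the cache hierarchy of the machine the search runs on, which is why measurement beats prediction here.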

virtual machines and system libraries

microarchitecture and memory hierarchies

The compiler selects which optimizations to perform, chooses parameters for these optimizations, and selects from among alternative implementations of a library kernel. 

...new optimizations often require fundamental changes to its internal data structures, ...

Consequently, users have become accustomed to turning off sophisticated optimizations, as they are known to trigger more than their fair share of compiler bugs. ([ In regulatory language, there is a tendency for the business-, industry-, and commerce-oriented subsystems to turn off the environmental optimization settings of the different governmental agencies, because not turning off those environmental settings would trigger enforcement mechanisms within the system; and this is where the court comes into the picture, because essentially the court is a sort of referee, akin to a compiler. ])
<------------------------------------------------------------------------>
