Fall 2018 - Koushik Sen and Alvin Cheung
Q1. This question is about liveness analysis.
1. When do we say that a variable is live at a statement s?
2. How liveness analysis can be used to eliminate dead code?
3. Describe the transfer functions for liveness analysis. For this question assume that for each statement s, we compute the following information about the value of x immediately before and after s.
Lin(x,s) = liveness value of x before s
Lout(x,s) = liveness value of x after s
Liveness values can be true or false.
4. Describe an algorithm for liveness analysis.
5. Show an example where the algorithm is not precise.
6. How do you fix the analysis?
Q2. This problem is about program dependence graphs (PDGs) and their applications.
(a) Define what a PDG is.
(b) What types of data dependencies are captured in PDGs?
(c) Draw the PDG for a sample program.
(d) What are the different types of data dependencies? Which of them are captured in the PDG for the sample program?
(e) PDGs can be used for program slicing. Explain what that means and compute the static slice of the program above with respect to the program point i := i+1
(f) Suppose we add exception handling to the language. How would you model that in the PDG?
(Fall 2015 - Necula and Sen):
Q1: Axiomatic semantics for a simple imperative language
* write the syntax for an imperative language with integer and boolean
expressions, and assignment, conditionals, while loops
* write the syntax for the partial and total correctness Hoare triples.
* write the syntax for precondition and postcondition predicates
* write the formal definition of the total and partial correctness, using
either denotational or operational semantics
* write the axiomatic semantics rules for the statements in the language
* Q2: automatic memory management/garbage collection
* enumerate some algorithms for automatic memory management
* discuss reference counting, what it is, how it is done, what drawbacks it has
* discuss mark-and-sweep, what it is, how it is done
* what auxiliary data structure may be needed for the mark phase, how much memory would
this take, and how can you do it while using a constant amount of additional memory
* discuss copying garbage collection, what it is, how it is done,
* which of mark-and-sweep and copying garbage collection can be used for C and C++, and
under what conditions
* can you have a memory leak in the program in a language with garbage collection ?
(Fall 2012 - Bodik and Necula):
What is a continuation? Pick any two flavors of "general" continuations
Problem 1:
This problem concerns the definition of an abstract interpretation
over the domain of intervals. The abstract interpretation analysis
will have to compute at each program point of an imperative program an
interval abstraction of the form [L .. H] (or, L <= x <= H) for each
variable in the program.
a) Define the abstract domain with all its aspects.
b) Define a concretization and an abstraction function. How does
the concretization function relate to the ordering relation in your
domain? What relationship exists between the concretization and the
abstraction function?
Consider the following simple imperative language:
e ::= n | x | e1 + e2 | e1 - e2
c ::= skip | x := e | c1; c2 | if e >= 0 then c1 else c2
c) Define the abstract interpretation for the conditional command.
State the soundness property.
d) Consider the program "if x - 2 >= 0 then x = x + 1 else x = 3".
What refinements do you need to make to your abstract interpretation
to be able to infer that x >= 3 at the end of that command.
Problem 2:
The second question asked about datarace detection.
(a) Define datarace.
(b) Is datarace a static or a dynamic property, and what does "static"
vs. "dynamic" means in the context of datarace detection?
(c) What is the relationship of the datarace to other nondeterministic
conditions in concurrent shared memory programs? Specifically, what
(harmless) nondeterminism does datarace intentionally not model? What
(harmful) nondeterminism conditions datarace fails to model?
(d) Name key program analysis techniques for datarace detection. Are
these techniques static or dynamic? Can they be either, ie, is the
datarace property orthogonal from whether we are analyzing a program
or an execution? (Also, what is an execution in a concurrent
program?)
(e) Define the lockset-based datarace detection technique. Can we
think of the lockset property as an abstract domain (as in abstract
interpretation)? If yes, what is the concrete semantics that the
lockset abstraction approximates? Does lockset have false positives?
If yes, give an example of a detected datarace that is not a real
datarace. Does lockset have false negatives? Again, give an example.
Finally, does dynamic lockset analysis generalize its analysis of a
single execution into properties of similar executions?
(f) Repeat (e) for happens-before detection.
(g) Implementation. Which of lockset / hb is more efficient to
implement? Can you suggest a strategy for increasing the efficiency
of the more expensive detector? What is the effect of your strategy
on false positives, false negatives?
(Spring 2012 - Bodik and Hilfinger):
What is a continuation? Pick any two flavors of "general" continuations
(ie not exceptions) and define their semantics. (These could be call-cc,
reset/shift, coroutines.) Show example programs using these continuations.
Variations of continuation constructs:
What is the rationale for the spectrum of continuations?
What program concept does a continuation reify? How is a continuation represented?
How many times can a continuation be invoked, and how does the choice
influence the implementation efficiency.
Give three examples using a continuation construct X to implement
language construct Y, eg exceptions with call/cc, iterators with
coroutines, regex with coroutines. Implement a lazy iterator for a
recursive tree traversal with a coroutine. Consider the impact of
semantic differences between Lua and Python coroutines/generators,
on programmers, and on language implementers.
(Fall 2011 - Necula and Sen):
Questions:
1. * Write the syntactic rules for typed lambda calculus with integers.
* Add exceptions to this language, with the following requirements:
** The exceptions have a name and carry a value
** Exceptions are first-class values
** The program can declare new exception names with static
scoping, like variable names
** The exception handlers can specify the name of the
exceptions to be handled
** Maintain type soundness
* Show the syntax of the extended language
* Show an example of an expression that uses all the new
constructs and explain informally how you type check and run the
expression. You can assume that you have basic types and
operators (integers, arithmetic, booleans, comparisons).
Compare this example with a similar one written in Java.
* Show the typing rules
* Show the operational semantics
2. * Describe static and dynamic scoping using examples.
* Explain the advantages and disadvantages of static and dynamic
Scoping both in terms of implementation and usability. Is is
possible to compile dynamically scoped languages efficiently?
(Spring 2011 - Hilfinger & Necula):
1. How might one use "reflection" in a programming language?
2. What kind of overheads can one expect from the use of reflection (as
compared to conventional accesses to fields and methods)? What might
a compiler/runtime implementor do to improve performance of reflection
mechanisms?
3. Consider a simple language with the following type system:
T ::= string | { f1 : T1, ..., fn : TN } | object
,where f1-fn are field names, object is the supertype of all other
types. Assume that T list is an abbreviation for the type { car : T, cdr : T list}.
Consider that the language runtime system provides the following reflection functions:
gfs : returns the list of field names for a value. The function
is not defined for string.
gf : given an object and a field name returns the value of the field.
isString : given an object returns true iff the object is of type string.
3a. Write the type of these functions.
3b. How should gf behave when passed a non existent field name?
3c. Write a pretty printing function using these functions.
3d. What exceptions may your function throw, according to a
standard type checker?
3e. How can you enrich the type system to enable the type checker
to verify the absence of exceptions in your function?"
(Fall 2010 - Bodik & Sen):
"The first question asked about dataraces and memory models:
1. What is a data race? Why multithreaded programs should avoid data races?
2. The following sequential program finds a minimum cost solution:
best = infinity
for (w in queue):
cost = compute_cost(w)
if cost < best:
best = cost
best_soln = w
Parallelize this code.
3.
best = infinity
for (w in queue):
if (lower_bnd(w) >= best):
continue
cost = compute_cost(w)
if cost < best:
best = cost
best_soln = w
Note 1. compute_cost is expensive call
Note 2. lower_bnd is inexpensive call
Note 3. lower_bnd(w) <= compute_cost(w)
Parallelize this code.
The second question asked about continuations. What is a continuation? What language constructs
can be encoded with (are special case of) continuations? What is continuation-passing style?
What are the uses and benefits of CPS? Transform a program into CPS form. Translate an exception
construct into lambda calculus in CPS form."
(Fall 2008 - Hilfinger & Sen):
"1. This question concerns the differences between static and dynamic
typing in programming languages.
a. Define what these mean. Illustrate the difference.
b. Methodology: what methodological advantages are claimed for
statically typed languages? For dynamically typed ones?
c. Formal semantics: Compare and contrast the formal semantic
definitions of statically and dynamically typed languages.
d. Implementation: The compiler presumably knows more about
statically typed programs. How can it use that information?
What analyses might it perform on dynamically typed programs to
get some of this information?
2. Assume that we have a simple program whose syntax is
Prog ::= L: START; S*
S ::= L: Var = C
S ::= L: Var = Var op Var
op ::= + | - | *
S ::= L: if (Var cmp Var) goto L
cmp ::= > | < | <= | >= | == | !=
S ::= L: ERROR
S ::= L: END
Assume we have a set of n predicates P1, P2, P3, ..., PN over program
variables.
Assume [P] is the set of program states that satisfy P.
What is [P1 \/ P2]?
What is [P1 /\ P2]?
What is [~P]?
What is P1 -> P2 in terms of [P1] and [P2]?
When do we say that a formula P1 is weaker than P2?
What is the weakest precondition of a formula P and a state S?
How do you compute the weakest precondition of a statement of the form
WP(P, x = e)?
How do you compute the weakest precondition of a statement of the form
WP(P, assume [x1 == x2])?
How do you compute the weakest precondition of a statement of the form
WP(P, S1; S2)?
What is symbolic path constraint of a program execution?
Given an execution path how can you compute the symbolic path
constraint using weakest pre condition computation?
What is strongest post condition?
Assume that the program states are abstracted by the set of predicates
P1, P2, ..., PN.
Given an abstract state F, how do you compute the state resulting from
the execution of S using weakest pre condition computation?
How can you use symbolic execution to compute strongest post condition?
Given a program, how can you compute a predicate abstraction? Is it
finite? What is the complexity of a predicate abstraction? How can
it be used to compute program correctness?"
(Spring 2008 - Bodik & Hilfinger):
"Question 1:
The question asked about automatic vectorization and array
dependence analysis. The question was based on the paper "Automatic
Translation of FORTRAN Programs to Vector Form". The subquestions
were as follows: What is vectorization? What is a vector machine?
Give a loop that is vectorizable and one that is NOT. Describe some
transformations that you perform before vectorization. What is the
goal of these transformation? What does it mean when we say that a
statement depends on itself? Formalize the notion of dependence for
the loop "do i= 1,N: X(f(i)) F(X(g(i)))". Express the condition
for existence of the dependence using equation f(x) - g(y) = 0
(dependence exits if solution lies in a particular x-y region.)
Derive the GCD dependence test. What are the limitations of the GCD
test? How can we improve on the GCD test by means of solving the
above equation in the domain of reals? Dependence graphs: what do they
express? Name the three kinds of dependences. Why do these
dependences must be enforced during loop transformations?
Question 2:
This was intended as an open-ended question to see how well candidates
understood some of the issues raised by parallel programming. The
parts below are the collective result of where the three students took
this question.
a. What makes parallel programming difficult?
b. What memory models accurately describe the behavior of actual
programs?
c. How might one deal with a weak memory-consistency models?
d. How does functional programming address the issues of parallel
programming?
e. What problems does one encounter when attempting to use functional
programming to write parallel programs?"
(Spring 2006 - Bodik & Graham):
"Question 1: This question presented a simple language extension to Java and
asked the student to develop static analysis for analyzing programs written
with this extension. Specifically, single-threaded Java was extended with
multi-threaded constructs for distributed memory computing (places,
async's), inspired by the X10 language. The goal was to develop analysis
for proving safety of run-time remote-access checks.
Question 2: The question asked the student to give a high-level outline of
the structure of an optimizer for an optimizing compiler and to explain the
respects in which an optimizer only approximates optimality. The student
was then asked to explain what kind of runtime information might inform
optimization, how that information might be collected, and how it might be
used."
(Fall 2005 - Hilfinger & Necula):
"1. Write the notation and the definition for Hoare triples for
partial and total correctness. Discuss how the definitions must be
changed for non deterministic languages. Show a few partial correctness
rules. Show the rule for "do c until b" and argue informally why is it
sound and complete. Show a total correctness rule for a looping
construct.
2. This question concerns a programming language and environment (compiler,
editor, project manager, etc.) with these features:
A. A program consists of a set of units (typically stored one per file).
Units are either 'specifications' or 'bodies' of packages (modules),
generic packages, or subprograms.
B. A specification of a subprogram is like a C function declaration (i.e.,
no body). A specification of package is like a C++ namespace,
and contains only type declarations, subprogram specifications,
variable declarations, and (nested) package specifications (possibly
generic). A specification of a generic package is just like that
of a package, but can be parameterized by types and values (incl.
subprograms), as in C++.
C. Each unit explicitly imports all subprograms and packages it
refers to (other than the standard prelude).
D. Bodies of packages contain bodies for all subprograms and nested packages
in the specification, plus any additional types, subprograms, (static)
variables, nested packages, and static initializers that are needed.
E. Specifications of subprograms may be marked "inline".
F. The programming environment for this language allows you to compile by
specifying a unit. As long as all SPECIFICATIONS mentioned by that
unit (recursively) are available, the compilation will succeed, performing
all static checks on the available units (and bodies, too, if available).
That is, if the compilation of each body succeeds in the absence of all
other bodies, the compilation will succeed when all bodies are present.
G. If all bodies are present as well, the system will also produce a
linked executable. Typically, one specifies the subprogram that is to
serve as the "main program", and the environment does whatever is
necessary.
H. The environment attempts to honor as many inline declarations as feasible.
I. The environment attempts to avoid recompiling anything that doesn't
need recompilation in order to speed up compilation.
In answering the questions, assume we understand ordinary C compilation,
and filter out anything that would be obvious to someone who has implemented
a C compiler. We've left out a lot of detail about the language; you may
assume the rest is a "typical" language.
Questions (very open-ended):
Q1: What language restrictions that we didn't mention must be true for
this design to be possible?
Q2: How would you implement the compiler so as to have the characteristics
described in F-I?"
(Spring 2005 - Bodik & Graham):
"Question 1
SSA Form: definition, properties, construction. Sub-questions follow.
-What does “SSA” stand for?
-What types of programs was SSA defined for? functional, imperative, OO
-Briefly, what is the key idea behind transforming a program into SSA form?
-Define SSA form, i.e., give a rule for when a program is in the SSA form?
-Construct SSA form for an example program
-Why does SSA simplify bookkeeping in optimizations
-How much can the program grow as a result of SSA transformation?
o show a pathological case
-Optimal SSA
o There is a commonly used notion of SSA optimality. What aspects of the form does it refer to?
-Naive SSA construction
-Applications, benefits of SSA
o SSA makes some analyses/transformations/optimizations more efficient, even asymptotically. Why?
o SSA makes some analyses more effective (i.e., more precise). Why?
-The dominance frontier algorithm.
Question 2
What are macros?
What are their purposes?
How are they implemented?
Macros are “outside” the programming language and often don’t use PL principles. How can the purposes
of macros be achieved by language redesign or augmentation? How do the language solutions differ from
the pre-processor solutions?
[If there is time] Many language designers like to have as much information as possible determined at
compile-time rather than runtime. Why is that desirable? Give some examples of language features and
trade-offs if the information can change at runtime."
(Fall 2004 - Hilfinger & Yelick):
"Q1: For the first question, we gave the student a brief paper summary of the
original (1976) Gries-Owicki axiomatization of parallel execution. And
asked the following questions:
1. Justify the axiom for cobegin ... coend in the absence of shared
regions ('with' statements in their syntax).
2. Justify the axiom for 'with' itself.
3. What, if any, relaxation on the conditions in the axioms is possible
if we assume that the processes in a cobegin coend are executed
sequentially in an non-deterministic order rather than in parallel?
4. How would you formulate the actual behavior of global-variable
reference within their model (esp. taking into account memory
consistency models)?
Q2: The second question was about the following code fragment:
for i = 1 to 3
for j = 1 to 4
b[j,i] = a[j,i-1] + a[j,i+1]
a[j,i] = a[j,i] + b[j,i]
We first asked the student to draw a picture of the iteration space for this
loop, which is did without trouble. We then asked about possible optimizations
and what types of checks would a compiler have to do to see of those transformations
were legal. A Pentium 4 architecture was used as the example."
(Spring 2004 - Bodik & Graham):
"Q1: The first question explored the structure of a compiler, then a source-to-source
compiler, then a C++ to Java translator.
Q2: The second question asked about the varieties of garbage collectors, then about
the implementation of stop-and-copy collectors, and finally about incremental
stop-and-copy."
(Fall 2003 - Bodik & Necula):
"Q1: The first problem was concerned with axiomatic semantics. Students had
to define the notions of partial and total correctness, based on the
operational semantics, for a simple imperative language. Then the students
had to write the derivation rules for the partial correctness assertions.
The last part of the question was concerned with the relationship between
weakest preconditions and partial correctness assertions.
Q2: The second question asked the student to develop static analysis for verifying
(model checking) the lock usage in a simple imperative program (with no pointers
and no procedures). First, the question sought to determine whether flow-sensitive
or flow-insensitive class of analyses would be more suitable for this verification
task, focusing on differences of the two styles of analyses and inherent limitations
of the weaker of the two analyses. Next, the question asked to develop dataflow
analysis for this verification problem. The focus was on making the analysis as
efficient as possible, and on identifying "symmetry" with a similar error checking
problem. Finally, several questions about the role of monotonicity and distributivity
in dataflow analysis were asked."
(Fall 2002 - Necula & Yelick):
"Q1: Subtyping in lambda calculus
Q2: After being presented with a short fragment of code performing a matrix
operation, discuss various optimizations that could be performed on that code."
(Spring 2002 - Graham & Hilfinger):
"Q1: Semantics
Q2: Optimization"
(Fall 2001 - Aiken & Necula):
"1. Parametric polymorphism. Syntax and typing rules of polymorphic
lambda calculus. Kinds of parametric polymorphism. Let polymorphism.
2. Graph-coloring register allocation. The problem to the solved,
computing the register intereference graph, the coloring heuristic,
what to do when the heuristic fails, register spilling."
(Fall 2000 - Yelick & Necula):
"Q1: The first question was about the semantics of a simple object oriented
language.
Consider the language of expressions:
e ::= x | new T | if e1 then e2 else e3 | while e1 do e2
x := e | let x:T = e1 in e2 | e0.m(e1_,...,en)
In the above language, all variables are introduced by let. They are
statically scoped and the scope is the body of the let (e2). The
language of types consists of a set of class identifiers along with
a single inheritance hierarchy. There is a distinguished class called
Bool.
a) Define a typing judgment and write typing rules for this language
taking into consideration subtyping originating from subclassing in the
class hierarchy. How do the typing rules change if we have multiple
inheritance?
b) Define an evaluation judgment and write the evaluation rules.
Consider that the set of values consists of elements of the form Obj(T)
c) If you have Gamma |- e : T and (Sigma, e) -> (Sigma', Obj(T')) what
relationship is there between the two judgments and under what
conditions.
Q2: This question is about implementation of object-oriented languages.
Consider the following Java program fragment:
In Java:
class X {
public int val;
public int get () { return val; }
public int set (int y) { val = y; }
}
Part 1)
Draw a picture of the runtime structures that would exist after
the following statements have executed:
X x1 = new X();
X x2 = new X();
Part 2) Inheritance
Now assume there is an additional class:
class B extends A {
set (int z) { ... }
print () { ... }
}
How would the picture change after the creation of a B object?
B b = new B();
Part 3) Method Inlining
For Java: If objects are small, this is quite expensive, even to
access the data fields. There are two sources of inefficiency: the method
call overhead and the object indirection. Under what circumstances
can a method be inlined?
Part 4) Object Inlining
Under what circumstances can the extra level
of indirection in the object representation be eliminated,
i.e., when is it a safe transformation for the compiler
to take an object (represented as a pointer to a record)
and store the record in place of the pointer?
Part 5) Distribution
Imagine you now want to implement a system in which you can
access objects that are on other machines. In particular,
say you have an object on some processor P2 and you want to
call methods on it from P1. What kind of code and runtime
structures would you need on your own node to make this work?"
(Spring 2000 - Aiken & Yelick):
"Q1: Regarding the design and implementation of various garbage
collection schemes;
Q2: Regarding efficient compilation of a subset of MATLAB."
(Fall 1999 - Necula & Hilfinger):
"Q1: Given this program fragment:
x < -1
L: y < -x
x < -x + 1
if P(x) jmp L
return x
- translate to SSA;
- optimize the SSA; and
- translate back.
Q2: Given a language with templates, write some typing rules, and
explain some of the issues templates raise in implementing a compiler."
August 2000