Programming Languages Questions


Security Prelim Fall 2019 - David Wagner and Raluca Ada Popa

1 Web security
Alice has a banking service and wants to start a web hosting service for clients to host
their websites.

	1. She is trying to decide between placing the web-hosting service at{CLIENTSITE}
	versus{CLIENTSITE}. Which option is safer and why?

	2. Alice¿s banking website contains PHP code of the form:

	echo "You searched for: $SEARCHTERM";

	The first line obtains the query field from the URL.
		(a) What attack is this script vulnerable to? What could the attacker do for example?
		(b) How can you fix such an attack?

2 Securing a publish-subscribe system

In a publish-subscribe system, publishers publish messages on a channel, and subscribers subscribe to channels of interest to receive these messages.
A router knows which subscribers are subscribed to each channel and routes messages it receives from the publishers for that channel to the subscribers.
For example, publishers P1 and P2 publish messages to the channel "Soda Hall", and subscribers S1, S2 and S3 subscribe to this channel.
The router will send every message it receives from P1 and P2 for the channel "Soda Hall" to S1, S2, and S3.

The goals are:
	- Even if a router is compromised, only authorized publishers should be able to send messages to a channel and only authorized subscribers
	  should be able to read messages on a channel.

	- Publishers and subscribers to a channel are decoupled: they should not need to know each other, or how many they are.

	1. Design a cryptographic protocol that achieves this desired security. You can assume that there is a trusted authority for each channel,
	   but this authority should not be handling messages.

	2. Explain how to add a publisher to a channel, and then how to add a subscriber to a channel.

	3. Explain how to revoke a publisher and then a subscriber from the channel.

3 Securing car systems

A modern car has dozens or hundreds of embedded microprocessors for controlling different sensors and
actuators in the car. Consider a car that has a microcontroller for cruise control, one for the tire speed sensor,
and one for the radio.

	1. The manufacturer wants to add a feature where the radio automatically turns up the volume when the
	   car is going above a certain speed. Does this pose any security risks? Why or why not?

	2. Propose a design you could use to implement this feature securely.

4 Securing a mail server

Your team has been hired to build a mail server for classified email.

	1. After Snowden, the military is worried about the potential that an authorized but malicious user could
	   find a vulnerability in your code and gain access to other people¿s emails. Describe a design to defend
  	  against this threat.

	2. You realize that you need to support server-side search, where the user supplies a search phrase and
	   the server finds all matching emails. You found a great library online that implements this, and you¿d
	   love to use it, but you¿re not sure who wrote it. Is there a secure way to use it? Justify your answer.

        3. A spy reports that the enemy has gained the credentials of a few legitimate users and is planning to
	   log into their account and download all of their archived email, but you¿re not sure which users are
	  affected. Describe how you could defend against this threat.

Fall 2018 - Koushik Sen and Alvin Cheung
Q1. This question is about liveness analysis.
1. When do we say that a variable is live at a statement s? 
2. How liveness analysis can be used to eliminate dead code?
3. Describe the transfer functions for liveness analysis.  For this question assume that for each statement s, we compute the following information about the value of x immediately before and after s. 

Lin(x,s) = liveness value of x before s 
Lout(x,s) = liveness value of x after s 
Liveness values can be true or false. 

4. Describe an algorithm for liveness analysis.
5. Show an example where the algorithm is not precise.
6. How do you fix the analysis? 

Q2. This problem is about program dependence graphs (PDGs) and their applications. 
(a) Define what a PDG is. 
(b) What types of data dependencies are captured in PDGs? 
(c) Draw the PDG for a sample program. 
(d) What are the different types of data dependencies? Which of them are captured in the PDG for the sample program? 
(e) PDGs can be used for program slicing. Explain what that means and compute the static slice of the program above with respect to the program point i := i+1 
(f) Suppose we add exception handling to the language. How would you model that in the PDG? 

(Fall 2015 - Necula  and Sen):

Q1: Axiomatic semantics for a simple imperative language

   * write the syntax for an imperative language with integer and boolean 
     expressions, and assignment, conditionals, while loops

   * write the syntax for the partial and total correctness Hoare triples. 

   * write the syntax for precondition and postcondition predicates

   * write the formal definition of the total and partial correctness, using
      either denotational or operational semantics

   * write the axiomatic semantics rules for the statements in the language

* Q2: automatic memory management/garbage collection

    * enumerate some algorithms for automatic memory management

    * discuss reference counting, what it is, how it is done, what drawbacks it has

    * discuss mark-and-sweep, what it is, how it is done

    * what auxiliary data structure may be needed for the mark phase, how much memory would 
      this take, and how can you do it while using a constant amount of additional memory

    * discuss copying garbage collection, what it is, how it is done,

    * which of mark-and-sweep and copying garbage collection can be used for C and C++, and 
       under what conditions

    * can you have a memory leak in the program in a language with garbage collection ?

(Fall 2012 - Bodik  and Necula):

What is a continuation?  Pick any two flavors of "general" continuations

Problem 1:

This problem concerns the definition of an abstract interpretation
over the domain of intervals. The abstract interpretation analysis
will have to compute at each program point of an imperative program an
interval abstraction of the form [L .. H] (or, L <= x <= H) for each
variable in the program.

a) Define the abstract domain with all its aspects.

b) Define a concretization and an abstraction function. How does
the concretization function relate to the ordering relation in your
domain? What relationship exists between the concretization and the
abstraction function?

Consider the following simple imperative language:

       e ::= n | x | e1 + e2 | e1 - e2
       c ::= skip | x := e | c1; c2 | if e >= 0 then c1 else c2

c) Define the abstract interpretation for the conditional command.
State the soundness property.

d) Consider the program "if x - 2 >= 0 then x = x + 1 else x = 3".
What refinements do you need to make to your abstract interpretation
to be able to infer that x >= 3 at the end of that command.

Problem 2:

The second question asked about datarace detection.

(a) Define datarace.

(b) Is datarace a static or a dynamic property, and what does "static"
vs. "dynamic" means in the context of datarace detection?

(c) What is the relationship of the datarace to other nondeterministic
conditions in concurrent shared memory programs?  Specifically, what
(harmless) nondeterminism does datarace intentionally not model? What
(harmful) nondeterminism conditions datarace fails to model?

(d) Name key program analysis techniques for datarace detection.  Are
these techniques static or dynamic?  Can they be either, ie, is the
datarace property orthogonal from whether we are analyzing a program
or an execution?  (Also, what is an execution in a concurrent

(e) Define the lockset-based datarace detection technique.  Can we
think of  the lockset property as an abstract domain (as in abstract
interpretation)?  If yes, what is the concrete semantics that the
lockset abstraction approximates?  Does lockset have false positives?
If yes, give an example of a detected datarace that is not a real
datarace.  Does lockset have false negatives?  Again, give an example.
Finally, does dynamic lockset analysis generalize its analysis of a
single execution into properties of similar executions?

(f) Repeat (e) for happens-before detection.

(g) Implementation.  Which of lockset / hb is more efficient to
implement? Can you suggest  a strategy for increasing the efficiency
of the more expensive detector?  What is the effect of your strategy
on false positives, false negatives?

(Spring 2012 - Bodik  and Hilfinger):

What is a continuation?  Pick any two flavors of "general" continuations
(ie not exceptions) and define their semantics.  (These could be call-cc,
reset/shift, coroutines.)  Show example programs using these continuations.  

Variations of continuation constructs: 

What is the rationale for the spectrum of continuations?  

What program concept does a continuation reify?  How is a continuation represented?  

How many times can a continuation be invoked, and how does the choice
influence the implementation efficiency.  

Give three examples using a continuation construct X to implement
language construct Y, eg exceptions with call/cc, iterators with
coroutines, regex with coroutines. Implement a lazy iterator for a
recursive tree traversal with a coroutine.  Consider the impact of
semantic differences between Lua and Python coroutines/generators,
on programmers, and on language implementers.

(Fall 2011 - Necula and Sen):


1.  * Write the syntactic rules for typed lambda calculus with integers.
    * Add exceptions to this language, with the following requirements:
           ** The exceptions have a name and carry a value
           ** Exceptions are first-class values
           ** The program can declare new exception names with static
              scoping, like variable names
           ** The exception handlers can specify the name of the
              exceptions to be handled
           ** Maintain type soundness

      * Show the syntax of the extended language

      * Show an example of an expression that uses all the new
        constructs and explain informally how you type check and run the
        expression. You can assume that you have basic types and
        operators (integers, arithmetic, booleans, comparisons).

	Compare this example with a similar one written in Java.

       * Show the typing rules

       * Show the operational semantics

2. * Describe static and dynamic scoping using examples.
   * Explain the advantages and disadvantages of static and dynamic  
     Scoping both in terms of implementation and usability.  Is is 
     possible to compile dynamically scoped languages efficiently?

(Spring 2011 - Hilfinger & Necula):
1. How might one use "reflection" in a programming language?

2. What kind of overheads can one expect from the use of reflection (as
  compared to conventional accesses to fields and methods)?  What might
  a compiler/runtime implementor do to improve performance of reflection

3. Consider a simple language with the following type system:

  T ::= string | { f1 : T1, ..., fn : TN } | object

   ,where f1-fn are field names, object is the supertype of all other
    types. Assume that T list is an abbreviation for the type { car : T, cdr : T list}.
    Consider that the language runtime system provides the following reflection functions:

     gfs : returns the list of field names for a value. The function
     is not defined for string.
     gf : given an object and a field name returns the value of the field.
     isString : given an object returns true iff the object is of type string.

    3a. Write the type of these functions.

    3b. How should gf behave when passed a non existent field name?

    3c. Write a pretty printing function using these functions.

    3d. What exceptions may your function throw, according to a
        standard type checker?

    3e. How can you enrich the type system to enable the type checker
        to verify the absence of exceptions in your function?"

(Fall 2010 - Bodik & Sen):
"The first question asked about dataraces and memory models:

1. What is a data race?  Why multithreaded programs should avoid data races?

2. The following sequential program finds a minimum cost solution:

best = infinity
for (w in queue):
  cost = compute_cost(w)  
  if cost < best:
      best = cost
      best_soln = w

Parallelize this code.


best = infinity
for (w in queue):
  if (lower_bnd(w) >= best):  
  cost = compute_cost(w) 
  if cost < best:
      best = cost
      best_soln = w

Note 1. compute_cost is expensive call
Note 2. lower_bnd is inexpensive call
Note 3. lower_bnd(w) <= compute_cost(w)

Parallelize this code.

The second question asked about continuations.  What is a continuation? What language constructs 
can be encoded with (are special case of) continuations? What is continuation-passing style? 
What are the uses and benefits of CPS?  Transform a program into CPS form.  Translate an exception 
construct into lambda calculus in CPS form."

(Fall 2008 - Hilfinger & Sen):
"1. This question concerns the differences between static and dynamic
   typing in programming languages.

   a. Define what these mean.   Illustrate the difference.
   b. Methodology: what methodological advantages are claimed for 
      statically typed languages?  For dynamically typed ones?  
   c. Formal semantics:  Compare and contrast the formal semantic
      definitions of statically and dynamically typed languages.
   d. Implementation:  The compiler presumably knows more about
      statically typed programs.  How can it use that information?
      What analyses might it perform on dynamically typed programs to
      get some of this information?

2. Assume that we have a simple program whose syntax is

    Prog ::= L: START; S*

    S ::= L: Var = C

    S ::= L: Var = Var op Var

    op ::= + | - | *

    S ::= L: if (Var cmp Var) goto L

    cmp ::= > | < | <= | >= | == | !=

    S ::= L: ERROR

    S ::= L: END

Assume we have a set of n predicates P1, P2, P3, ..., PN over program 

Assume [P] is the set of program states that satisfy P.

    What is [P1 \/ P2]?

    What is [P1 /\ P2]?

    What is [~P]?

    What is P1 -> P2 in terms of [P1] and [P2]?

    When do we say that a formula P1 is weaker than P2?

    What is the weakest precondition of a formula P and a state S?

    How do you compute the weakest precondition of a statement of the form 
    WP(P, x = e)?

    How do you compute the weakest precondition of a statement of the form 
    WP(P, assume [x1 == x2])?

    How do you compute the weakest precondition of a statement of the form 
    WP(P, S1; S2)?

    What is symbolic path constraint of a program execution?

    Given an execution path how can you compute the symbolic path 
    constraint using weakest pre condition computation?

    What is strongest post condition?

Assume that the program states are abstracted by the set of predicates 

P1, P2, ..., PN.

Given an abstract state F, how do you compute the state resulting from
the execution of S using weakest pre condition computation?

How can you use symbolic execution to compute strongest post condition?

Given a program, how can you compute a predicate abstraction?  Is it 
finite?  What is the complexity of a predicate abstraction?  How can 
it be used to compute program correctness?"

(Spring 2008 - Bodik & Hilfinger):
"Question 1:  

The question asked about automatic vectorization and array
dependence analysis.  The question was based on the paper "Automatic
Translation of FORTRAN Programs to Vector Form".  The subquestions
were as follows: What is vectorization? What is a vector machine?
Give a loop that is vectorizable and one that is NOT.  Describe some
transformations that you perform before vectorization.  What is the
goal of these transformation?  What does it mean when we say that a
statement depends on itself?  Formalize the notion of dependence for
the loop "do i= 1,N: X(f(i)) F(X(g(i)))".  Express the condition
for existence of the dependence using equation f(x) - g(y) = 0
(dependence exits if solution lies in a particular x-y region.)
Derive the GCD dependence test.  What are the limitations of the GCD
test?  How can we improve on the GCD test by means of solving the
above equation in the domain of reals? Dependence graphs: what do they
express?  Name the three kinds of dependences.  Why do these
dependences must be enforced during loop transformations?

Question 2: 

This was intended as an open-ended question to see how well candidates
understood some of the issues raised by parallel programming.  The
parts below are the collective result of where the three students took
this question.

   a. What makes parallel programming difficult?
   b. What memory models accurately describe the behavior of actual
   c. How might one deal with a weak memory-consistency models?
   d. How does functional programming address the issues of parallel 
   e. What problems does one encounter when attempting to use functional
      programming to write parallel programs?"

(Spring 2006 - Bodik & Graham):
"Question 1:  This question presented a simple language extension to Java and
asked the student to develop static analysis for analyzing programs written
with this extension.  Specifically, single-threaded Java was extended with
multi-threaded constructs for distributed memory computing (places,
async's), inspired by the X10 language.  The goal was to develop analysis
for proving safety of run-time remote-access checks.  

Question 2: The question asked the student to give a high-level outline of
the structure of an optimizer for an optimizing compiler and to explain the
respects in which an optimizer only approximates optimality.  The student
was then asked to explain what kind of runtime information might inform
optimization, how that information might be collected, and how it might be

(Fall 2005 - Hilfinger & Necula):
"1. Write the notation and the definition for Hoare triples for
   partial and total correctness. Discuss how the definitions must be
   changed for non deterministic languages. Show a few partial correctness
   rules. Show the rule for "do c until b" and argue informally why is it
   sound and complete. Show a total correctness rule for a looping

2. This question concerns a programming language and environment (compiler,
   editor, project manager, etc.)  with these features:

   A. A program consists of a set of units (typically stored one per file).
      Units are either 'specifications' or 'bodies' of packages (modules),
      generic packages, or subprograms.

   B. A specification of a subprogram is like a C function declaration (i.e.,
      no body).  A specification of package is like a C++ namespace, 
      and contains only type declarations, subprogram specifications, 
      variable declarations, and (nested) package specifications (possibly
      generic).  A specification of a generic package is just like that 
      of a package, but can be parameterized by types and values (incl. 
      subprograms), as in C++.

   C. Each unit explicitly imports all subprograms and packages it
      refers to (other than the standard prelude).

   D. Bodies of packages contain bodies for all subprograms and nested packages
      in the specification, plus any additional types, subprograms, (static)
      variables, nested packages, and static initializers that are needed.

   E. Specifications of subprograms may be marked "inline".

   F. The programming environment for this language allows you to compile by
      specifying a unit.  As long as all SPECIFICATIONS mentioned by that
      unit (recursively) are available, the compilation will succeed, performing
      all static checks on the available units (and bodies, too, if available).
      That is, if the compilation of each body succeeds in the absence of all
      other bodies, the compilation will succeed when all bodies are present.

   G. If all bodies are present as well, the system will also produce a
      linked executable.  Typically, one specifies the subprogram that is to
      serve as the "main program", and the environment does whatever is

   H. The environment attempts to honor as many inline declarations as feasible.

   I. The environment attempts to avoid recompiling anything that doesn't
      need recompilation in order to speed up compilation.

   In answering the questions, assume we understand ordinary C compilation,
   and filter out anything that would be obvious to someone who has implemented
   a C compiler.  We've left out a lot of detail about the language; you may
   assume the rest is a "typical" language.

   Questions (very open-ended): 

   Q1: What language restrictions that we didn't mention must be true for
       this design to be possible?

   Q2: How would you implement the compiler so as to have the characteristics
       described in F-I?"

(Spring 2005 - Bodik & Graham):
"Question 1
SSA Form: definition, properties, construction.  Sub-questions follow.

-What does “SSA” stand for?
-What types of programs was SSA defined for? functional, imperative, OO
-Briefly, what is the key idea behind transforming a program into SSA form?
-Define SSA form, i.e., give a rule for when a program is in the SSA form?
-Construct SSA form for an example program 
-Why does SSA simplify bookkeeping in optimizations
-How much can the program grow as a result of SSA transformation?
o	show a pathological case
-Optimal SSA
o	There is a commonly used notion of SSA optimality.  What aspects of the form does it refer to?
-Naive SSA construction
-Applications, benefits of SSA
o	SSA makes some analyses/transformations/optimizations more efficient, even asymptotically.  Why?
o	SSA makes some analyses more effective (i.e., more precise).  Why?
-The dominance frontier algorithm.
Question 2
What are macros?
What are their purposes?
How are they implemented?

Macros are “outside” the programming language and often don’t use PL principles.  How can the purposes 
of macros be achieved by language redesign or augmentation?  How do the language solutions differ from 
the pre-processor solutions?

[If there is time] Many language designers like to have as much information as possible determined at 
compile-time rather than runtime.  Why is that desirable?  Give some examples of language features and 
trade-offs if the information can change at runtime."   

(Fall 2004 - Hilfinger & Yelick):
"Q1: For the first question, we gave the student a brief paper summary of the 
original (1976) Gries-Owicki axiomatization of parallel execution.  And
asked the following questions:

  1. Justify the axiom for cobegin ... coend in the absence of shared 
     regions ('with' statements in their syntax).
  2. Justify the axiom for 'with' itself.
  3. What, if any, relaxation on the conditions in the axioms is possible
     if we assume that the processes in a cobegin coend are executed 
     sequentially in an non-deterministic order rather than in parallel?
  4. How would you formulate the actual behavior of global-variable
     reference within their model (esp. taking into account memory
     consistency models)?   

Q2:  The second question was about the following code fragment:

       for i = 1 to 3 
          for j = 1 to 4
            b[j,i] = a[j,i-1] + a[j,i+1]
            a[j,i] = a[j,i] + b[j,i]

We first asked the student to draw a picture of the iteration space for this 
loop, which is did without trouble.  We then asked about possible optimizations 
and what types of checks would a compiler have to do to see of those transformations
were legal.  A Pentium 4 architecture was used as the example."   

(Spring 2004 - Bodik & Graham):
"Q1: 	The first question explored the structure of a compiler, then a source-to-source 
	compiler, then a C++ to Java translator.   

Q2: 	The second question asked about the varieties of  garbage collectors, then about 
	the implementation of stop-and-copy collectors, and finally about incremental 

(Fall 2003 - Bodik & Necula):
"Q1: 	The first problem was concerned with axiomatic semantics. Students had 
	to define the notions of partial and total correctness, based on the 
	operational semantics, for a simple imperative language. Then the students 
	had to write the derivation rules for the partial correctness assertions. 
	The last part of the question was concerned with the relationship between 
	weakest preconditions and partial correctness assertions. 

Q2: 	The second question asked the student to develop static analysis for verifying 
	(model checking) the lock usage in a simple imperative program (with no pointers 
	and no procedures).  First, the question sought to determine whether flow-sensitive 
	or flow-insensitive class of analyses would be more suitable for this verification 
	task, focusing on differences of the two styles of analyses and inherent limitations 
	of the weaker of the two analyses.  Next, the question asked to develop dataflow 
	analysis for this verification problem.  The focus was on making the analysis as 
	efficient as possible, and on identifying "symmetry" with a similar error checking 
	problem.  Finally, several questions about the role of monotonicity and distributivity 
	in dataflow analysis were asked."   

(Fall 2002 - Necula & Yelick):
"Q1: Subtyping in lambda calculus
 Q2: After being presented with a short fragment of code performing a matrix 
     operation, discuss various optimizations that could be performed on that code."   

(Spring 2002 - Graham & Hilfinger):
"Q1: Semantics
 Q2: Optimization"

(Fall 2001 - Aiken & Necula):
"1. Parametric polymorphism. Syntax and typing rules of polymorphic
lambda calculus. Kinds of parametric polymorphism. Let polymorphism.

2. Graph-coloring register allocation. The problem to the solved,
computing the register intereference graph, the coloring heuristic,
what to do when the heuristic fails, register spilling."

(Fall 2000 - Yelick & Necula):
"Q1: The first question was about the semantics of a simple object oriented

 Consider the language of expressions:

 e ::= x | new T | if e1 then e2 else e3 | while e1 do e2
       x := e | let x:T = e1 in e2 | e0.m(e1_,...,en)

 In the above language, all variables are introduced by let. They are
statically scoped and the scope is the body of the let (e2). The
language of types consists of a set of class identifiers along with 
a single inheritance hierarchy. There is a distinguished class called

   a) Define a typing judgment and write typing rules for this language
taking into consideration subtyping originating from subclassing in the 
class hierarchy. How do the typing rules change if we have multiple

   b) Define an evaluation judgment and write the evaluation rules.
Consider that the set of values consists of elements of the form Obj(T)

   c) If you have Gamma |- e : T and (Sigma, e) -> (Sigma', Obj(T')) what
relationship is there between the two judgments and under what

Q2: This question is about implementation of object-oriented languages.
Consider the following Java program fragment:

In Java:
  class X {              
    public int val;         
    public int get () { return val; }
    public int set (int y) { val = y; }

Part 1) 
Draw a picture of the runtime structures that would exist after
the following statements have executed:

  X x1 = new X();
  X x2 = new X();

Part 2) Inheritance
  Now assume there is an additional class:

  class B extends A {
    set (int z) { ... }
    print () { ... }

 How would the picture change after the creation of a B object?

  B b = new B();

Part 3) Method Inlining
  For Java: If objects are small, this is quite expensive, even to
  access the data fields.  There are two sources of inefficiency: the method
  call overhead and the object indirection.  Under what circumstances 
  can a method be inlined?

Part 4) Object Inlining
  Under what circumstances can the extra level
  of indirection in the object representation be eliminated,
  i.e., when is it a safe transformation for the compiler
  to take an object (represented as a pointer to a record)
  and store the record in place of the pointer?

Part 5) Distribution
  Imagine you now want to implement a system in which you can 
  access objects that are on other machines.  In particular, 
  say you have an object on some processor P2 and you want to 
  call methods on it from P1.  What kind of code and runtime 
  structures would you need on your own node to make this work?"

(Spring 2000 - Aiken & Yelick):
"Q1: Regarding the design and implementation of various garbage
collection schemes;

Q2: Regarding efficient compilation of a subset of MATLAB."

(Fall 1999 - Necula & Hilfinger):
"Q1: Given this program fragment:
	x < -1
	L: y < -x
	x < -x + 1
	if P(x) jmp L
	return x
- translate to SSA;
- optimize the SSA; and 
- translate back.

Q2: Given a language with templates, write some typing rules, and
explain some of the issues templates raise in implementing a compiler."

August 2000