Database Management Questions


(Spring 2013 - Hellerstein & Brewer):

Questions reformatted without comment on the student:

Question 1 - Transactional Recovery - We began by asking for the basic
definition of ACID transactions.  We then asked about achieving atomicity
and performance with and without the use of a log, and in the case of a
log-only implementation.  The question shifted then to the use of Log
Sequence Numbers for transactional recovery -- their semantic role, and
the way that they tie together database state with log history.  We
asked about potential problems with the use of LSNs in the log and
PageLSNs in the database.

Question 2 - Materialized View Management - In this question, we
focused on the notions of view maintenance and/or invalidation, and
the way that those tasks might interact with interactive update queries. 
We explored the challenge of keeping update transactions efficient while
still ensuring that potential view readers would be transactionally isolated
from the effects of these updates on materialized views. 

(Fall 2012 - Hellerstein & Franklin):

Questions - Pick 2 out of 3

Query Optimization - In this question we first asked for a basic description
of the goals of the traditional System R relational query optimizer, with an
emphasis on the cost model used to evaluate alternative query plans and the
implications of that cost model.   We then asked the student to devise an
optimization scheme for a different objective, namely reducing the latency
to getting the first tuple of an answer (rather than the total cost of
obtaining the entire answer).   Finally we discussed how to improve time
to first answer in a parallel system, where data is partitioned across
multiple nodes.

B+Tree Indexing - Explain the basic B+Tree structure, including the design
goals for the structure, its salient characteristics and some of the key
algorithms for maintaining such structures.   The second part of the
question investigated the implication of various memory assumptions,
storage architecture tradeoffs and storage device properties on the
utility of the basic data structure and how that structure might change
in response. 

Materialized Views - In this question, we discussed basic tradeoffs around
the concept of materialized views in modern database systems.  What are
the advantages and when can they be achieved?  What are costs and how can
they be mitigated?   Then discuss  techniques for determining which views
to materialize in cube-based on-line analytical processing (OLAP) systems,
based on some of the readings in the reading list.  

(Fall 2010 - Brewer & Franklin):
"Topic 1: Database Recovery
Topic 2: Parallel Database systems" 

(Spring 2009 - Franklin & Hellerstein):
"Question 1: Describe the Postgres no-overwrite storage manager.  Reflect on technology 
changes in the years since the paper was written.  Choosing a few such changes to focus 
on, describe how you might implement no-overwrite storage today.

Question 2: Describe the notion of "interesting orders" in the context of cost-based query 
optimization.  In the absence of sort-based join algorithms, are there other uses for this 
concept?  E.g. in the context of hash-based algorithms?  Is there a more general notion at 
work in this setting? " 

(Fall 2007 - Franklin & Hellerstein):
"Question 1: DB Architecture: Describe the steps involved in bootstrapping a DBMS.

Question 2: Semi-structured Data: Overview the motivation and definition of semi-structured 
data models.  Discuss ramifications on data modeling as well as on storage and retrieval." 

(Spring 2007 - Franklin & Hellerstein):
"Question 1: Transaction Management.  Give a crisp definition of a transaction.  
Provide a simple but inefficient mechanism for implementing this definition.  
Enhance this mechanism to improve concurrency -- why is concurrency important?

Question 2: Objects and Databases.  Highlight two main architectural thrusts for 
incorporating object management into databases.  Critique each approach from the 
perspective of the other."

(Fall 2006 - Franklin & Hellerstein):
"1) Describe protocols optimistic concurrency control.  Relate their 
performance to blocking protocols for concurrency control.  Discuss 
the challenges of maintaining consistency in the fact of concurrent 
access in distributed and disconnected environments.

2) Give an overview of the data mining tasks of clustering and classification. 
Describe the basic data structure in the BIRCH clustering technique, and ways 
in which that structure is tuned during the course of the BIRCH algorithms."

(Spring 2005 - Franklin & Hellerstein):
"1) Describe a variant of the System R dynamic programming optimizer that 
produces the optimal plan subject to the constraint that only B buffers 
of memory are to be used.

2) 2) Review the assigned Lehman/Yao B-Link scheme for concurrent B-trees.  
Design an efficient logging and recovery technique that integrates with 
the B-Link scheme."

(Fall 2004 - Franklin & Hellerstein):
"Question 1: Explain the goal of Optimistic Concurrency Control (what
semantics does it provide);  Describe the Kung and Robinson Algorithm.
What are the true tradeoffs between K&R OCC and locking, despite what
the K&R paper says?.  Assume that transactions arrive with an indication
of whether or not they are "read-only" and that 90% of the transactions
are marked as such.  How could you exploit this in a single-node system?
What about a cluster?

Question 2: Explain the concept of association rules and the  "a priori"
algorithm mining them from a transaction database; Explain the concept
of data cubes and their efficient computation; Describe the underlying
connections between the two and how this could be exploited to improve
data cube computation."

(Spring 2003 - Hellerstein & Rowe):
"1) Transactional storage: We asked the student to describe on-disk data 
structures for storing a relation as a heap file.  We then asked the 
student to consider concurrency and recovery challenges in managing a 
database system which only manages heap files (no indexes).  This 
exercised their understanding of locking and lock granularities, as 
well as logging and recovery for both data and structure modification 

2) Query processor architecture and code overhead: We asked the student 
to describe the code-path for processing a query in a DBMS.  We asked 
them to characterize the I/O and CPU overheads of each module in the 
code, and discuss the benefits of placing logic at different levels of the system."

(Fall 2002 - Franklin & Rowe):
 - Design of a database to support a television news operation that wished to maintain
   an archive of their broadcast stories.  
 - Storage systems capable of storing multiple, historical versions of data items.
 - Design of a database system for a package delivery service.
 - Computation of aggregate queries in an on-line analytical processing (OLAP) setting.

(Spring 2002 - Hellerstein & Franklin):
"- DBMIN buffer manager.
 - Design alternatives for laying out tuples on disk pages and their
   impact on CPU and processor cache behavior.
 - Query optimization > advanced techniques for optimizing queries with
   duplicate elimination and aggregation.
 - Concurrency control for B+Trees on both the physical level and
   logical level."

(Fall 2001 - Hellerstein & Franklin):
"Q1: - Definition of Transactions, Serializability, Conflict
vs. idealized serializability, two-phase locking, phantom problem.
- Describe the steps a database system takes to boot 
from scratch.

Q2: - Main memory databases and the need for concurrency.
- Recovery and logical vs. physical operations.

Q3: - Parallel query processing algorithms.
- Dangers of large scale replication.

Q4: - I/O behavior of sorting and hashing.
- Association Rule mining - definition and algorithm."

(Fall 2000 - Franklin & Hellerstein):
"Q1: Concurrency Control: Strict vs. Non-strict locking, recovery
implications, and specialized locking for index structures.

Q2: Database Design and Query Processing for Text Processing

Q3: Recovery - Nested top level actions"

(Spring 2000 - Franklin & Hellerstein):
"Q1: Describe the degrees of consistency in a transactional basis from a
lock-based implementation perspective. Characterize the semantic
differences offered by the different degrees."

Q2: What are the time and resource costs of different hash join
algorithms? Characterize some main differences between the pipelined
hash join algorithm and a more traditional hash join algorithm of your

Q3: What is the motivation of a subtle details of the ARIES recovery
protocol called 'nested top actions?'"

(Fall 1999 - Franklin & Hellerstein):
"Q1: Describe 'Degrees of Consistency' and discuss locking and
optimistic techniques and how they relate to phantoms. 

Q2: Discuss Semi-Joins, and how they relate to precomputation and
caching of expensive operations. Discuss join strategies for Distributed
and Federated databases."

September 2001