(Fall 2007 - Franklin & Hellerstein):
"Question 1: DB Architecture: Describe the steps involved in bootstrapping a DBMS.
Question 2: Semi-structured Data: Overview the motivation and definition of semi-structured
data models. Discuss ramifications on data modeling as well as on storage and retrieval."
(Spring 2007 - Franklin & Hellerstein):
"Question 1: Transaction Management. Give a crisp definition of a transaction.
Provide a simple but inefficient mechanism for implementing this definition.
Enhance this mechanism to improve concurrency -- why is concurrency important?
Question 2: Objects and Databases. Highlight two main architectural thrusts for
incorporating object management into databases. Critique each approach from the
perspective of the other."
(Fall 2006 - Franklin & Hellerstein):
"1) Describe protocols optimistic concurrency control. Relate their
performance to blocking protocols for concurrency control. Discuss
the challenges of maintaining consistency in the fact of concurrent
access in distributed and disconnected environments.
2) Give an overview of the data mining tasks of clustering and classification.
Describe the basic data structure in the BIRCH clustering technique, and ways
in which that structure is tuned during the course of the BIRCH algorithms."
(Spring 2005 - Franklin & Hellerstein):
"1) Describe a variant of the System R dynamic programming optimizer that
produces the optimal plan subject to the constraint that only B buffers
of memory are to be used.
2) 2) Review the assigned Lehman/Yao B-Link scheme for concurrent B-trees.
Design an efficient logging and recovery technique that integrates with
the B-Link scheme."
(Fall 2004 - Franklin & Hellerstein):
"Question 1: Explain the goal of Optimistic Concurrency Control (what
semantics does it provide); Describe the Kung and Robinson Algorithm.
What are the true tradeoffs between K&R OCC and locking, despite what
the K&R paper says?. Assume that transactions arrive with an indication
of whether or not they are "read-only" and that 90% of the transactions
are marked as such. How could you exploit this in a single-node system?
What about a cluster?
Question 2: Explain the concept of association rules and the "a priori"
algorithm mining them from a transaction database; Explain the concept
of data cubes and their efficient computation; Describe the underlying
connections between the two and how this could be exploited to improve
data cube computation."
(Spring 2003 - Hellerstein & Rowe):
"1) Transactional storage: We asked the student to describe on-disk data
structures for storing a relation as a heap file. We then asked the
student to consider concurrency and recovery challenges in managing a
database system which only manages heap files (no indexes). This
exercised their understanding of locking and lock granularities, as
well as logging and recovery for both data and structure modification
operations.
2) Query processor architecture and code overhead: We asked the student
to describe the code-path for processing a query in a DBMS. We asked
them to characterize the I/O and CPU overheads of each module in the
code, and discuss the benefits of placing logic at different levels of the system."
(Fall 2002 - Franklin & Rowe):
- Design of a database to support a television news operation that wished to maintain
an archive of their broadcast stories.
- Storage systems capable of storing multiple, historical versions of data items.
- Design of a database system for a package delivery service.
- Computation of aggregate queries in an on-line analytical processing (OLAP) setting.
(Spring 2002 - Hellerstein & Franklin):
"- DBMIN buffer manager.
- Design alternatives for laying out tuples on disk pages and their
impact on CPU and processor cache behavior.
- Query optimization > advanced techniques for optimizing queries with
duplicate elimination and aggregation.
- Concurrency control for B+Trees on both the physical level and
logical level."
(Fall 2001 - Hellerstein & Franklin):
"Q1: - Definition of Transactions, Serializability, Conflict
vs. idealized serializability, two-phase locking, phantom problem.
- Describe the steps a database system takes to boot
from scratch.
Q2: - Main memory databases and the need for concurrency.
- Recovery and logical vs. physical operations.
Q3: - Parallel query processing algorithms.
- Dangers of large scale replication.
Q4: - I/O behavior of sorting and hashing.
- Association Rule mining - definition and algorithm."
(Fall 2000 - Franklin & Hellerstein):
"Q1: Concurrency Control: Strict vs. Non-strict locking, recovery
implications, and specialized locking for index structures.
Q2: Database Design and Query Processing for Text Processing
Q3: Recovery - Nested top level actions"
(Spring 2000 - Franklin & Hellerstein):
"Q1: Describe the degrees of consistency in a transactional basis from a
lock-based implementation perspective. Characterize the semantic
differences offered by the different degrees."
Q2: What are the time and resource costs of different hash join
algorithms? Characterize some main differences between the pipelined
hash join algorithm and a more traditional hash join algorithm of your
choosing.
Q3: What is the motivation of a subtle details of the ARIES recovery
protocol called 'nested top actions?'"
(Fall 1999 - Franklin & Hellerstein):
"Q1: Describe 'Degrees of Consistency' and discuss locking and
optimistic techniques and how they relate to phantoms.
Q2: Discuss Semi-Joins, and how they relate to precomputation and
caching of expensive operations. Discuss join strategies for Distributed
and Federated databases."
September 2001