A Methodology and Tool Support for the Design and Evaluation of Fault Tolerant, Distributed Embedded Systems

Mark Lee McKelvin Jr

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2011-35
April 29, 2011

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-35.pdf

Embedded systems are becoming pervasive in diverse application domains, such as automotive, avionic, medical, and industrial automation control systems. Advances in technology and the demand for sophisticated functionality across a variety of applications are driving an increase in the complexity of embedded systems, particularly in systems whose incorrect operation can have significant consequences, such as financial loss or loss of human life. As a result, these systems require high assurance that they meet stringent constraints on reliability and on fault tolerance, that is, the ability to keep operating even when components operate incorrectly.

Reliability is an important design goal in distributed embedded systems. It may be achieved by providing additional components in parallel or by improving component reliability. Thus, reliability in a fault tolerant system is dictated by the combinations of components that operate incorrectly, or fail. Since redundancy comes at a cost, the problem that designers face is determining which components to improve. Most existing approaches that seek better system reliability by simultaneously determining levels of component redundancy and selecting component reliabilities do not consider the design of embedded systems. Of the approaches that do consider embedded system design, many do not account for the combinations of component failures, their location in the system architecture, and their rates of failure, because of the challenges and limitations of constructing reliability models that can express those characteristics.
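The trade-off between adding parallel components and improving a single component can be illustrated with the standard reliability formula for independent parallel components. This is a minimal sketch, not the method of the dissertation: it assumes identical components that fail independently, and the reliability values 0.90 and 0.95 are made up for illustration.

```python
def parallel_reliability(r: float, n: int) -> float:
    """Reliability of n identical, independent components in parallel.

    The parallel group fails only if all n components fail, so the
    group reliability is 1 - (1 - r)^n.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("component reliability must be in [0, 1]")
    if n < 1:
        raise ValueError("need at least one component")
    return 1.0 - (1.0 - r) ** n


# Option A (hypothetical numbers): duplicate a component of reliability 0.90.
redundant = parallel_reliability(0.90, 2)

# Option B (hypothetical numbers): keep one component, improved to 0.95.
improved = parallel_reliability(0.95, 1)

print(f"duplicated 0.90 component: {redundant:.4f}")
print(f"single 0.95 component:     {improved:.4f}")
```

Under these assumptions duplication gives the higher reliability, but it also doubles the component count, which is the cost side of the decision described above.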

In this dissertation, I present a design flow and a set of tools to support the design and analysis of distributed embedded systems with fault tolerance and reliability requirements using fault trees. A fault tree is a reliability model that is based on the failure characteristics of a system and its structure. The proposed design flow integrates the automatic generation and analysis of fault trees to enable the design of fault tolerant architectures. I apply this design flow to the evaluation of a fault tolerant control application and to the evaluation of architecture alternatives for an automotive application.
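To make the notion of a fault tree concrete, the following is a minimal sketch of evaluating the top-event probability of a small tree built from AND and OR gates. It is an illustration only and not the tool described in the dissertation. It assumes basic events are statistically independent and that no basic event appears more than once in the tree. The class names, the component names (`ecu_a`, `ecu_b`, `bus`), and the failure probabilities are all hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """Leaf of the tree: failure of a single component."""
    name: str
    p_fail: float


@dataclass
class Gate:
    """Internal node: 'AND' fails if all inputs fail, 'OR' if any input fails."""
    kind: str
    inputs: List["Node"]


Node = Union[BasicEvent, Gate]


def failure_probability(node: Node) -> float:
    """Probability of the event at `node`, assuming independent, non-repeated leaves."""
    if isinstance(node, BasicEvent):
        return node.p_fail
    probs = [failure_probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical architecture: two redundant controllers on one shared bus.
# The system fails if both controllers fail, or if the bus fails.
ecu_a = BasicEvent("ecu_a", 0.01)
ecu_b = BasicEvent("ecu_b", 0.01)
bus = BasicEvent("bus", 0.001)
top = Gate("OR", [Gate("AND", [ecu_a, ecu_b]), bus])

p_top = failure_probability(top)
print(f"top-event failure probability: {p_top:.7f}")
```

In this example the shared bus dominates the result even though the controllers are duplicated, which shows why the location of a component in the architecture matters and not only its individual failure probability.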

Advisor: Alberto L. Sangiovanni-Vincentelli


BibTeX citation:

@phdthesis{McKelvin Jr:EECS-2011-35,
    Author = {McKelvin Jr, Mark Lee},
    Title = {A Methodology and Tool Support for the Design and Evaluation of Fault Tolerant, Distributed Embedded Systems},
    School = {EECS Department, University of California, Berkeley},
    Year = {2011},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-35.html},
    Number = {UCB/EECS-2011-35},
    Abstract = {Embedded systems are becoming pervasive in diverse application domains, such as automotive, avionic, medical, and industrial automation control systems. Advances in technology and the demand for sophisticated functionality across a variety of applications are driving an increase in the complexity of embedded systems, particularly in systems whose incorrect operation can have significant consequences, such as financial loss or loss of human life. As a result, these systems require high assurance that they meet stringent constraints on reliability and on fault tolerance, that is, the ability to keep operating even when components operate incorrectly.

Reliability is an important design goal in distributed embedded systems. It may be achieved by providing additional components in parallel or by improving component reliability. Thus, reliability in a fault tolerant system is dictated by the combinations of components that operate incorrectly, or fail. Since redundancy comes at a cost, the problem that designers face is determining which components to improve. Most existing approaches that seek better system reliability by simultaneously determining levels of component redundancy and selecting component reliabilities do not consider the design of embedded systems. Of the approaches that do consider embedded system design, many do not account for the combinations of component failures, their location in the system architecture, and their rates of failure, because of the challenges and limitations of constructing reliability models that can express those characteristics.

In this dissertation, I present a design flow and a set of tools to support the design and analysis of distributed embedded systems with fault tolerance and reliability requirements using fault trees. A fault tree is a reliability model that is based on the failure characteristics of a system and its structure. The proposed design flow integrates the automatic generation and analysis of fault trees to enable the design of fault tolerant architectures. I apply this design flow to the evaluation of a fault tolerant control application and to the evaluation of architecture alternatives for an automotive application.}
}

EndNote citation:

%0 Thesis
%A McKelvin Jr, Mark Lee
%T A Methodology and Tool Support for the Design and Evaluation of Fault Tolerant, Distributed Embedded Systems
%I EECS Department, University of California, Berkeley
%D 2011
%8 April 29
%@ UCB/EECS-2011-35
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-35.html
%F McKelvin Jr:EECS-2011-35