Alternate Representations for Scalable Analysis and Control of Heterogeneous Time Series

Francois Belletti

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2017-195

December 5, 2017

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-195.pdf

The recent increase in the availability of very large data sets has enabled major breakthroughs in Artificial Intelligence. Automated devices now achieve a level of performance in computer vision, perfect- and imperfect-information games, and language processing that compares to that of humans. Such progress is largely due to improved Single Instruction Multiple Data computing capabilities, higher bandwidth in distributed computing systems, and innovative methods to leverage them.

A plethora of algorithms and theories developed in the field of Machine Learning enable better identification of system dynamics and extensive control of the corresponding systems. However, the vast majority of research focuses on problems dealing with homogeneous observation data sets or control environments. Additionally, a large amount of work has focused on data sets consisting only of images, videos featuring similar sampling frequencies, or time series with regular and identical observation timestamps. Such a setting is not representative of how data sets are actually collected or how problems present themselves to practitioners. The present work delves into a more realistic setting where a unified representation of the data or control problem of interest is not available. We deal with a collection of heterogeneous sub-parts that relate to one another but do not naturally present themselves to practitioners in a homogeneous fashion. Our main objective is to design methods that are readily applicable to heterogeneous data sets and control problems in the distributed setting. Techniques that can be employed without additional pre-processing of the data are more practical for a broader range of individuals, companies and institutions.

Our motivation for seeking concise yet expressive representations is that they enable scalable computations in the distributed setting. When utilizing cohorts of computers interconnected by a medium such as an Ethernet or wireless network, communication time is the main factor in the overall computing time. Decreasing the size of the messages that need to be transmitted therefore reduces the overhead created by the need to communicate information from one machine to another. With less time wasted in communication, we maximize the benefit of having more computational power and memory in the form of a distributed system.

The present work features novel results on the estimation of cross-correlation of irregularly observed time series with event-driven sampling. A new analysis of the linearized Aw-Rascle-Zhang system of Partial Differential Equations is developed that uncovers conditions under which travelling waves expand in the system. A comparative study of a dual splitting algorithm we developed for distributed control reveals new results that highlight how the messages being transmitted are more useful to the cohort of agents for control than to an adversary eavesdropping on individuals. The regularization scheme we developed for neural control policies enables extensive and robust control that compares with cutting-edge parametric control strategies, even though our method requires no preliminary calibration.

Advisors: Alexandre Bayen and Joseph Gonzalez

BibTeX citation:

@phdthesis{Belletti:EECS-2017-195,
    Author= {Belletti, Francois},
    Editor= {Bayen, Alexandre},
    Title= {Alternate Representations for Scalable Analysis and Control of Heterogeneous Time Series},
    School= {EECS Department, University of California, Berkeley},
    Year= {2017},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-195.html},
    Number= {UCB/EECS-2017-195},
    Abstract= {The recent increase in the availability of very large data sets 
has enabled major breakthroughs in Artificial Intelligence.
Automated devices now achieve a level of performance in computer vision, perfect- and imperfect-information games, and language processing that compares to that of humans.
Such progress is largely due to improved Single Instruction Multiple Data computing capabilities, higher bandwidth in distributed computing systems,
and innovative methods to leverage them.

A plethora of algorithms and theories developed in the field of Machine Learning enable better identification of system dynamics and extensive control of the corresponding systems. However, the vast majority of research focuses on problems dealing with homogeneous observation data sets or control environments.
Additionally, a large amount of work has focused on data sets consisting only of images, videos featuring similar sampling frequencies, or time series with regular and identical observation timestamps.
Such a setting is not representative of how data sets are actually collected or how problems present themselves to practitioners.
The present work delves into a more realistic setting where a unified representation of the data or control problem of interest is not available. We deal with a collection of heterogeneous sub-parts that relate to one another but do not naturally present themselves to practitioners in a homogeneous fashion.
Our main objective is to design methods that are readily applicable to heterogeneous data sets and control problems in the distributed setting. Techniques that can be employed without additional pre-processing of the data are more practical for a broader range of individuals, companies and institutions.

Our motivation for seeking concise yet expressive representations is that they enable scalable computations in the distributed setting.
When utilizing cohorts of computers interconnected by a medium such as an Ethernet or wireless network, communication time is the main factor in the overall computing time.
Decreasing the size of the messages that need to be transmitted therefore reduces the overhead created by the need to communicate information from one machine to another. With less time wasted in communication, we maximize the benefit of having more computational power and memory in the form of a distributed system.

The present work features novel results on the estimation of cross-correlation of irregularly observed time series with event-driven sampling. A new analysis of the linearized Aw-Rascle-Zhang system of Partial Differential Equations is developed that uncovers conditions under which travelling waves expand in the system. A comparative study of a dual splitting algorithm we developed for distributed control reveals new results that highlight how the messages being transmitted are more useful to the cohort of agents for control than to an adversary eavesdropping on individuals. The regularization scheme we developed for neural control policies enables extensive and robust control that compares with cutting-edge parametric control strategies, even though our method requires no preliminary calibration.},
}

EndNote citation:

%0 Thesis
%A Belletti, Francois 
%E Bayen, Alexandre 
%T Alternate Representations for Scalable Analysis and Control of Heterogeneous Time Series
%I EECS Department, University of California, Berkeley
%D 2017
%8 December 5
%@ UCB/EECS-2017-195
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-195.html
%F Belletti:EECS-2017-195