Probabilistic Data Aggregation In Distributed Networks

Ling Huang and Ben Zhao and Anthony D. Joseph and John D. Kubiatowicz

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2006-11

February 6, 2006

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-11.pdf

We explore techniques to reduce the sensitivity of large-scale data aggregation networks to the loss of data. Our approach leverages multi-level modeling and prediction techniques to account for missing data points and is enabled by the temporal correlation that is present in typical data aggregation applications. The result can tolerate significant involuntary data loss while minimizing the overall impact on accuracy. Further, this technique permits nodes to probabilistically remove themselves from the network in order to reduce overall resource usage such as bandwidth or power consumption. In simulation, we explore the tradeoff between algorithmic complexity and prediction performance across a variety of datasets with different dynamic properties. We quantify the temporal correlation in several real-world datasets and achieve more than 50% resource savings in an environment with significant loss, while maintaining high accuracy.
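To make the idea concrete, the sketch below illustrates the general scheme rather than the report's actual algorithm: a simple last-value predictor stands in for the multi-level models, and the function name aggregate_with_prediction, its parameters, and the toy data are illustrative assumptions only.

import random

def aggregate_with_prediction(readings, history, report_prob=0.5):
    """One aggregation round. `readings` is an iterable of (node_id, value)
    pairs; `history` maps node_id -> last reported value and is updated in
    place. A node reports with probability report_prob; a suppressed report
    is replaced by the node's last reported value, exploiting temporal
    correlation (a stand-in for the report's modeling and prediction)."""
    total = 0.0
    for node_id, value in readings:
        if random.random() < report_prob or node_id not in history:
            history[node_id] = value        # node transmits this round
            total += value
        else:
            total += history[node_id]       # gap filled by prediction
    return total

# Usage: two rounds over 100 nodes reading a slowly drifting signal.
history = {}
round1 = [(i, 20.0 + 0.01 * i) for i in range(100)]
round2 = [(i, v + 0.05) for i, v in round1]
est1 = aggregate_with_prediction(round1, history)   # all nodes seed history
est2 = aggregate_with_prediction(round2, history)   # roughly half stay silent
print(est2, sum(v for _, v in round2))              # estimate vs. true sum

Because the signal changes slowly between rounds, the stale values filled in for silent nodes introduce only a small error in the aggregate, which is the intuition behind trading reports for predictions when data is temporally correlated.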


BibTeX citation:

@techreport{Huang:EECS-2006-11,
    Author= {Huang, Ling and Zhao, Ben and Joseph, Anthony D. and Kubiatowicz, John D.},
    Title= {Probabilistic Data Aggregation In Distributed Networks},
    Year= {2006},
    Month= {Feb},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-11.html},
    Number= {UCB/EECS-2006-11},
    Abstract= {We explore techniques to reduce the sensitivity of large-scale data aggregation networks to the loss of data.  Our approach leverages multi-level modeling and prediction techniques to account for missing data points and is enabled by the temporal correlation that is present in typical data aggregation applications.  The result can tolerate significant \emph{involuntary\/} data loss while minimizing overall impact on accuracy.  Further, this technique permits nodes to probabilistically remove themselves from the network in order to reduce overall resource usage such as bandwidth or power consumption. In simulation, we explore the tradeoff between algorithmic complexity and prediction performance across a variety of data sets with different dynamic properties.  We quantify the temporal correlation in several real-world datasets, and achieve more than 50\% resource savings in an environment with significant loss, while maintaining high accuracy.},
}

EndNote citation:

%0 Report
%A Huang, Ling 
%A Zhao, Ben 
%A Joseph, Anthony D. 
%A Kubiatowicz, John D. 
%T Probabilistic Data Aggregation In Distributed Networks
%I EECS Department, University of California, Berkeley
%D 2006
%8 February 6
%@ UCB/EECS-2006-11
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-11.html
%F Huang:EECS-2006-11