Vivek Nair

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-232

November 14, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-232.pdf

Motion tracking "telemetry" data lies at the core of nearly all modern extended reality (XR) and metaverse experiences. While it has long been known that people reveal information about themselves via their motion, the extent to which these findings apply to XR platforms has, until recently, not been widely understood, with most users perceiving motion to be amongst the more innocuous categories of data in XR. Contrary to these perceptions, this dissertation explores the unprecedented risks and opportunities of XR motion data. We present both a series of attacks that illustrate the severity of the XR privacy threat and a set of defensive countermeasures to protect user privacy in XR while maintaining a positive user experience.

We first present a detailed systematization of the landscape of XR privacy attacks and defenses by proposing a comprehensive taxonomy of data attributes, information flow, adversaries, and countermeasures based on an analysis of over 60 prior studies. We then identify and describe a novel dataset of over 4.7 million motion capture recordings, voluntarily submitted by over 105,000 XR device users from over 50 countries. In addition to being over 200 times larger than the largest prior motion capture research dataset, this data is critical to enabling several major contributions throughout this dissertation.

First, using our new dataset, we show that a large number of real XR users can be uniquely and reliably identified across multiple sessions using just their head and hand motion relative to virtual objects. After training a classification model on 5 minutes of data per person, a user can be uniquely identified amongst the entire pool of 55,541 users with 94.33% accuracy from 100 seconds of motion, and with 73.20% accuracy from just 10 seconds of motion. Then, we go a step further, showing that a variety of private user information can be inferred just by analyzing motion data recorded from XR devices. After conducting a large-scale survey of XR users with dozens of questions ranging from background and demographics to behavioral patterns and health information, we demonstrate that simple machine learning models can accurately and consistently infer over 40 personal attributes from XR motion data alone. In a third study, we show that adversarially designed XR games can infer an even wider range of attributes than passive observation alone can reveal. After inviting 50 study participants to play an innocent-looking "escape room" game in XR, we show that an adversarial program could accurately infer over 25 of their data attributes, from anthropometrics like height and wingspan to demographics like age and gender.
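To illustrate the general idea behind cross-session identification from motion telemetry (this is a minimal sketch on synthetic data, not the dissertation's actual model or features), one can summarize each motion window as a feature vector, enroll a per-user template, and identify new windows by nearest template. All names and parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each user has a characteristic motion "signature"
# (e.g. typical head height, hand speed). Real systems derive far richer
# features from head/hand telemetry relative to virtual objects.
n_users, n_features = 50, 8
signatures = rng.normal(size=(n_users, n_features))

def sample_window(user, noise=0.3):
    """Simulate summary features for one motion window of a given user."""
    return signatures[user] + rng.normal(scale=noise, size=n_features)

# Enrollment: average several windows per user into a template.
templates = np.stack([
    np.mean([sample_window(u) for _ in range(30)], axis=0)
    for u in range(n_users)
])

def identify(window):
    """Nearest-template identification across the entire user pool."""
    return int(np.argmin(np.linalg.norm(templates - window, axis=1)))

# Evaluate re-identification on fresh, unseen windows.
correct = sum(identify(sample_window(u)) == u for u in range(n_users))
accuracy = correct / n_users
```

Even this naive nearest-template scheme identifies synthetic users near-perfectly once their "signatures" are stable across sessions, which is the core privacy concern the dissertation quantifies at far larger scale.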

While users have, to some extent, grown accustomed to privacy attacks on the web, metaverse platforms carry many of the privacy risks of the conventional internet (and more) while at present offering few of the defensive tools that users have come to expect. To remedy this, we present the first known method of implementing an "incognito mode" for XR. Our technique leverages local differential privacy to quantifiably obscure sensitive user data attributes, with a focus on intelligently adding noise when and where it is needed most to maximize privacy while minimizing usability impact. However, we then demonstrate a state-of-the-art XR identification model architecture that can convincingly bypass this anonymization technique when trained on a sufficiently large dataset. Therefore, we ultimately propose a "deep motion masking" approach that scalably and effectively facilitates the real-time anonymization of XR telemetry data. Through a large-scale user study, we demonstrate that our method is effective at achieving both cross-session unlinkability and indistinguishability of anonymized motion data.
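The local-differential-privacy idea behind such an "incognito mode" can be sketched with the standard Laplace mechanism. This is a simplified illustration, not the dissertation's implementation; the attribute, bounds, and epsilon values are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release a bounded scalar under epsilon-local-DP via Laplace noise."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical sensitive attribute: user height in meters, as might be
# inferred from headset y-position. Assume values are clipped to
# [1.2, 2.2] m, giving sensitivity 1.0.
true_height, sensitivity = 1.78, 1.0

for epsilon in (0.5, 1.0, 5.0):
    noisy_height = laplace_mechanism(true_height, sensitivity, epsilon, rng)
    # Smaller epsilon => larger noise scale => stronger privacy guarantee,
    # but lower utility for the XR application consuming the telemetry.
```

The privacy/usability tension is visible directly in the noise scale `sensitivity / epsilon`: the dissertation's contribution is deciding when and where such noise is needed most, rather than applying it uniformly.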

Advisor: Dawn Song


BibTeX citation:

@phdthesis{Nair:EECS-2023-232,
    Author= {Nair, Vivek},
    Editor= {Song, Dawn and O'Brien, James and Hartmann, Björn and Rosenberg, Louis},
    Title= {The Unprecedented Risks and Opportunities of Extended Reality Motion Data},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Nov},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-232.html},
    Number= {UCB/EECS-2023-232},
}

EndNote citation:

%0 Thesis
%A Nair, Vivek 
%E Song, Dawn 
%E O'Brien, James 
%E Hartmann, Björn 
%E Rosenberg, Louis 
%T The Unprecedented Risks and Opportunities of Extended Reality Motion Data
%I EECS Department, University of California, Berkeley
%D 2023
%8 November 14
%@ UCB/EECS-2023-232
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-232.html
%F Nair:EECS-2023-232