Ritwik Gupta and Colorado Reed and Shufan Li and Sarah Brockman and Christopher Funk and Brian Clipp and Kurt Keutzer and Salvatore Candido and Matt Uyttendaele and Trevor Darrell

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-263

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-263.pdf

Large pretrained models are commonly finetuned on imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are then applied to tasks whose imagery spans a range of spatial scales. In scale-dependent domains such as remote sensing, these models overlook scale-specific information in the data. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, not the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone and then decodes it through a bandpass filter to reconstruct low- and high-frequency images at lower and higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average improvement of 2.4–5.6% in non-parametric kNN classification across eight remote sensing datasets compared to the current state of the art, and obtains a 0.9–1.7 mIoU improvement on the SpaceNet building segmentation transfer task across a range of evaluation scales.
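As a concrete illustration of the scale-aware positional encoding described in the abstract, below is a minimal Python sketch of a sinusoidal encoding whose positions are scaled by the image's ground sample distance (GSD) relative to a reference value. The function name, the reference_gsd parameter, and the exact scaling convention are illustrative assumptions, not the report's implementation.

import numpy as np

def gsd_positional_encoding(num_positions, embed_dim, gsd, reference_gsd=1.0):
    # Hypothetical sketch: standard sinusoidal ViT positional encoding whose
    # positions are scaled by the image's ground sample distance (GSD, meters
    # per pixel) relative to an assumed reference_gsd, so images covering more
    # ground per pixel receive proportionally stretched encodings.
    positions = np.arange(num_positions)[:, None] * (gsd / reference_gsd)
    dims = np.arange(embed_dim // 2)[None, :]
    angles = positions / (10000 ** (2 * dims / embed_dim))
    pe = np.zeros((num_positions, embed_dim))
    pe[:, 0::2] = np.sin(angles)  # even channels
    pe[:, 1::2] = np.cos(angles)  # odd channels
    return pe

# Example: the same 14x14 patch grid at 0.3 m/px vs. 1.0 m/px yields different encodings,
# even though both crops have the same pixel resolution.
pe_fine = gsd_positional_encoding(14 * 14, 768, gsd=0.3)
pe_coarse = gsd_positional_encoding(14 * 14, 768, gsd=1.0)

The point of this sketch is only that the encoding depends on the ground area covered rather than on pixel count, which is the scale dependence the abstract describes.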

Advisors: S. Shankar Sastry and Trevor Darrell


BibTeX citation:

@mastersthesis{Gupta:EECS-2023-263,
    Author= {Gupta, Ritwik and Reed, Colorado and Li, Shufan and Brockman, Sarah and Funk, Christopher and Clipp, Brian and Keutzer, Kurt and Candido, Salvatore and Uyttendaele, Matt and Darrell, Trevor},
    Title= {Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-263.html},
    Number= {UCB/EECS-2023-263},
    Abstract= {Large pretrained models are commonly finetuned on imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are then applied to tasks whose imagery spans a range of spatial scales. In scale-dependent domains such as remote sensing, these models overlook scale-specific information in the data. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, not the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone and then decodes it through a bandpass filter to reconstruct low- and high-frequency images at lower and higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average improvement of 2.4–5.6% in non-parametric kNN classification across eight remote sensing datasets compared to the current state of the art, and obtains a 0.9–1.7 mIoU improvement on the SpaceNet building segmentation transfer task across a range of evaluation scales.},
}

EndNote citation:

%0 Thesis
%A Gupta, Ritwik 
%A Reed, Colorado 
%A Li, Shufan 
%A Brockman, Sarah 
%A Funk, Christopher 
%A Clipp, Brian 
%A Keutzer, Kurt 
%A Candido, Salvatore 
%A Uyttendaele, Matt 
%A Darrell, Trevor 
%T Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-263
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-263.html
%F Gupta:EECS-2023-263