D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC & Custom: Tools and Techniques for Low Power Design, Springer Science & Business Media, 2008.
M. Gries and K. W. Keutzer, Eds., Building ASIPs: The MESCAL Methodology, New York: Springer, 2005.
P. Chen, D. A. Kirkpatrick, and K. W. Keutzer, Static Crosstalk-Noise Analysis: For Deep Sub-Micron Digital Designs, Norwell, MA: Kluwer Academic Publishers, 2004.
D. Chinnery and K. Keutzer, Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, Boston, MA: Kluwer Academic Publishers, 2002.
Book chapters or sections
M. Anderson, B. Catanzaro, J. Chong, E. Gonina, K. Keutzer, C. Lai, M. W. Moskewicz, M. Murphy, B. Su, and K. Keutzer, "PALLAS: Mapping Applications onto Manycore," in Multiprocessor System-on-Chip: Hardware Design and Tool Integration, Springer, 2010, pp. 89-114.
K. Keutzer and K. Ravindran, "Technology mapping," in Encyclopedia of Algorithms, M. Y. Kao, Ed., Springer Reference, Berlin, Germany: Springer, 2008, pp. 944-946.
Articles in journals or magazines
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "Fast deep neural network training on distributed systems and cloud TPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 11, pp. 2449-2462, Nov. 2019.
Z. Yao, A. Gholami, Q. Lei, K. Keutzer, and M. W. Mahoney, "Hessian-based analysis of large batch training and robustness to adversaries," Advances in Neural Information Processing Systems, vol. 31, pp. 4949-4959, Dec. 2018.
F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size," arXiv preprint arXiv:1602.07360, 2016.
K. Asanović, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, N. Morgan, D. A. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. A. Yelick, "A View of the Parallel Computing Landscape," Communications of the ACM, vol. 52, no. 10, pp. 56-67, Oct. 2009.
K. You, J. Chong, Y. Yi, E. Gonina, C. J. Hughes, Y. Chen, W. Sung, and K. Keutzer, "Parallel scalability in speech recognition," IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 124-135, June 2009.
W. Hwu, K. Keutzer, and T. G. Mattson, "The concurrency challenge," IEEE Design and Test of Computers, vol. 25, no. 4, pp. 312-320, July 2008.
Articles in conference proceedings
S. Shen, Z. Yao, A. Gholami, M. Mahoney, and K. Keutzer, "PowerNorm: Rethinking batch normalization in transformers," in International Conference on Machine Learning, 2020, pp. 8741-8751.
Y. Cai, Z. Yao, Z. Dong, A. Gholami, M. W. Mahoney, and K. Keutzer, "ZeroQ: A novel zero shot quantization framework," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13169-13178.
Y. You, J. Li, S. Reddi, J. Hseu, S. Kumar, S. Bhojanapalli, X. Song, J. Demmel, K. Keutzer, and C. Hsieh, "Large batch optimization for deep learning: Training BERT in 76 minutes," in International Conference on Learning Representations, 2020.
S. Shen, Z. Dong, J. Ye, L. Ma, Z. Yao, A. Gholami, M. W. Mahoney, and K. Keutzer, "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT," in AAAI, 2020, pp. 8815-8821.
S. Zhao, B. Li, X. Yue, Y. Gu, P. Xu, R. Hu, H. Chai, and K. Keutzer, "Multi-source domain adaptation for semantic segmentation," in Advances in Neural Information Processing Systems, 2019, pp. 7287-7300.
X. Yue, Y. Zhang, S. Zhao, A. L. Sangiovanni-Vincentelli, K. Keutzer, and B. Gong, "Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2100-2110.
Z. Dong, Z. Yao, A. Gholami, M. W. Mahoney, and K. Keutzer, "HAWQ: Hessian aware quantization of neural networks with mixed-precision," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 293-302.
Z. Yao, A. Gholami, P. Xu, K. Keutzer, and M. W. Mahoney, "Trust region based adversarial attack on neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11350-11359.
B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer, "FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10734-10742.
B. Wu, X. Zhou, S. Zhao, X. Yue, and K. Keutzer, "SqueezeSegV2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a LiDAR point cloud," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 4376-4382.
A. Gholami, K. Kwon, B. Wu, Z. Tai, X. Yue, P. Jin, S. Zhao, and K. Keutzer, "SqueezeNext: Hardware-aware neural network design," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 1638-1647.
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "ImageNet training in minutes," in Proceedings of the 47th International Conference on Parallel Processing, 2018, pp. 1-10.
P. Jin, K. Keutzer, and S. Levine, "Regret minimization for partially observable deep reinforcement learning," in International Conference on Machine Learning, 2018, pp. 2342-2351.
S. Zhao, G. Ding, Q. Huang, T. Chua, B. W. Schuller, and K. Keutzer, "Affective Image Content Analysis: A Comprehensive Survey," in IJCAI, 2018, pp. 5534-5541.
A. Gholami, A. Azad, P. Jin, K. Keutzer, and A. Buluc, "Integrated model, batch, and domain parallelism in training neural networks," in Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, 2018, pp. 77-86.
B. Wu, A. Wan, X. Yue, P. Jin, S. Zhao, N. Golmant, A. Gholaminejad, J. Gonzalez, and K. Keutzer, "Shift: A zero FLOP, zero parameter alternative to spatial convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9127-9135.
B. Wu, A. Wan, X. Yue, and K. Keutzer, "SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud," in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1887-1893.
F. Iandola and K. Keutzer, "Small neural nets are beautiful: enabling embedded systems with small deep-neural-network architectures," in 2017 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2017, pp. 1-10.
B. Wu, F. Iandola, P. H. Jin, and K. Keutzer, "SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 129-137.
F. N. Iandola, M. W. Moskewicz, K. Ashraf, and K. Keutzer, "FireCaffe: near-linear acceleration of deep neural network training on compute clusters," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2592-2600.
E. Gonina, G. Friedland, H. Cook, and K. Keutzer, "Fast Speaker Diarization using a High-Level Scripting Language," in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, ASRU'11, 2011.
N. Sundaram, T. Brox, and K. Keutzer, "Dense point trajectories by GPU-accelerated large displacement optical flow," in European Conference on Computer Vision, 2010, pp. 438-451.
D. Kolossa, J. Chong, S. Zeiler, and K. Keutzer, "Efficient Manycore CHMM Speech Recognition for Audiovisual and Multistream Data," in Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010, pp. 2698-2701.
J. Chong, E. Gonina, and K. Keutzer, "Monte Carlo Methods," in 2nd Annual Conference on Parallel Programming Patterns (ParaPLoP'10), 2010.
M. Dixon, J. Chong, and K. Keutzer, "Acceleration of Market Value-at-Risk Estimation," in Proceedings of the 2nd Workshop on High Performance Computational Finance, WHPCF '09, New York, NY, USA: ACM, 2009, pp. 5:1-5:8.
K. You, J. Chong, Y. Yi, E. Gonina, C. Hughes, Y. Chen, W. Sung, and K. Keutzer, "Parallel Scalability in Speech Recognition: Inference engine in large vocabulary continuous speech recognition," in IEEE Signal Processing Magazine, Vol. 26, 2009, pp. 124-135.
J. Chong, E. Gonina, Y. Yi, and K. Keutzer, "A Fully Data Parallel WFST-based Large Vocabulary Continuous Speech Recognition on a Graphics Processing Unit," in Proceedings of the 10th Annual Conference of the International Speech Communication Association (InterSpeech), 2009, pp. 1183–1186.
B. C. Catanzaro, S. A. Kamil, Y. Lee, K. Asanović, J. Demmel, K. Keutzer, J. Shalf, K. A. Yelick, and A. Fox, "SEJITS: Getting productivity and performance with selective embedded JIT specialization," in Proceedings First Workshop on Programming Models for Emerging Architectures, 2009.
B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 104-111.
B. C. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in Proc. 25th Intl. Conf. on Machine Learning (ICML 2008), A. McCallum and S. Roweis, Eds., ACM International Conference Proceeding Series, Vol. 307, New York, NY: The Association for Computing Machinery, Inc., 2008, pp. 104-111.
J. Chong, Y. Yi, A. Faria, N. Satish, and K. Keutzer, "Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors," in Proceedings of the 1st Annual Workshop on Emerging Applications and Many Core Architecture, 2008, pp. 23-35.
B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in ICML '08: Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA: ACM, 2008, pp. 104-111.
B. Catanzaro, K. Keutzer, and B. Y. Su, "Parallelizing CAD: A timely research agenda for EDA," in Proc. 45th ACM/IEEE Design Automation Conf. (DAC 2008), New York, NY: The Association for Computing Machinery, Inc., 2008, pp. 12-17.
S. Sapatnekar, E. Haritan, K. Keutzer, A. Devgan, D. A. Kirkpatrick, S. Meier, D. Pryor, and T. Spyrou, "Reinventing EDA with manycore processors," in Proc. 45th ACM/IEEE Design Automation Conf. (DAC 2008), New York, NY: The Association for Computing Machinery, Inc., 2008, pp. 126-127.
J. Chong, Y. Yi, A. Faria, N. Satish, and K. Keutzer, "Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors," in Proceedings of the 1st Annual Workshop on Emerging Applications and Many Core Architecture (EAMA), 2008, pp. 23-35.
F. Bacchini, G. Spirakis, J. A. Carballo, A. de Geus, F. C. Hsu, K. Keutzer, and K. Yamada, "Megatrends and EDA 2017 (Panel Session)," in Proc. 44th Design Automation Conf. (DAC 2007), New York, NY: The Association for Computing Machinery, Inc., 2007, pp. 21-22.
K. Keutzer, S. Malik, and A. R. Newton, "From ASIC to ASIP: The next design discontinuity," in Proc. 2002 IEEE Conf. on Computer Design, Los Alamitos, CA: IEEE Computer Society Press, 2002, pp. 84-90.
D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron," in 1998 IEEE/ACM Intl. Conf. on Computer-Aided Design. Digest of Technical Papers, New York, NY: ACM, 1998, pp. 203-211.
Technical Reports
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "ImageNet Training in Minutes," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2020-18, Jan. 2020.
Y. You, J. Demmel, K. Keutzer, C. Hsieh, C. Ying, and J. Hseu, "Large-Batch Training for LSTM and Beyond," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2018-138, Nov. 2018.
M. Anderson, G. Ballard, J. Demmel, and K. Keutzer, "Communication-Avoiding QR Decomposition for GPUs," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2010-131, Oct. 2010.
K. Asanović, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick, "The Landscape of Parallel Computing Research: A View from Berkeley," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2006-183, Dec. 2006.
N. Shah, W. Plishker, and K. Keutzer, "A Programming Model for Network Processors," EECS Department, University of California, Berkeley, Tech. Rep. UCB/ERL M02/35, Nov. 2002.
P. Chong, M. Prasad, and K. Keutzer, "Why Is ATPG Easy?," EECS Department, University of California, Berkeley, Tech. Rep. UCB/ERL M99/9, Feb. 1999.
Talks or presentations
J. Cong, K. Keutzer, and G. Martin, "High-level CAD and architecture (Invited Talk)," presented at Pre-Conference Workshop on Grand Challenges in FPGA Research: 15th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2007), Monterey, CA, Feb. 2007.
Ph.D. Theses
S. Shen, "Efficient and Scalable Large Multimodal Models," T. Darrell and K. Keutzer, Eds., EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2024-186, Aug. 2024.
N. Lee, K. Keutzer, and G. K. Anumanchipalli, "Exploring the Limits of Small Language Models," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2023-141, May 2023.
S. Kim, S. Shen, D. Thorsley, A. Gholami, W. Kwon, J. Hassoun, and K. Keutzer, "Learned Token Pruning for Efficient Transformer Inference," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2023-119, May 2023.