Bottom-up Memory Design Techniques for Energy-Efficient and Resilient Computing

Pi-Feng Chiu

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2018-156
December 1, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-156.pdf

Energy-efficient computing is critical for a wide range of electronic devices, from personal mobile devices that have limited battery capacity to cloud servers that have costly electricity bills. The increasing number of IoT devices has resulted in a growing demand for energy-efficient computing for battery-powered sensor nodes. The energy spent on memory access is a major contributor to total energy consumption, especially for new, data-intensive applications. Improving memory energy efficiency helps build energy-efficient computing systems. One effective way to achieve better energy efficiency is to lower the operating voltage by using dynamic voltage and frequency scaling (DVFS). However, further reductions in voltage are limited by SRAM-based caches. The aggressive implementation of SRAM bit cells to achieve high density causes larger variation than in logic cells. In order to operate the processor at the optimal energy-efficient point, the SRAM needs to work reliably at a lower voltage.

The sense amplifier of the memory circuit detects the small signal from the bit cell to enable high-speed and low-power read operation. The mismatch between the transistors due to process variation causes an offset voltage in the sense amplifier, which could lead to incorrect results when the sensing signal is smaller than the offset. The double-tail sense amplifier (DTSA) is proposed as a drop-in replacement for a conventional SRAM sense amplifier to enable robust sensing at low voltages. The dual-stage design reduces the offset voltage with a pre-amplification stage. The self-timed regenerative stage simplifies the timing logic and reduces the area. By simply replacing the conventional sense amplifier with DTSA, SRAM can operate with a 50mV Vmin reduction at faster timing.
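
The relationship between offset and read errors can be illustrated with a minimal sketch. It assumes the offset is a zero-mean Gaussian random variable and that a read fails when the offset opposes the signal and exceeds it; the function name and the sigma values are hypothetical and are not taken from the dissertation.

```python
import math


def sensing_failure_probability(signal_mv: float, offset_sigma_mv: float) -> float:
    """Probability that a zero-mean Gaussian offset exceeds the sensing signal.

    Assumed model (not from the source): a read fails when the offset
    opposes the bit-line signal and is larger than it, i.e. P(offset > signal).
    """
    if offset_sigma_mv <= 0:
        raise ValueError("offset_sigma_mv must be positive")
    z = signal_mv / offset_sigma_mv
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical numbers: the same 50 mV signal with two offset spreads.
    for sigma in (20.0, 10.0):
        p = sensing_failure_probability(50.0, sigma)
        print(f"sigma = {sigma:4.1f} mV -> failure probability = {p:.3e}")
```

Under this model, reducing the offset spread lowers the failure probability for a given signal, or equivalently allows a smaller signal at the same failure rate.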

Memory resiliency can be achieved through architecture-level assist techniques, which enable low-voltage operation by avoiding failing cells. The line disable (LD) scheme deactivates faulty cache lines in a set-associative cache to prevent bit cells with errors from being accessed. The Vmin reduction of LD is limited by the allowable capacity loss with minimum performance degradation. The line recycling (LR) technique is proposed to reuse two disabled faulty cache lines to repair a third line. By recycling the faulty lines, 1/3 of the capacity loss due to LD can be avoided for the same Vmin, or one-third as many faulty cache lines can be ignored.
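
The capacity arithmetic behind LR can be sketched as follows. The abstract does not state how two faulty lines repair a third; the sketch assumes a bitwise majority vote across three copies whose faulty bits fall at different positions, and all function names and bit patterns are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words: a bit is 1 if at least two inputs agree on 1."""
    return (a & b) | (a & c) | (b & c)


def lines_lost(faulty_lines: int, recycle: bool) -> int:
    """Cache lines unavailable for data storage.

    Without recycling (LD only), every faulty line is lost.
    With recycling (assumed grouping), each group of three faulty lines
    yields one usable line, so two of every three are lost.
    """
    if faulty_lines < 0:
        raise ValueError("faulty_lines must be non-negative")
    if not recycle:
        return faulty_lines
    groups, leftover = divmod(faulty_lines, 3)
    return 2 * groups + leftover


if __name__ == "__main__":
    stored = 0b10110010
    # Three faulty lines, each with one flipped bit at a different position.
    copy_a = stored ^ 0b00000001
    copy_b = stored ^ 0b00010000
    copy_c = stored ^ 0b10000000
    print(majority_vote(copy_a, copy_b, copy_c) == stored)
    print(lines_lost(30, recycle=False), lines_lost(30, recycle=True))
```

With 30 faulty lines, this model loses 30 lines under LD alone and 20 with recycling, which matches the one-third saving in capacity loss stated above.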

Emerging nonvolatile memory (NVM) technologies, such as STT-MRAM, RRAM, and PCM, offer a tremendous opportunity to improve energy efficiency in the memory system while continuously scaling. The new technologies are faster and more durable than NAND flash and can therefore be placed closer to the processing unit to save energy by powering off. However, reliable access to NVM cells faces several challenges, such as shifting of cell resistance distributions, small read margins, and wear-out.

The proposed differential 2R crosspoint resistive random access memory (RRAM) with array segmentation and sense-before-write techniques significantly improves read margin and removes data-dependent IR drop by composing one bit with two complementary cells. The proposed array architecture ensures large read margin (>100mV) even with a small resistance ratio (RH/RL=2).
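
To see why complementary cells help when the resistance ratio is small, consider a minimal sketch. It assumes the two cells of a bit form a series voltage divider across the read voltage and that the mid-node is compared against half the read voltage; the abstract does not spell out the sensing scheme, and the function name and values are hypothetical.

```python
def divider_read_margin(v_read: float, r_high: float, r_low: float) -> float:
    """Margin between the divider mid-node and a v_read/2 reference.

    Assumed model (not from the source): storing a 1 places r_high on the
    lower leg and r_low on the upper leg; storing a 0 swaps them.
    """
    if r_high <= 0 or r_low <= 0:
        raise ValueError("resistances must be positive")
    v_mid_one = v_read * r_high / (r_high + r_low)
    v_mid_zero = v_read * r_low / (r_high + r_low)
    # The two mid-node levels sit symmetrically around v_read / 2.
    return (v_mid_one - v_mid_zero) / 2.0


if __name__ == "__main__":
    # Hypothetical read voltage with the resistance ratio RH/RL = 2.
    margin = divider_read_margin(v_read=0.6, r_high=2.0, r_low=1.0)
    print(f"margin = {margin * 1e3:.1f} mV")
```

In this model the margin depends only on the resistance ratio and the read voltage, equal to v_read * (RH - RL) / (2 * (RH + RL)), so it does not depend on the absolute resistance values.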

In summary, this dissertation introduces techniques at different levels of memory design (device, circuit, and micro-architecture) that can work in concert to build a resilient and energy-efficient memory system. The proposed techniques are demonstrated on several chips fabricated in a 28nm CMOS process.

Advisor: Borivoje Nikolic


BibTeX citation:

@phdthesis{Chiu:EECS-2018-156,
    Author = {Chiu, Pi-Feng},
    Title = {Bottom-up Memory Design Techniques for Energy-Efficient and Resilient Computing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2018},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-156.html},
    Number = {UCB/EECS-2018-156},
    Abstract = {Energy-efficient computing is critical for a wide range of electronic devices, from personal mobile devices that have limited battery capacity to cloud servers that have costly electricity bills. The increasing number of IoT devices has resulted in a growing demand for energy-efficient computing for battery-powered sensor nodes. The energy spent on memory access is a major contributor to total energy consumption, especially for new, data-intensive applications. Improving memory energy efficiency helps build energy-efficient computing systems. One effective way to achieve better energy efficiency is to lower the operating voltage by using dynamic voltage and frequency scaling (DVFS). However, further reductions in voltage are limited by SRAM-based caches. The aggressive implementation of SRAM bit cells to achieve high density causes larger variation than in logic cells. In order to operate the processor at the optimal energy-efficient point, the SRAM needs to work reliably at a lower voltage.

The sense amplifier of the memory circuit detects the small signal from the bit cell to enable high-speed and low-power read operation. The mismatch between the transistors due to process variation causes an offset voltage in the sense amplifier, which could lead to incorrect results when the sensing signal is smaller than the offset. The double-tail sense amplifier (DTSA) is proposed as a drop-in replacement for a conventional SRAM sense amplifier to enable robust sensing at low voltages. The dual-stage design reduces the offset voltage with a pre-amplification stage. The self-timed regenerative stage simplifies the timing logic and reduces the area. By simply replacing the conventional sense amplifier with DTSA, SRAM can operate with a 50mV Vmin reduction at faster timing.

Memory resiliency can be achieved through architecture-level assist techniques, which enable low-voltage operation by avoiding failing cells. The line disable (LD) scheme deactivates faulty cache lines in a set-associative cache to prevent bit cells with errors from being accessed. The Vmin reduction of LD is limited by the allowable capacity loss with minimum performance degradation. The line recycling (LR) technique is proposed to reuse two disabled faulty cache lines to repair a third line. By recycling the faulty lines, 1/3 of the capacity loss due to LD can be avoided for the same Vmin, or one-third as many faulty cache lines can be ignored.

Emerging nonvolatile memory (NVM) technologies, such as STT-MRAM, RRAM, and PCM, offer a tremendous opportunity to improve energy efficiency in the memory system while continuously scaling. The new technologies are faster and more durable than NAND flash and can therefore be placed closer to the processing unit to save energy by powering off. However, reliable access to NVM cells faces several challenges, such as shifting of cell resistance distributions, small read margins, and wear-out.

The proposed differential 2R crosspoint resistive random access memory (RRAM) with array segmentation and sense-before-write techniques significantly improves read margin and removes data-dependent IR drop by composing one bit with two complementary cells. The proposed array architecture ensures large read margin (>100mV) even with a small resistance ratio (RH/RL=2).

In summary, this dissertation introduces techniques at different levels of memory design (device, circuit, and micro-architecture) that can work in concert to build a resilient and energy-efficient memory system. The proposed techniques are demonstrated on several chips fabricated in a 28nm CMOS process.}
}

EndNote citation:

%0 Thesis
%A Chiu, Pi-Feng
%T Bottom-up Memory Design Techniques for Energy-Efficient and Resilient Computing
%I EECS Department, University of California, Berkeley
%D 2018
%8 December 1
%@ UCB/EECS-2018-156
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-156.html
%F Chiu:EECS-2018-156