Economists have long argued that an essential driver of economic growth is innovation. Past work has shown that differences in production technologies represent an important source of disparities in patterns of long-run economic growth across countries. For instance, some estimates suggest that roughly 50 percent of growth in U.S. annual gross domestic product can be attributed to innovation. Not surprisingly, policymakers have thus focused significant attention on policies designed to stimulate innovation and the supply of new technologies. Yet innovation includes not only the creation but also the diffusion of new technologies and products in the marketplace.

When evaluating policy, it is important to measure the total impact of the new technologies the policy involves. Beyond diffusion, the impact of many technologies depends on how they are used once adopted. An effective innovation should therefore be measured by its returns at scale. A growing body of research recognizes that scale underlies all social and technological progress, since deeply impactful innovations are those that reach the largest number of people and retain their effectiveness when they do.

However, solutions in one setting are often frustrated when transferred to another. We denote this frustration as part of the scale-up problem, which revolves around several important questions: Do research findings persist in larger markets and broader settings? When we scale an intervention to these populations, should we expect the same level of efficacy that we observed in the small-scale setting? If not, then what are the important threats to scalability? Without a proper understanding of these and related questions, the scale-up problem can lead to a vast waste of resources, a missed opportunity to improve people’s lives, and decreased public trust in the scientific method’s ability to contribute to policymaking.

Our work explores the scale-up problem for an important class of new technologies in the energy space that leverage smart functionalities. Partnering with Opower and Honeywell in conjunction with Pacific Gas and Electric—the second-largest residential energy provider in the United States—we explore the effect that smart thermostats have on home energy usage. We examine data from two experiments wherein the 1,385 households that volunteered to participate in the study were randomized into either a treatment group that received free installation of a Honeywell two-way programmable smart thermostat or a control group that did not receive such a smart device and kept their existing thermostat. We evaluate the effect of the smart thermostat on subsequent energy consumption using high-frequency data over an 18-month period that includes more than 16 million hourly electricity-use records and almost 700,000 daily observations of natural gas consumption.

Smart thermostat producers report estimates that predict substantial energy savings from the adoption of smart thermostats. For instance, the ecobee website touts savings of up to 23 percent on heating and cooling costs, and the Google Nest website advertises a 10–12 percent savings on heating and a 15 percent savings on cooling costs. These claims inflate savings expectations by using heating- and cooling-specific energy use and by ignoring the local climate. However, more pertinent engineering estimates from the California Technical Forum also predict that smart thermostats will produce substantial reductions in energy consumption. The estimates most relevant to our experiment participants come from Department of Energy Technical Reference Manuals, which are annual reports produced by energy providers and regulators. These reports primarily rely on engineering simulations and survey data to predict the effects of energy efficiency programs at scale. These predictions are then used by energy providers to justify expenditures on energy efficiency programs. Using these predictions for Californians, which vary by climate zone and the size of a home, we find that the subjects of our experiment should expect savings of 1.3 percent and 4.0 percent, respectively, for overall electricity and natural gas consumption.

Our experimental estimates provide several insights into whether engineers’ estimates hold when technology is scaled beyond the lab. First, we find that smart thermostats fail to deliver the expected energy savings; our results show that such technologies do not have a significant effect on energy use. Some of our estimates, which account for differing climates between households, suggest that smart thermostats may actually increase electricity and gas consumption by 2.3 percent and 4.2 percent, respectively (although we cannot rule out effects of 0 percent). This failure of engineering estimates to accurately predict measured responses is broadly consistent with a growing body of research that documents real-world effects of energy efficient technology that pale in comparison to the effects predicted by engineers.

Second, we investigate whether this aggregate result masks significant but offsetting effects that differ between households, which could have implications for how the intervention scales to different settings. We find almost no evidence of differing effects of smart thermostats between households. The overall pattern across all our results consistently indicates that smart thermostats underdeliver on the savings promised by engineers.

Third, to explore potential mechanisms that explain this pattern of results, we use almost 4 million observations of treatment group heating, ventilation, and air conditioning (HVAC) system activity and user interactions with their smart thermostats in the form of scheduled temperature setpoints, temporary overrides, and HVAC system events. A key insight is that users frequently override permanently scheduled temperature setpoints, and when they do, the override settings are less energy efficient than the previously scheduled setpoint. Next, we test whether user behavior explains the discrepancy between the decrease in energy use purported by the engineering studies and our experimental estimates. We categorize smart thermostat households into flexible user-type categories based on how intensively they use the energy-saving features of their thermostat. We find that while some user types realize significant savings, engineering models fail to capture how most people actually use smart technologies, and this limits the usefulness of their estimates in real-world settings. More specifically, people adopt the smart technology but use its features in ways that undo the purported benefits, such as lowering energy use earlier in the day but ramping up energy use later. This suggests that human behavior is a peril to scaling such technologies.

For policymakers interested in scaling insights from the small to the large, we present a novel case study that is important given the recent evidence-based policy movement. Over the past several decades, research has evolved to be a key contributor to the scientific knowledge base from which policymakers draw insights. Our results reveal a key insight into user-based technologies: humans may not use the technology as envisioned and assumed by engineers.

NOTE
This research brief is based on Alec Brandon et al., “The Human Perils of Scaling Smart Technologies: Evidence from Field Experiments,” NBER Working Paper no. 30482, September 2022.