Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control

Publication Type

Conference Paper

Date Published

11/2020

Authors

Chen, Bingqing, Ming Jin, Zhe Wang, Tianzhen Hong, Mario Bergés

DOI

10.1145/3427773.3427871

Abstract

We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control. OPE is the problem of estimating a policy’s performance without running it on the actual system, using historical data from the existing controller. It enables the control engineers to ensure a new, pretrained policy satisfies the performance requirements and safety constraints of a real-world system, prior to interacting with it. While many methods have been developed for OPE, no study has evaluated which ones are suitable for building operational data, which are generated by deterministic policies and have limited coverage of the state-action space. After reviewing existing works and their assumptions, we adopted the approximate model (AM) method. Furthermore, we used bootstrapping to quantify uncertainty and correct for bias. In a simulation study, we evaluated the proposed approach on 10 policies pretrained with imitation learning. On average, the AM method estimated the energy and comfort costs with 1.84% and 14.1% error, respectively.
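To make the general idea of approximate-model OPE with bootstrapping concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical one-zone building with scalar linear temperature dynamics, a clipped proportional controller as the deterministic logging policy, and illustrative cost definitions (absolute control effort for energy, squared setpoint deviation for comfort). All function names, coefficients, and the specific bias-correction formula (the standard bootstrap correction, 2 × point estimate − bootstrap mean) are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

SETPOINT = 22.0  # assumed comfort setpoint (deg C), illustration only


def fit_linear_model(S, A, S_next):
    """Least-squares fit of s' ~ a*s + b*u + c from logged transitions."""
    X = np.column_stack([S, A, np.ones_like(S)])
    theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return theta  # array [a, b, c]


def rollout_cost(theta, policy, s0, horizon):
    """Simulate a policy inside the fitted model; return (energy, comfort) costs."""
    s, energy, comfort = s0, 0.0, 0.0
    for _ in range(horizon):
        u = policy(s)
        energy += abs(u)                 # assumed energy cost
        comfort += (s - SETPOINT) ** 2   # assumed comfort cost
        s = theta[0] * s + theta[1] * u + theta[2]
    return energy, comfort


def am_ope_bootstrap(S, A, S_next, policy, s0, horizon, n_boot=200, seed=0):
    """Approximate-model estimate plus bootstrap interval and bias correction."""
    rng = np.random.default_rng(seed)
    theta = fit_linear_model(S, A, S_next)
    point = np.array(rollout_cost(theta, policy, s0, horizon))
    n = len(S)
    boots = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample transitions with replacement
        theta_b = fit_linear_model(S[idx], A[idx], S_next[idx])
        boots[b] = rollout_cost(theta_b, policy, s0, horizon)
    corrected = 2.0 * point - boots.mean(axis=0)  # standard bootstrap bias correction
    lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
    return point, corrected, lo, hi


def behavior_policy(s):
    """Hypothetical existing controller: deterministic, clipped proportional."""
    return float(np.clip(2.0 * (SETPOINT - s), 0.0, 3.0))


def new_policy(s):
    """Hypothetical candidate policy to be evaluated offline."""
    return float(np.clip(1.0 * (SETPOINT - s), 0.0, 3.0))


def make_log(seed=1):
    """Synthetic historical data from an assumed 'true' system (toy values)."""
    rng = np.random.default_rng(seed)
    rows = []
    for s in np.linspace(15.0, 26.0, 12):
        for _ in range(20):
            u = behavior_policy(s)
            s_next = 0.9 * s + 0.5 * u + 1.5 + rng.normal(0.0, 0.05)
            rows.append((s, u, s_next))
            s = s_next
    return np.array(rows).T  # rows: S, A, S_next


S, A, S_next = make_log()
point, corrected, lo, hi = am_ope_bootstrap(
    S, A, S_next, new_policy, s0=18.0, horizon=24
)
print("point estimate (energy, comfort):", point)
print("bias-corrected estimate:", corrected)
print("95% bootstrap interval:", lo, hi)
```

In this toy setup the logging policy is deterministic, so the model is identifiable only because the controller saturates at its limits in part of the log. With a purely linear, unsaturated controller the action would be an affine function of the state and the regression could not separate their effects, which illustrates the limited state-action coverage the abstract refers to.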

Journal

Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities

Year of Publication

2020