Repository logo
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Italiano
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log In
    or
    New user? Click here to register.Have you forgotten your password?
Repository logo
  • Communities & Collections
  • Research Outputs
  • Fundings & Projects
  • People
  • Statistics
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Italiano
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log In
    or
    New user? Click here to register.Have you forgotten your password?
  1. Home
  2. Indian Institute of Technology Madras
  3. Publication6
  4. Epopt: Learning robust neural network policies using model ensembles
 
  • Details
Options

Epopt: Learning robust neural network policies using model ensembles

Date Issued
01-01-2017
Author(s)
Rajeswaran, Aravind
Ghotra, Sarvjeet
Ravindran, Balaraman
Levine, Sergey
Abstract
Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods where the real-world target domain is approximated using a simulated source domain provide an avenue to tackle the above challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from target domain and approximate Bayesian methods, to progressively make it a better approximation. Thus, learning on a model ensemble, along with source domain adaptation, provides the benefit of both robustness and learning/adaptation.
Indian Institute of Technology Madras Knowledge Repository developed and maintained by the Library

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science

  • Cookie settings
  • Privacy policy
  • End User Agreement
  • Send Feedback