Embodied AI agents that can interact with the physical world hold immense potential for various applications. But the scarcity of training data remains one of their main hurdles.
To address this challenge, researchers from Imperial College London and Google DeepMind have introduced Diffusion Augmented Agents (DAAG), a novel framework that leverages the power of large language models (LLMs), vision language models (VLMs), and diffusion models to enhance the learning efficiency and transfer learning capabilities of embodied agents.
Why is data efficiency important for embodied agents?
The impressive progress in LLMs and VLMs in recent years has fueled hopes for their application to robotics and embodied AI. However, while LLMs and VLMs can be trained on massive text and image datasets scraped from the internet, embodied AI systems need to learn by interacting with the physical world.
The real world presents several challenges to data collection in embodied AI. First, physical environments are much more complex and unpredictable than the digital world. Second, robots and other embodied AI systems rely on physical sensors and actuators, which can be slow, noisy, and prone to failure.
The researchers believe that overcoming this hurdle will depend on making better use of the agent’s existing data and experience.
“We hypothesize that embodied agents can achieve greater data efficiency by leveraging past experience to explore effectively and transfer knowledge across tasks,” the researchers write.
What is DAAG?
Diffusion Augmented Agents (DAAG), the framework proposed by the Imperial College and DeepMind team, is designed to enable agents to learn tasks more efficiently by using past experiences and generating synthetic data.
“We are interested in enabling agents to autonomously set and score subgoals, even in the absence of external rewards, and to repurpose their experience from previous tasks to accelerate learning of new tasks,” the researchers write.
The researchers designed DAAG as a lifelong learning system, where the agent continuously learns and adapts to new tasks.
DAAG works in the context of a Markov Decision Process (MDP). The agent receives instructions for a task at the beginning of each episode. It then observes the state of its environment, takes actions, and tries to reach a state that matches the instructions.
It has two memory buffers: a task-specific buffer that stores experiences for the current task and an “offline lifelong buffer” that stores all past experiences, regardless of the tasks they were collected for or their outcomes.
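The two-buffer design can be sketched in a few lines of Python. This is an illustrative structure under the article's description, not the authors' implementation; the `Transition` fields and all class and method names are assumptions.

```python
from collections import namedtuple

# Hypothetical transition record for illustration; the paper does not
# prescribe this exact structure.
Transition = namedtuple("Transition", ["observation", "action", "reward", "next_observation"])

class DAAGMemory:
    """Two buffers: one for the current task, one lifelong across all tasks."""
    def __init__(self):
        self.task_buffer = []      # experiences for the current task only
        self.lifelong_buffer = []  # all past experiences, regardless of task or outcome

    def store(self, transition):
        # Every transition lands in both buffers.
        self.task_buffer.append(transition)
        self.lifelong_buffer.append(transition)

    def start_new_task(self):
        # Past experience is retained in the lifelong buffer; only the
        # task-specific buffer is reset for the new task.
        self.task_buffer = []

memory = DAAGMemory()
memory.store(Transition("obs0", "move", 0.0, "obs1"))
memory.start_new_task()
memory.store(Transition("obs1", "grasp", 1.0, "obs2"))
print(len(memory.task_buffer), len(memory.lifelong_buffer))  # 1 2
```

The key design point is that nothing is discarded between tasks: the lifelong buffer keeps even failed episodes, which later stages of the pipeline can repurpose.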
DAAG combines the strengths of LLMs, VLMs, and diffusion models to create agents that can reason about tasks, analyze their environment, and repurpose their past experiences to learn new objectives more efficiently.
The LLM acts as the agent’s central controller. When the agent receives a new task, the LLM interprets instructions, breaks them into smaller subgoals, and coordinates with the VLM and diffusion model to obtain reference frames for achieving its goals.
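The controller loop described above can be sketched as follows. This is a toy rendering of the orchestration pattern only: `llm_decompose` stands in for an LLM call, and the `vlm_reference` and `diffusion_reference` callables stand in for the VLM and diffusion model; none of these names come from the paper.

```python
# Hypothetical sketch of the LLM-as-controller pattern; function names
# are assumptions, not the authors' API.
def llm_decompose(instruction):
    # In DAAG this would be an LLM call; here a toy lookup for illustration.
    subgoal_table = {
        "stack the red cube on the blue cube": [
            "locate red cube", "grasp red cube", "place on blue cube"],
    }
    return subgoal_table.get(instruction, [instruction])

def plan(instruction, vlm_reference, diffusion_reference):
    subgoals = llm_decompose(instruction)
    # For each subgoal, ask the VLM for a matching past frame, falling
    # back to the diffusion model to synthesize a reference frame.
    return {sg: vlm_reference(sg) or diffusion_reference(sg) for sg in subgoals}

refs = plan("stack the red cube on the blue cube",
            vlm_reference=lambda sg: None,                   # no past frame found
            diffusion_reference=lambda sg: f"synthetic:{sg}")
print(refs["grasp red cube"])  # synthetic:grasp red cube
```

The fallback ordering mirrors the article's description: real past observations are preferred, and synthetic frames are generated only when no relevant experience exists.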
To make the best use of its past experience, DAAG uses a process called Hindsight Experience Augmentation (HEA), which uses the VLM and the diffusion model to augment the agent’s memory.
First, the VLM processes visual observations in the experience buffer and compares them to the desired subgoals. It adds the relevant observations to the agent's task-specific buffer to help guide its actions.
If the experience buffer does not have relevant observations, the diffusion model comes into play. It generates synthetic data to help the agent “imagine” what the desired state would look like. This enables the agent to explore different possibilities without physically interacting with the environment.
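The two HEA steps, reuse real observations where the VLM finds a match, otherwise synthesize them with the diffusion model, can be sketched like this. The `vlm_matches` and `diffusion_edit` callables are placeholders for the actual models, and the function name `hea_augment` is an assumption.

```python
# Hedged sketch of Hindsight Experience Augmentation (HEA). The two
# callables below stand in for the VLM and the diffusion model.
def hea_augment(lifelong_buffer, subgoal, vlm_matches, diffusion_edit):
    """Return observations usable for `subgoal`: real where possible,
    synthetic otherwise."""
    relevant = [obs for obs in lifelong_buffer if vlm_matches(obs, subgoal)]
    if relevant:
        return relevant  # reuse real past observations
    # No relevant experience: "imagine" the desired state by editing
    # past observations with the diffusion model.
    return [diffusion_edit(obs, subgoal) for obs in lifelong_buffer]

buffer = ["frame_a", "frame_b"]
out = hea_augment(buffer, "cube grasped",
                  vlm_matches=lambda obs, sg: False,          # nothing relevant found
                  diffusion_edit=lambda obs, sg: f"{obs}->edited({sg})")
print(out[0])  # frame_a->edited(cube grasped)
```

Either branch yields additional "successful" observations for the subgoal, which is how HEA inflates the number of useful episodes the agent can learn from without extra environment interaction.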
“Through HEA, we can synthetically increase the number of successful episodes the agent can store in its buffers and learn from,” the researchers write. “This allows to effectively reuse as much data gathered by the agent as possible, substantially improving efficiency especially when learning multiple tasks in succession.”
The researchers describe DAAG and HEA as the first method “to propose an entire autonomous pipeline, independent from human supervision, and that leverages geometrical and temporal consistency to generate consistent augmented observations.”
What are the benefits of DAAG?
The researchers evaluated DAAG on several benchmarks and across three different simulated environments, measuring its performance on tasks such as navigation and object manipulation. They found that the framework delivered significant improvements over baseline reinforcement learning systems.
For example, DAAG-powered agents successfully learned to achieve goals even when they were not provided with explicit rewards. They also reached their goals more quickly, and with less interaction with the environment, than agents that did not use the framework. DAAG agents were likewise more effective at reusing data from previous tasks to accelerate learning of new objectives.
The ability to transfer knowledge between tasks is crucial for developing agents that can learn continuously and adapt to new situations. DAAG’s success in enabling efficient transfer learning in embodied agents has the potential to pave the way for more robust and adaptable robots and other embodied AI systems.
“This work suggests promising directions for overcoming data scarcity in robot learning and developing more generally capable agents,” the researchers write.