Verdie: An Embodied Agent with Tools
Outdoor maintenance is a $1 trillion market that has a huge impact on our environment and community spaces. Today, a large percentage of this labor is done using a wide variety of highly polluting gas-powered tools: string trimmers, leaf blowers, weed sprayers, etc. At Electric Sheep, our goal is to transition this work to be safe and sustainable with emission-free automation. To move toward this goal, we created an embodied AI agent, Verdie, capable of learning how to use power tools and generalizing to the work sites we service across the country.
Verdie is inspired by robots such as WALL-E, R2-D2 and BB-8: moderately complex, non-humanoid agents that perform meaningful work with embodied AI. From the inception of the project, our intention has been to develop a robot that can use a variety of common power tools in an animal-like manner unique to its embodiment. Compared to multi-purpose agents like humanoids, we decided to trade off degrees of freedom while still enabling generality, with significantly lower cost (<10k) and hardware complexity. Ultimately, this is enabling us to quickly ship a multi-purpose robot with a scalable form factor.
Given scalable hardware, the challenge is to add the intelligence to use various tools. Tool use requires complex real-time reasoning that can generalize across changes in the environment. To tackle this, we used two techniques: our foundation world model, ES-1, and reinforcement learning (RL) in simulation. ES-1 provides an agent with the concepts needed for outdoor work and has been trained on data from our robot mowing fleet across thousands of diverse properties. Given this robust representation, we can then perform RL, which teaches the agent to solve a specific new task. We found that RL on top of our world model can 1) enable simulation-based training of new policies such as string trimming and 2) generalize to a diverse range of sites across the country.
Hardware Design
Verdie is designed to be highly mobile and to support generic power tool attachments for outdoor work. For the mobile base, we wanted something that can handle challenging 3D terrain such as curbs, has a high top speed, and is low cost. Inspired by designs like Ascento, we chose an articulating wheeled robot base. The robot has actuation in multiple parts of its legs, which enables 6-DOF control of a power tool on a usable manifold.
Sensing on Verdie is primarily driven by stereo cameras. Inspired by nature, stereo cameras enable robust reasoning about geometric features through multi-view geometry while also providing high-bandwidth visual information. Throughout ESR’s product development, we have found that this provides a great low-cost representation for embodied AI to understand the world.
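To give a feel for the multi-view geometry involved, here is a minimal sketch of the textbook depth-from-disparity relation for a rectified stereo pair, Z = f * B / d. The function name and the camera numbers below are illustrative assumptions, not Verdie's actual calibration or perception stack.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters of a point seen by a rectified stereo pair: Z = f * B / d.

    disparity_px: horizontal pixel shift of the point between left and right images
    focal_px:     focal length in pixels (hypothetical value in the example below)
    baseline_m:   distance between the two camera centers in meters (hypothetical)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Assumed example camera: 700 px focal length, 10 cm baseline.
# A feature shifted by 35 px between the two views is about 2 m away.
depth_m = depth_from_disparity(disparity_px=35.0, focal_px=700.0, baseline_m=0.10)
```

Because depth is inversely proportional to disparity, nearby geometry such as curbs and lawn edges is resolved far more precisely than distant scenery.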
Given the mobile base and sensing, we have successfully attached tools that perform work on the properties we currently service. Power tools have recently seen rapid progress in the number of electrified alternatives available to the traditional two-cycle gas equivalents. With a few simple modifications, electric power tools can be controlled by our platform while powered by their native battery.
It is worth noting that these tools are still less efficient than traditional gas-powered equipment, which matters when optimizing for human labor costs and has led to resistance to adoption in the industry. However, with embodied agents like Verdie, we can now deploy emission-free tools at scale, since automated work is significantly cheaper.
Teaching Tools
In policy learning there are two common techniques: imitation learning and reinforcement learning. Imitation learning learns a policy by directly recording demonstrations and fitting a function that mimics the trajectory distribution. Unlike a humanoid, Verdie has a unique set of kinematics that make it unnatural for a human to teleoperate. We have thus been interested in reinforcement learning (RL), where the agent self-discovers how to perform a task through optimization. We believe this is a way for Verdie to discover policies that exploit its unique dynamics.
RL comes with significant drawbacks. First, a trained policy can struggle to generalize to new instances of the task that are outside of training data. Second, specifying a reward is challenging in the real world because it can require oracle-like information, such as metric distance to the edge of grass.
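To make the second drawback concrete, here is a minimal sketch of a reward that depends on the metric distance from a tool tip to the edge of the grass. A simulator can supply that distance exactly, while a real robot would have to estimate it. The polygon representation of the edge, the exponential shaping, the 5 cm length scale, and all function names are assumptions for illustration, not the reward we actually use.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_edge(tip: Point, edge_polygon: List[Point]) -> float:
    """Distance from the tool tip to the nearest side of a closed polygon."""
    n = len(edge_polygon)
    return min(
        point_segment_distance(tip, edge_polygon[i], edge_polygon[(i + 1) % n])
        for i in range(n)
    )


def edge_following_reward(tip: Point, edge_polygon: List[Point],
                          length_scale_m: float = 0.05) -> float:
    """Reward in (0, 1]: equal to 1 on the edge, decaying with distance from it."""
    return math.exp(-distance_to_edge(tip, edge_polygon) / length_scale_m)


# Hypothetical 10 m x 10 m lawn; the tool tip sits exactly on the bottom edge.
lawn = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
on_edge_reward = edge_following_reward((5.0, 0.0), lawn)
```

The `edge_polygon` argument is exactly the oracle-like information in question: trivial to query in simulation, but not directly observable on a real property.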
We address these challenges by building on top of our foundation model for outdoor work, ES-1. ES-1 consumes time-series information and predicts state that is relevant for performing outdoor tasks, such as a bird's-eye-view map of the world, obstacle detections, a semantic understanding of the workspace, and the robot pose. The model is pre-trained in simulation and then fine-tuned on the variety of corner cases found while running our fleet of robotic mowers. This training provides us with a robust feature space that is invariant both across tasks and between simulation and reality.
As a specific example, we consider the task of string trimming and train a small fully-connected policy on top of the learned embeddings using RL. The policy takes time-series data as input via ES-1’s projected embedding space, then outputs velocity commands for the motors to track at 10 Hz. We used NVIDIA’s Isaac simulation for all training. The reward function was set to follow the perimeter of the property with the tool tip of the trimmer. Since our models are already designed to run on Jetson platforms, we can train the entire policy on a single desktop GPU.
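As a rough illustration of what a small fully-connected policy over an embedding looks like, here is a minimal sketch in plain Python. The layer sizes, the tanh activations, the velocity limit, the number of motors, and the class name are all hypothetical choices made for this example, and the weights are random rather than trained. It shows only the forward pass, not ES-1 or the RL training loop.

```python
import math
import random
from typing import List

CONTROL_RATE_HZ = 10  # command rate stated in the text
CONTROL_PERIOD_S = 1.0 / CONTROL_RATE_HZ


class MLPPolicy:
    """Two-layer fully-connected policy: embedding -> hidden -> motor velocities."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_motors: int,
                 max_velocity: float, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            # Small uniform initialization scaled by the layer's fan-in.
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)]
                    for _ in range(rows)]

        self.w1, self.b1 = init(hidden_dim, embed_dim), [0.0] * hidden_dim
        self.w2, self.b2 = init(num_motors, hidden_dim), [0.0] * num_motors
        self.embed_dim = embed_dim
        self.max_velocity = max_velocity

    @staticmethod
    def _affine(w: List[List[float]], b: List[float], x: List[float]) -> List[float]:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

    def act(self, embedding: List[float]) -> List[float]:
        """Map one embedding vector to bounded velocity commands, one per motor."""
        if len(embedding) != self.embed_dim:
            raise ValueError("embedding has the wrong dimension")
        hidden = [math.tanh(v) for v in self._affine(self.w1, self.b1, embedding)]
        raw = self._affine(self.w2, self.b2, hidden)
        # The tanh squashing keeps every command within +/- max_velocity.
        return [self.max_velocity * math.tanh(v) for v in raw]


# Hypothetical sizes: 8-D embedding, 16 hidden units, 4 motors, 2.0 velocity limit.
policy = MLPPolicy(embed_dim=8, hidden_dim=16, num_motors=4, max_velocity=2.0)
commands = policy.act([0.1] * 8)  # would be called once every CONTROL_PERIOD_S
```

Keeping the policy this small is what makes the approach cheap: the heavy perception work lives in the pre-trained embedding, and RL only has to fit a compact mapping from embedding to motor commands.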
We can then deploy the trained policy on properties we currently service. Initial results demonstrate surprisingly robust transfer to a variety of lawns. Verdie is still being tested to verify that it can be operated safely alongside crews, but the signs are promising. For more clips of Verdie using a string trimmer, check out the video here.