Large Action Models (LAMs) are the most exciting development in AI since the launch of ChatGPT. An AI assistant that is not dependent on islands of apps that don't integrate with each other is exactly what we've been crying out for… But the future potential of LAMs goes far beyond this burning problem plaguing our smartphone lives – it also has significant implications for the future of enterprise tech. It's remarkable that while Google, Meta, Microsoft, et al. are all working to evolve LLMs toward actions and problem-solving, a startup like Rabbit is allegedly ahead of them.
Welcome rabbit.tech and step forward Jesse Lyu, who could well be the new Chinese Steve Jobs.
Source: YouTube, 2024
His proprietary LAM makes it possible for AI systems to see and act on apps the same way humans do. It learns through demonstration – watching what a person does in an interface and replicating the process, even if the interface later changes. Because a LAM can learn the interface of any software, it solves the problem of islands of apps that would not otherwise integrate.
The most impressive launch since Steve Jobs revealed the iPhone in 2007
Models for controlling computer actions are significantly less mature than language models, which is likely why rabbit.tech caused such a stir at CES 2024. What Lyu is developing with Rabbit could totally disrupt the app store, much as ChatGPT is disrupting web search. This is super impressive in our view (except for Lyu's love of Pizza Hut and Rick Astley). Even Microsoft CEO Satya Nadella has described Rabbit's launch of its R1 hardware as the most impressive since Jobs' historic iPhone launch in 2007.
Real innovations are driven by consumer needs to begin with – which eventually find their way into the enterprise. The web and the smartphone are just two clear examples. Both have created the world in which we now live – a world of wanting instant gratification from our tech. And this next wave of AI is all about… making things work here and now.
LLMs understand what you say, LAMs get things done
We like the premise that "ChatGPT is great at understanding your intentions but can be better at triggering actions." LLMs understand what you say… LAMs get things done. We have yet to produce an AI agent that completes tasks as reliably as a user simply clicking the buttons, and getting there means going beyond yet another piece of complex software. In the case of Rabbit and its R1, a large language model (LLM) understands what you say, and the LAM actions your request. The model understands and enacts human intentions on computers, and any user can teach it new skills.
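The division of labour described above can be sketched in a few lines. This is purely illustrative – Rabbit has not published an API, and every class and function name here (`Intent`, `parse_intent`, `execute_intent`) is a hypothetical stand-in: the LLM turns free text into a structured intent, and a LAM-style action layer fulfils it.

```python
# Hypothetical sketch: an LLM parses the user's utterance into a
# structured intent; a LAM-style action layer then executes it against
# an app's interface. All names are illustrative, not Rabbit's API.
from dataclasses import dataclass, field


@dataclass
class Intent:
    action: str                      # e.g. "book_ride"
    parameters: dict = field(default_factory=dict)


def parse_intent(utterance: str) -> Intent:
    """Stand-in for the LLM: map free text to a structured intent."""
    if "ride home" in utterance.lower():
        return Intent("book_ride", {"destination": "home"})
    return Intent("unknown")


def execute_intent(intent: Intent) -> str:
    """Stand-in for the LAM: drive the relevant app to fulfil the intent."""
    if intent.action == "book_ride":
        # A real LAM would operate the ride-hailing app's UI here.
        return f"Ride booked to {intent.parameters['destination']}"
    return "Sorry, I can't action that yet"


print(execute_intent(parse_intent("Get me a ride home")))  # Ride booked to home
```

The point of the split is that neither half needs to be perfect at the other's job: the language side only produces a structured request, and the action side only needs to know how to operate interfaces.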
Rabbit has applied this to the many applications sitting on your smartphone. The R1 device attaches to your phone and uses a camera and GPS to provide context for its decision-making and actions. You can ask questions by voice and get voice and text responses. With the support of the LAM, you can ask for 'a ride home,' and Rabbit will use your preferred smartphone app to make the booking – understanding where your journey starts.
Source: HFS and DALL-E 2024
Rabbit delivers outcomes through dialogue – an 'out-loud' conversation the likes of Siri and Alexa cannot match. The real breakthrough, however, comes when you need a range of apps to solve your challenge – such as booking a vacation. Rabbit can respond to and fulfill complex requests such as 'Book me a vacation in London for two adults and a child and find us a great hotel in a central location.'
And you can enter into a dialogue with it – a back-and-forth conversation to hone, for example, your vacation itinerary. This implies a level of sustained memory more akin to ChatGPT's than to that of voice interfaces to date, including Alexa and Siri.
Once that vacation itinerary is honed to your liking and reported back, it takes just one human click-to-confirm to trigger the tech to complete all the bookings required – and pay for them.
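That confirm-then-execute flow is worth spelling out, because it is the safety valve in the whole design: the assistant can plan freely, but nothing is booked or paid for until the human clicks once. A minimal sketch, with entirely hypothetical names (Rabbit has published no such API):

```python
# Illustrative confirm-gated execution: the plan is assembled through
# dialogue, but no booking or payment runs without one user confirmation.
def execute_plan(plan: list[str], confirmed: bool) -> list[str]:
    """Run every booking step only after a single user confirmation."""
    if not confirmed:
        return []  # nothing happens without the human click-to-confirm
    return [f"done: {step}" for step in plan]


itinerary = ["book flights to London", "book central hotel", "pay deposit"]
print(execute_plan(itinerary, confirmed=False))  # []
print(execute_plan(itinerary, confirmed=True))
```

Keeping the confirmation as a single gate over the whole plan, rather than one per step, is what turns a multi-app errand into 'one click.'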
The R1 ideal could sound the death knell for today’s smartphone, app stores, and even RPA… anything with needless complexity that prevents getting things actioned at the click of a button or simple voice action.
R1 wants to be everything to the user across iOS, Android, and desktop. All these apps have a user interface, and its LAM can learn the interface of any software. LAMs are not designed to replace your phone, but they will eventually make it obsolete in its current form. R1 aims to kick off a whole new generation of native AI-powered devices and is just getting started; this year, for example, we can also expect the Humane AI Pin and the Tab AI Pendant.
LAMs effectively make it simpler to get stuff done, cutting through the needless complexity legacy applications have saddled us with. Robotic Process Automation (RPA) may allow us to stitch software together to form a process to complete an outcome, but RPA breaks the moment you change one app or interface in that process. With a LAM, the idea is you can just teach the new process through demonstration, and you get to continue getting stuff done.
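The contrast drawn above can be made concrete with a toy example. Classic RPA hard-codes a reference to a specific UI element, so a redesign breaks the bot; a demonstration-taught model stores the steps a human showed it and can simply be re-taught. Everything below is a deliberate simplification for illustration, not how any real RPA tool or LAM is implemented.

```python
# RPA-style automation: brittle, tied to one exact UI element id.
rpa_script = {"click": "#checkout-button-v1"}


def run_rpa(ui_elements: set[str]) -> bool:
    """Succeeds only if the hard-coded element still exists."""
    return rpa_script["click"] in ui_elements


# Demonstration-style: the goal-to-steps mapping is taught, and
# re-demonstrating after a UI change simply overwrites the old steps.
demonstrations: dict[str, list[str]] = {}


def teach(goal: str, steps: list[str]) -> None:
    demonstrations[goal] = steps


def run_lam(goal: str) -> list[str]:
    return demonstrations.get(goal, [])


teach("checkout", ["open cart", "press pay"])
print(run_rpa({"#checkout-button-v2"}))  # False: the UI changed, the RPA bot broke
print(run_lam("checkout"))               # the taught steps still complete the goal
```

The fragile part of RPA is the hard-coded selector; the LAM pitch is that the 'selector' is replaced by a learned understanding of the interface that a user can refresh by demonstration.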
Can it really be that simple? A note of caution before we pop the champagne
Should we really be ready to celebrate so soon? HFS’s Tom Reuner sounds a note of caution. “The big claim for LAMs is that they can action things. My suspicion is that LAMs require a high level of standardization for their actions. Therefore, we remain some distance away from objective-driven AI and automation that future large models may yet bring.”
In addition, while we believe LAMs will eventually be a game-changer, specific to R1 we have a healthy dose of skepticism about whether another device is required for this functionality and whether consumers will appreciate carrying one more gadget in their pockets just to save a couple of taps on their phones. The mobile revolution has been about device convergence all along. And what will prevent Google Assistant and other established assistants from improving their NLP and adding plug-ins for apps to offer similar functionality on the devices we already own?
The Bottom-Line: Even if this bunny turns out to be a turkey – you need to prepare for the impact of Large Action Models
As with other AI, there are risks to consider – will it comply with data and privacy rules and concerns? How many eggs do you want to put (to mix our metaphors) in the Rabbit basket? Will the device even show up and work? (If not, there's a bunch of HFS analysts who will want their $199 back.) The answers for Rabbit will only come when the first consumers get their hands on the device – expected in early April (or, as Rabbit quips, 'in time for Easter').
And, let's face it, enterprises are decades behind in waking up to the need for actions that let AI actually do a better job – documenting a process, mapping (mining) a process automatically, reusing assets, securing them, and ultimately solving the API/RPA conundrum. But once we start experiencing the end of application dysfunction in our consumer lives, that mindset will surely trickle into the enterprise as we embrace all the wonders and anxieties of today's emerging AI technologies.
But even if the device is a failure, the LAM genie is out of the bottle. Rabbit’s iPhone moment will inspire more investment to drive forward the maturity of models for controlling computers at an ever-increasing rate. And if the arrival of the R1 device does define the moment the great leap forward happens, then it will have ramifications for how work gets done in every app, in every process, in every enterprise. Either way, this is not a moment you can afford to ignore.