It wasn’t planned, but this latest installment in my three-part series on what AI’s rapid rise could mean for how we work with computers turns out to be timely, given Anthropic’s recent announcement that Claude 3.5 Sonnet can now interact with PCs much like a human would. It’s a concrete step along the path to what I want to outline here: a hypothetical working world in which direct computing as a paradigm has been entirely abstracted away.
Layers of abstraction
The first computers took bit-flipping quite literally: physical switches had to be manipulated directly to feed information into a hardwired program and effect a calculation. Major shifts in how we create instruction sets for computer hardware, and in how we supply the information we want it to use, transform or manipulate, have added layers of abstraction on top of that core ‘metal’ switch-flipping mechanic – overwhelmingly in the direction of making computing more powerful, more performant, and easier to use for people without any deep knowledge of how computers work at a fundamental level.
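As a toy illustration of what those layers buy us – my own example, and deliberately simplistic – here’s the same addition written the way an adder circuit computes it, and the way decades of abstraction let us write it today:

```python
# The same addition at two levels of abstraction: bit-by-bit, as a
# hardware adder would do it, and via the built-in operator.

def add_bitwise(a: int, b: int) -> int:
    """Add two non-negative integers using only bit operations,
    mimicking the carry logic of an adder circuit."""
    while b:
        carry = a & b      # bits that will carry over
        a = a ^ b          # sum without the carries
        b = carry << 1     # propagate carries one place left
    return a

print(add_bitwise(19, 23))  # 42 -- the hand-wired way
print(19 + 23)              # 42 -- decades of abstraction later
```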
There are already early signs in live examples of AI-based products that we’re moving to a new layer of abstraction above the level of the desktop operating system and the Graphical User Interface (GUI). Recent Medal spin-out Highlight is a great example, and Claude’s new tool is another, with a different angle but a similar step change in the level of abstraction away from machine instruction.
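For the curious, the loop behind Claude’s new capability looks roughly like this – a sketch based on Anthropic’s announcement-era documentation, so treat the specific model, tool and beta identifiers as assumptions that may have changed since:

```python
# Minimal sketch of Claude's computer-use loop. The identifiers below
# follow Anthropic's announcement at the time of writing and may have
# changed since -- treat them as assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",   # the screen/mouse/keyboard tool
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the spreadsheet and total column B."}],
)

# Claude doesn't touch the machine itself: it returns tool_use blocks
# (clicks, keystrokes, screenshot requests) that *our* code executes,
# feeding screenshots back until the task is done.
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # e.g. {"action": "left_click", "coordinate": [x, y]}
```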
The broad arrow of progress in this regard adds intermediation layers between the silicon that powers our computing devices and the work we want them to do – but at the same time, it removes the intermediaries between our thoughts and those actions. AI-based interaction tools like those just mentioned are a purer expression of the fundamental function of any computing tool: returning a result that closely matches what we’d consider an ideal outcome.
Speak it into existence
The current focus on agents – AI that can actually take action on our behalf, rather than just collect, distill and remix knowledge – is a key ingredient in transitioning to a new paradigm that mostly forgoes direct manipulation of a computer interface.
Multi-modal models are another key ingredient, as are interface concepts that go beyond the tactics of legacy search (text- and image-based queries) – think of what SocialAI accomplished with its one-to-many chatbot ‘social network’-style engagement. Together, these are pushing the state of the art toward a world in which we can essentially think aloud and have agentic AI – either a single, general-purpose model, or a collective of more narrowly tuned task-specific models – determine which elements of that spoken thought need action, and then act on them.
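To make that concrete, here’s a hypothetical sketch of such a dispatch loop – every name in it is illustrative, not a real API – including the strong bias toward inaction I’ll argue for in a moment:

```python
# Hypothetical 'think aloud' dispatch loop. All of this is illustrative
# scaffolding: in a real system extract_intents would be a multi-modal
# model call, and acting would mean tasking an agent, not printing.
from dataclasses import dataclass, field

@dataclass
class Intent:
    action: str                       # e.g. "set_reminder"
    args: dict = field(default_factory=dict)
    confidence: float = 0.0           # the model's own certainty, 0..1

CONFIDENCE_FLOOR = 0.99               # strong bias toward inaction

def extract_intents(utterance: str) -> list[Intent]:
    """Stand-in for a model that spots actionable fragments in speech."""
    if "remind me" in utterance:
        return [Intent("set_reminder", {"text": utterance}, confidence=0.995)]
    return [Intent("unclear", confidence=0.4)]

def handle_utterance(utterance: str) -> None:
    for intent in extract_intents(utterance):
        if intent.confidence < CONFIDENCE_FLOOR:
            continue                  # when in doubt, do nothing
        print(f"acting on {intent.action}: {intent.args}")  # dispatch to an agent

handle_utterance("remind me to send the deck tomorrow")   # acts
handle_utterance("anyway, as I was saying about lunch")   # stays silent
```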
Intuitively, we know that such a system will have to be something like 99% reliable in the actions it takes to be trusted (and, early on at least, have a strong bias toward inaction in the face of any uncertainty). There’s also real-world evidence to support that assumption, in Amazon’s Alexa-based voice shopping tools, which never achieved significant adoption.
But if we can get to a point where agentic behavior is mostly predictable, and near-perfectly reliable, then that opens up the possibility for significant change in the way we work – as detailed in the next section.
In the interim, I expect the gathering of input used to task agentic AI to be more overt – an avatar joining a virtual meeting, say, or dedicated hardware that signals when it’s paying attention via notification lights or audio cues. But as we grow more comfortable with this way of working, I think that will become less important, and people will accept an essentially invisible layer that pays attention when it needs to and stays out of the way when it doesn’t.
Meetings that actually matter
‘What if meetings were useful?’ is so stale a trope that it’s almost written off as a meaningful place to expend effort or technological innovation. But omnipresent AI – and specifically, AI that can do work on our behalf – has a real chance of making meetings the site where work actually gets done, rather than a staging ground for ideas that must then be transcribed, through a very lossy process, into ‘actual’ work.
If intent, decisions and action items can be captured directly from the content of the meeting itself and immediately actioned – via AI transcription, interpretation, prompt generation and agentic AI deployment – then what’s possible for knowledge work done in group settings changes drastically.
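As a sketch of what that pipeline might look like – all stage implementations here are invented stubs; the shape is the point:

```python
# Hypothetical end-to-end pipeline for the meeting flow just described:
# transcription -> interpretation -> prompt generation -> agent dispatch.

def transcribe(audio: bytes) -> str:
    # Real version: a speech-to-text model over the meeting recording.
    return "Decision: ship v2 Friday. Action: Sam drafts the changelog."

def interpret(transcript: str) -> list[dict]:
    # Real version: a model pulls out decisions, owners and deadlines.
    return [{"owner": "Sam", "task": "draft the changelog", "due": "Friday"}]

def to_prompt(item: dict) -> str:
    return f"As {item['owner']}, {item['task']} by {item['due']}."

def dispatch(prompt: str) -> None:
    print(f"agent tasked: {prompt}")  # real version tasks an agentic model

for item in interpret(transcribe(b"...")):
    dispatch(to_prompt(item))
```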
When I was at Shopify, meetings were always viewed with a high degree of suspicion by senior management, and by Tobi Lütke (CEO and co-founder) in particular. I was there when he first introduced the semi-regular practice of wiping out all meetings company-wide (something the company still does periodically) and while it wreaks havoc on sales and partnership teams, it’s a surprisingly effective culling function for internal meeting overgrowth.
I think that in a world where meetings can translate intent into action, it’ll become much easier to determine which meetings are and aren’t valuable – and as a result, meetings may even end up becoming a primary work surface, blending strategy and execution into one seamless whole.
The retro-future workplace
For knowledge work in particular, I think we may see drastic changes in how working environments are designed and configured. The modern office is some variation of a ‘computer farm’: desks equipped either with PCs or, increasingly, with monitors, docks and input devices that can accommodate whatever notebook computer a user brings along.
The trajectory over time has run from specialization and fixed configuration toward flexibility and generalized use; with ambient AI that can also act on our behalf, I imagine that environment entering a new level of generalized abstraction which, paradoxically, more closely resembles the office workplaces of the past than those of recent history. Without the need for direct input, and freed of note-taking responsibilities, I foresee a knowledge-work environment where all the emphasis is on creative interaction between individuals, with minimal intrusion on those interpersonal dynamics from laptops, screens and other obvious devices.
I’m not talking about a wistful, misplaced nostalgia for a Mad Men-esque mid-century lounge, whisky and cigars to hand – but the elements of that approach that stress the absence of intermediation for the people doing the work will be prevalent. And assistive tech, like screens that provide visuals when needed, will be more naturally integrated into the environment, receding into the decor when not in use.
As computers become more powerful, I think they will also necessarily become less obtrusive – to the point where having to ‘use’ a computer will seem as clumsy and quaint as having to hand-crank an engine to get your motor car up and running.