Avo

A short film about the studio and the work, by Jack Schulze and me.

Avo is television and a video game at the same time, in the same frame, on a phone you hold in your hand. The story is held inside a filmed world, with the full cast and crew of a children’s TV production, and a 3D character lives inside that world. The player controls the character. The actors on screen respond to what the player does, where the character goes, where the player draws. Billie watches Avo. Billie reacts. The whole story is built around the responsiveness of the filmed world to the player’s hand.

Gameplay still: Avo, a small 3D avocado character with dot eyes and stick limbs, dances on a workshop bench beside Billie, who watches him from across the desk. A dashed path the player has drawn across the wood loops through a trail of glowing pink jellybeans.
A still from one of the episodes. The avocado, the dashed path the player has drawn, the glowing jellybeans, and Billie watching from the other side of the bench.

I co-founded Playdeo in 2016 with Jack Schulze and Nick Ludlam to ask whether video could be made playable. Not video on top of a game, the way a cutscene sits on top of an engine. Not a game on top of video, the way a level can run inside a YouTube clip. Not a TV show with a phone-vote sidecar, in the manner of X Factor. Not a film with branching cuts, in the manner of Bandersnatch, where the only choice the viewer ever makes is which next clip plays. Something else. A filmed, populated, lived-in world the player could be inside, with the cast and the props of a real production already in it, the actors performing for where the player’s character was going to be.

As Jack put it in the studio’s introduction:

Playdeo is making television you can touch: a new space, inside video, where you can play. – Jack Schulze (2019)

The premise was that video on a phone is still being treated as a tiny TV, and that the phone deserves something native to it. The care and heart of film and television, the interactivity and play of games and software, on the only screen most of us look at all day.

The player taps and drags the avocado across a filmed scene. The character has weight and shadow, sits behind objects, occludes others, follows a path drawn across the floor. As the character moves, the runtime cuts between covered angles based on where Avo is, where the player’s finger is, and where the action is going. Touching a glowing jellybean collects it. Solving a spatial puzzle, lighting something up or getting something into a place, opens the next sequence. Off-piste exploration is rewarded. The pace alternates between leaning in to play and leaning out to watch.

Billie rests her chin on her arm at a workshop bench, watching Avo, the small 3D avocado character, who is balanced on the rim of an enamel mug. An orange ducting and a glowing white sphere are behind them.
Billie watching Avo on the workshop bench. The avocado is a 3D character; everything else is filmed; the performance is for where the character is going to be.

The story follows Billie, a young scientist with a workshop, who has brought Avo to life and is trying to recover her stolen inventions across eight episodes. We shot the episodes the way you’d shoot a children’s TV series. Real set, real actors, real props, real lighting, real sound. A full production crew, a full cast, scripts, table reads, blocking, shoot days, post. The game lives on top of that footage, frame-synchronised to it, with the avocado composited in and the actors playing toward where the character is going to be.

The story

Avo is silent, in the lineage of Wallace and Gromit. The world around him is a workshop full of jars and oscilloscopes and homemade gadgetry, with the sensibility of Blue Peter, Tomorrow’s World and The Crystal Maze rather than the aggression of most contemporary games. Around Billie there is a small ensemble of recurring characters: a Father, a Mother, a treasure-hunting rival called Roberto, a girl called Storm. The original title for the show was Billie and Avo; we shortened it to Avo shortly before launch.

Each episode is a chapter of the same long story. Billie has been working on a set of inventions; the inventions have been stolen; she and Avo move from set to set to set in pursuit. The puzzles inside each chapter are the same puzzles a working inventor solves on a working bench. Get the right thing into the right place. Light the room. Find the part you dropped. The player participates in those by drawing the path Avo walks across the bench, by tapping at things in the world, and by directing Avo’s attention.

The script was written by Ryan North and Gemma Arrowsmith. Katy Reece plays Billie Hopper, the inventor. The score is bespoke, written in 5/4, with a different theme variation and a separate ‘mission’ cue for almost every chapter. The sound design is full surround. The whole thing is built to the standard of a serious children’s television production, on a production budget closer to a short film than a Netflix episode, in what Nick later called the guerrilla, forgiveness-not-permission school of filmmaking.

Dan Hill noticed something that I had not consciously thought about while we were making it. Billie Hopper is a young woman who is fluent and excited about code, who builds, hacks, and adapts. Bandersnatch came out two months before us, and at its centre is an isolated, paranoid, male coder. Hill saw Billie as standing in a long line of women in computing, from Ada Lovelace through Grace Hopper to Katherine Johnson, and read Avo as quietly arguing for an optimistic, humane view of technology, against the cynical one. We did not write Billie Hopper to be that argument, but we are happy that the argument is there.

Where it sits

Marshall McLuhan wrote that new media find their form by emulating the old before they grow into themselves. The first television shows were a camera in front of a radio presenter. Early cinema filmed theatre productions. Pong was table tennis. Each medium worked through what it had inherited before it understood what it was.

Most attempts at interactive video so far have stayed inside the inherited form. The branching film, where the choice is which clip plays next. The game with a video chrome, where the video is decoration. The TV show with a phone-vote sidecar, where the device is a remote control that happens to be in your hand. Avo is in none of those boxes. The video isn’t a backdrop. The actors aren’t waiting for a vote. The choices the player makes aren’t pre-cut and aren’t off-screen. The player is inside the shot, with their finger on the floor of it, and the cast is performing in response.

The closest near-relative I can think of is Monument Valley by ustwo. Both came out of design studios rather than from the games or television industry. Both treat touch as the only sensible input. You can’t imagine playing either with a controller or a keyboard, because the medium is the touch. Hill makes this point in his essay, and it stayed with me as the most useful frame for what Avo was actually trying to be.

The other lineage is closer to home. Jack and I had spent the years before Playdeo at BERG. The work we made at BERG, with Matt Webb, Matt Jones, and others, was characterised by what Jack later described as small objects operating in the big world. Mag+ was a digital magazine that took the page seriously rather than throwing it out. Suwappu was a set of toys that knew their own context. Availabot was a presence-indicator USB doll. Little Printer was a connected device whose entire personality fit on a curl of receipt paper. Avo is on a phone screen rather than a desk, but the family resemblance is direct. A character with a face, in a hand-built world, doing recognisable small things.

Hill widens the lineage further into a particular British strain of inventive making: Trevor Baylis with the wind-up radio in his shed, Emily Cummins’s solar evaporation refrigerator, Terry Gilliam’s collage animations, the high-tech architecture tradition of Rogers and Foster, the inflatable dummy tanks and Maunsell sea forts of wartime improvisation. The thread is that constraints make a particular kind of design possible. Most of Avo was made with one camera, one set at a time, very few people on the cap table, and an iPhone in everyone’s hand. That kind of constraint forces a different shape.

You draw to control Avo. This is drawing as a way of making the world, at least for this little avocado. Sketching a dotted line in front of a character is both the most natural thing to do, in many ways, but also further draws the player/viewer into actively constructing the world. – Dan Hill (2019)

Hill frames the player experience as a third state beyond the lean-back of television and the lean-in of video games. A “be in” state, fingertips and pixels, with cut-scenes built in to let the player breathe. He cites Henry James from 1912 on form being substance, and argues that the form invention in Avo is itself the substance, whatever the story turns out to be. I think he’s right.

How it works

Conventional mobile AR puts a digital object on top of the live camera feed. Avo puts the character inside a recorded cinematic shot, and aligns its motion with the virtual camera that filmed the shot in the first place. Three things had to come together at once. A clean handoff from the real camera’s motion to a virtual camera in Unity. A video pipeline that could keep frame-perfect timing under the player’s interaction. And a 3D scan of every set so the character could collide with floors, hide behind chairs, and throw shadows on real furniture.

The shoot rig is the place to start. Each set was a constructed world, intricate as a working bench, lit and dressed for camera. Small enough to fit on a desk, large enough that depth of field became a real problem. Early prototypes were shot with shallow depth of field, which looked beautiful and broke the camera tracker. We learned to dial the focal plane to a modest depth that the tracker could follow reliably across re-shoots, with consistent set markers and consistent lighting. Each scene was filmed multiple times from multiple angles. What we ended up calling ‘coverage’ shots, for the moments when the player is drawing Avo’s path; and ‘sequences’, for the narrative beats. Camera solving (recovering the precise position of the physical camera on every frame) was done in PFTrack.

Early prototype: a small metal pencil sharpener with cartoon eyes and a mouth painted on, sitting on a desk. Out of focus behind it, Timo is gesturing at it with one hand.
An early prototype, before there was an avocado. A pencil sharpener with eyes was good enough to test camera tracking, character scale, and shadow.

Photogrammetry of the set gave us collision geometry. We walked a phone around each tableau, took several hundred overlapping stills, and reconstructed the geometry in Reality Capture as a low-poly mesh, aligned to the filmed plate. The character runs on that mesh. Shadow direction was matched by hand from the lighting on the day.

Reality Capture reconstructing one of Billie’s workshop sets from a few hundred photographs. Each white marker floating in space is the position from which one of those still photographs was taken.

Frame timing came out of a custom video plugin Nick wrote against AVFoundation, which let us drop into any frame on demand and treat the video as a seekable, addressable medium rather than a tape running forward. Yera Diaz built the Unity editor side, in September 2016, so the team could scrub clips frame-accurately while authoring without having to make an iOS build to see anything.

The biggest single piece of plumbing was something Nick called the dual decoder. The naive way to make a video seekable is to ask the OS to seek, but iOS’s audio-aware decoder paid a cost of around 0.2 seconds for every audio-bearing seek, which is forever in interactive terms. Nick’s answer was to run two decoders against the same MP4 in parallel. The current clip plays on one, the next clip is being prepared on the other, and the runtime swaps between them at the cut. Adjacent clips in the file are detected and played back-to-back on the same decoder so the swap doesn’t have to fire. On older devices that couldn’t sustain two decoders, the second one was disabled and the cuts just took a perceptible beat. The trick is in the patent.

The other piece was the PathEDL system. The player draws a path with a finger; the runtime walks ahead of Avo on that path and pre-simulates which covered angle should be on screen at each beat. Filters apply: a minimum start-zone time, so cuts don’t fire on the first frame. A minimum-edit-length filter, so we don’t get a sub-second flash of an angle and then go back. A screen-edge filter, so the player can’t end up looking at Avo from a position that traps him at the edge of frame. The pre-simulation lets the runtime cut on intent rather than on collision. Hill called this ‘attention-seeking cameras’ in his essay; the cameras are competing for the right to be on screen at every moment, scored on a function-versus-aesthetics basis, and the highest score wins.

Editing was the hardest part. A clip in the linear medium has one job, to get from start to end. A clip in Avo has to be reusable. The player might enter from the left or the right, complete a puzzle in any order, get stuck and need a redirect. We learned to think in shots rather than scenes, and to write the script knowing each beat had to survive being re-entered. We argued constantly about ‘video time’ versus ‘game time’. When does the recorded narrative pause for the player’s pace, and when does the player’s pace bend to the recorded narrative.

Underneath all of this is the Sequencer, an authoring tool we built in Unity that knits filmed clips, game events, audio cues and player input into a single deterministic timeline. A node spans zero or more frames; a level is a tree of nodes; a checkpoint is a saved state derived from the tree’s execution to that point. The Sequencer let level designers compose a chapter without having to manually track every door, key, switch and inventory item. The state is whatever the tree says it is.

Sound is full surround. Music and dialogue mix is run through Wwise. One of the late additions, days before launch, was a Bluetooth latency compensation layer. AirPods had become the standard everywhere we were testing, and Bluetooth audio adds 0.2–0.3 seconds of delay that the operating system reports through AVAudioSession.outputLatency. We pull that value at runtime and shift the camera and gameplay events forward by it, so cuts and sound effects line up with what’s happening in the player’s ears, not with what’s happening on the device.

What this added up to, by the end, was something close to real-time cinematography. The runtime is making editorial decisions on the fly, choosing which covered angle to cut to, when to hold the wide and when to push in, where the player’s hand is and where the action is going. Hitchcock once said that ‘image and movement, rhythm and effects’ should be subordinated to dramatic purpose, that technique should enrich the action, not flaunt itself. Daniel Arijon’s Grammar of the Film Language (1976) is the textbook on what that means in continuity editing. Avo is Arijon’s grammar made interactive. The shot list, the coverage, the eyelines, the cuts on motion, the screen direction, all of it; the runtime has to honour all of it, while also responding to a finger that can be anywhere, at any time.

Production

Three of the production crew on set in the studio. One holds a steadicam camera rig, one sits on a stool reviewing a take, one stands with a clapperboard. The studio walls are crowded with shelving full of props and tools.
On set in the studio.

Eight episodes, shot in our London studio. Filming ran for ten weeks from May 2018; for the duration of the shoot the studio was split in two, half on set and half on infrastructure, footage flowing from one half to the other as it came off the camera. Implementation of the eight episodes ran from September 2018 to January 2019. Avo launched on iOS at the end of January 2019 and has, between then and now, been downloaded close to four million times.

The three founders divided the work along the lines of our backgrounds. Jack on visual design, character design, and the studio’s voice. Nick on engineering and code vocabulary. Me on cinematography, gameplay design, VFX and post. Around that, a full cast, a full production crew, set builders, animators, a sound designer, a composer, a script supervisor, a producer, and a rotating bench of game and 3D engineers. Lotta Boman ran the production. Ryan North and Gemma Arrowsmith wrote the show. Katy Reece played Billie Hopper. Yera Diaz built the Unity authoring side. The cast and crew of the eight episodes, individually credited in the App Store entry, were the difference between what we wanted to make and what we did make.

Footage shot on a rigged DSLR with motion control on the longer takes, and on a handheld stabiliser for the looser ones. Camera-solving in PFTrack. Set photogrammetry in Reality Capture. Edit in DaVinci Resolve (we started in Final Cut Pro X and moved to Resolve as the volume of footage scaled). Conformed to a frame-accurate sequence file, ingested into Unity through Yera’s plugin and Nick’s AVFoundation video layer. Keyframe data round-tripped through Autodesk’s FBX Python library so we could move things programmatically rather than rely on Unity’s automatic smoothing. Audio in Wwise. Game runtime in Unity. Camera tracking via Apple’s ARKit. iOS builds, TestFlight distribution, signing, plugin injection, on-demand resourcing, all driven through Fastlane. CI on Jenkins, after a stint on Go CD. The data pipeline (FCPXML, Resolve CSVs, PFTrack data, FBX, video files all correlated) was about three and a half thousand lines of Python. ffmpeg holding the lower layer of the video plumbing together throughout.

The technical underpinning is patented as US10789781B2, Interactive frame-synchronized augmented video, filed August 2019, granted September 2020, co-invented by Nick, Jack and me. The claims describe a frame-number time source, a per-frame camera-solving step, a scene mesh model, dual video decoders to keep playback responsive, and an interactive layer of computer-generated objects synchronised to the filmed plate. Originally assigned to Playdeo, later transferred to Apple.

What it was

Avo launched on iOS at the end of January 2019. It was selected for the Design Museum’s Beazley Designs of the Year 2019 in the Digital category. Apple featured it as App Store Game of the Day. The App Store front page wrote of it as ‘a magical adventure with the production values of a kids’ TV show’ and that more or less covered it.

The best long read on the form is Dan Hill’s four-part essay on City of Sound, which placed Avo alongside Stadia as one of the few projects then proposing a new grammar for interactive video, against the more conventional formats of Apple Arcade and Netflix’s Bandersnatch. Game Developer ran a piece reading the show’s editing through Hitchcock, Polanski, Scott McCloud, and Arijon’s grammar. Destructoid gave it 8/10, calling it “child-safe and practically begging to be shared with all the tiny humans in your life”. Pocket-lint ran the launch announcement under the headline “television you can touch”, with Jack’s framing of the studio’s ambition. Stuff Magazine made it their app of the week and praised Katy Reece’s performance. Nick has written his own three-part account of the build which goes deeper into the technical pipeline than I have here. Jack’s introduction to the studio on Medium is the clearest statement of why we made it in the first place. Avo was also lightly trailed in the press at launch, including in Wired, The Verge, Time, TechCrunch, and a number of design and TV trade publications.

I am still working out what Avo was. A form first, certainly. A way of treating recorded video as a place rather than a thing, with the cast on screen performing toward where the player is going to be. The questions it opened up about pacing, agency, frame, attention, and the seam between filmed and rendered images are still on my desk. We were eight episodes in when we wrapped, and there were more shapes on the horizon. I’ll come back to those.


Avo was made at Playdeo by Jack Schulze, Timo Arnall and Nick Ludlam, with Lotta Boman as production manager, Ryan North and Gemma Arrowsmith as writers, Katy Reece as Billie Hopper, Yera Diaz on the editor toolchain, and the cast and crew of all eight episodes. Thanks to everyone who came through the studio.