Thoughts on generative music
Observations from a year building generative music tech
Having worked on generative music at Udio for the past year, I wanted to record some thoughts. For context, when I say generative music, I’m talking about generative models that produce music as an output, controlled by inputs which might include text, other music, MIDI, etc. There are lots of other cool ways people are integrating machine learning models into modern music technology, including production, understanding, and recommendation, but in this piece I’m focusing on the generative flavor.
Current State & Technical Reality
- These models are incredibly fun to use.
- The full potential of these models hasn’t nearly been realized yet (one imperfect analogy: imagine if LLMs had an artificial cap on their response quality).
- Building anything adjacent to the music industry is hard (to the surprise of absolutely nobody).
- This is not unique to generative music, but the product overhang relative to the tech advancement is substantial–this relates both to modes of interaction (e.g. text is actually a surprisingly weird way to prompt a music model, and many people taxonomize music by artist, album, or just how it sounds) and to UI/UX. ‘A tech demo is not a product, no matter how good the tech is.’
Key Opportunities
- The first product area I’m excited about is social media and mobile. Developments here might look like a flavor or evolution of edit culture–you can see baby steps with artists releasing slowed-down, sped-up, a cappella, and instrumental versions of tracks, and with Apple Music allowing real-time control of vocal volume. Malleable, personalized music experiences are on the way, and I’d love to see mobile-native music creation become a thing beyond voice notes (think CapCut for music).
- The second area I’m excited about is creative tooling–these models will be some of the most interesting co-writers you’ll ever have, and this use case is already gaining traction. You won’t need these models to make music, just as you don’t need LLMs to write code, but many people will likely be more productive with them once they’re packaged correctly (which probably includes DAW integration).
Impact on Creation & Consumption
- If anyone can press a button and generate an average radio-listenable track, that track no longer has value in itself–but that’s fine. Art always expands to fill the space, and by definition you can’t commoditize it. What we consider ‘good’ music maintains its scarcity even as production becomes easier; the bar for what’s considered exceptional simply rises with our capabilities, since musical excellence isn’t about reaching and automating a fixed quality standard, but about creating something that uniquely resonates with its time and audience.
- Generative music, like lots of other generative tech at the moment, raises both the floor and ceiling in the medium–there’s basically zero barrier-to-entry for beginners, and professionals can accelerate ideation, explore variations, or suggest alternative or supporting elements.
- If generative music tech provides increased leverage for music-making, human taste and curation will become even more important. Artists still need vision and an understanding of what they want to do and what works.
- Streaming services aren’t going anywhere–people will still want all the best music in one place, and if they like it they won’t care how it’s made.
Integration & Evolution
- Music will naturally incorporate generated content across a variety of use cases, and you might not notice without being told, much like you don’t necessarily have insight into writing, instrumentation, recording, or production choices right now. Look at BBL Drizzy, Metro Boomin’s subsequent sample, the cultural phenomenon that followed, and how it finally ended up on Sexyy Red’s ‘U My Everything’ with a Drake verse on top (this was not something we envisioned when building the first Udio model). Another example is JPEGMAFIA’s sampling of a generative cover of Future’s ‘Turn on the Lights’ in his track ‘either on or off the drugs’.
- Generative music will be looked back on as another step in the evolution and expansion of music over the past century, including but not limited to recording, electric and electronic instruments, sampling, digital production, streaming, etc.–it’s easier than ever to make music today, there’s more music than ever before, and the pie keeps getting bigger.
- The industry will find ways to accommodate new tech that people want—it’s hard to put the genie back in the bottle. Naturally, any evolution in this space will need to respect artists’ rights over their likeness and creative identity.
Future Trajectory
- I think generative music will be ubiquitous by the end of the decade.
- It will be interesting to see whether streaming services expand to accommodate a more interactive music experience, or whether that remains predominantly on short-form content platforms (TikTok, Instagram). Maybe the final evolution of the streaming service (utopia or dystopia depending on your attitude) is a single ‘play’ button that knows exactly what you want at that moment, and can make it for you.
- We might see the line between fan engagement and creation blur–there’s a world where an artist’s next hit emerges from the most popular fan-generated track using their voice or style, with their blessing and involvement. Music might become an ongoing conversation between an artist and their fanbase instead of consumption of a fixed discography.
- The ‘AGI’ version of this tech probably looks like a real-time system that can sit in on a jazz improv session and hold its own–we’re far from that, but there’s no reason it won’t eventually exist.