Riffusion

Music-generating machine learning model

Developers: Seth Forsgren, Hayk Martiros
Initial release: December 15, 2022
Repository: github.com/hmartiro/riffusion-inference
Written in: Python
Type: Text-to-image model
License: MIT License
Website: riffusion.com

Generated spectrogram from the prompt "bossa nova with electric guitar" (top), and the resulting audio after conversion (bottom)

Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music by producing images of sound (spectrograms) rather than audio directly.^[1] It was created by fine-tuning Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms.^[1] The resulting model takes a text prompt and generates an image file, which can be put through an inverse Fourier transform and converted into an audio file.^[2] While these files are only several seconds long, the model can also interpolate between outputs in latent space to blend different files together.^[1]^[3] This is accomplished using a functionality of the Stable Diffusion model known as img2img.^[4]
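As a rough illustration of the image-to-audio step, the sketch below treats a spectrogram as an array of magnitudes and recovers a waveform with the Griffin-Lim algorithm, which alternates between inverse and forward short-time Fourier transforms to estimate the phase that an image does not store. The choice of Griffin-Lim, the NumPy implementation, and all names and parameters (stft, istft, griffin_lim, n_fft, hop, n_iter, seed) are assumptions made for illustration rather than Riffusion's actual code, and a real pipeline would first have to map pixel values back to magnitudes.

```python
import numpy as np


def stft(signal, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=256, hop=64):
    """Least-squares inverse STFT: windowed overlap-add divided by summed window^2."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window is exactly zero.
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_fft=256, hop=64, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    `magnitude` is assumed to be a linear-frequency array of shape
    (frames, n_fft // 2 + 1); this is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    # Start from random phase, since the image carries magnitude only.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)
```

Calling griffin_lim(np.abs(stft(x))) on a test tone x returns a waveform whose magnitude spectrogram approximates the input. The first and last few samples are only weakly constrained, because the Hann window is close to zero there.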

The resulting music has been described as "de otro mundo" (otherworldly), although it is considered unlikely to replace human-made music.^[5] The model was made available on December 15, 2022, with the code also freely available on GitHub.^[2] It is one of many models derived from Stable Diffusion.^[4]

Riffusion belongs to the class of AI text-to-music generators. In December 2022, Mubert^[6] similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on its own text-to-music generator, MusicLM.^[7]^[8]

References

[1] Coldewey, Devin (December 15, 2022). "Try 'Riffusion,' an AI model that composes music by visualizing it".
[2] Nasi, Michele (December 15, 2022). "Riffusion: creare tracce audio con l'intelligenza artificiale". IlSoftware.it.
[3] "Essayez "Riffusion", un modèle d'IA qui compose de la musique en la visualisant". December 15, 2022.
[4] "文章に沿った楽曲を自動生成してくれるAI「Riffusion」登場、画像生成AI「Stable Diffusion」ベースで誰でも自由に利用可能". GIGAZINE.
[5] Llano, Eutropio (December 15, 2022). "El generador de imágenes AI también puede producir música (con resultados de otro mundo)".
[6] "Mubert launches Text-to-Music interface – a completely new way to generate music from a single text prompt". December 21, 2022.
[7] "MusicLM: Generating Music From Text". January 26, 2023.
[8] "5 Reasons Google's MusicLM AI Text-to-Music App is Different". January 27, 2023.
