A RAVE and starvation synth based generative sonic device powered by dye sensitized solar cell

From Hackteria Wiki

Abstract

This article introduces a self-powered experimental sonic interface that uses a dye-sensitized solar cell (DSSC) as both a signal receiver and an energy source. The device incorporates a starvation circuit whose output is processed by a generative AI tool, RAVE (Real-time Audio Variational Auto-Encoder). DSSCs are useful not only for energy harvesting but also as optical receivers. Previous experiments and documentaries have demonstrated the use of solar panels in "photophones", telecommunications devices that transmit speech on a beam of light. Unlike photoresistors (LDRs), which typically respond to light intensity along a single dimension, DSSCs are planar devices that can handle two-dimensional light input. This makes them particularly suitable for laser-based audiovisual performances. In addition, DSSCs are a DIY-friendly photovoltaic technology that can be fabricated at home. Their optical and electrical response characteristics can be customized through the dyeing and screen-printing steps of the photoelectrode fabrication process. As a result, the light-to-sound conversion behavior can be tailored for specific compositional purposes or designed to interact dynamically with the RAVE system.

What is Real-time Audio Variational auto-Encoder (RAVE)?

The Real-time Audio Variational auto-Encoder (RAVE), developed at IRCAM, is a real-time generative audio tool based on the Variational Autoencoder (VAE) framework. VAEs are a class of generative models in machine learning that learn probabilistic representations of data, enabling the generation of new samples similar to those in the training set. RAVE stands out for its fast, high-quality audio waveform synthesis, offering a lightweight alternative to more computationally intensive models. Functionally, it operates like a "style-transfer vocoder", delivering real-time results with impressive fidelity. Its accessibility makes it artist-friendly: RAVE can be run via the nn~ object in Max/MSP and Pure Data, as a VST plugin, or in Python environments. Another key advantage of RAVE is its open latent space, which allows users to manipulate the variables passed between encoder and decoder directly. This provides fine-grained control over the generative process, making it a versatile tool for customized audio synthesis and export.
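As a rough orientation, the generic VAE training objective (the evidence lower bound) can be written as below. This is the textbook form and not RAVE's full training recipe, which according to Caillon and Esling also involves a spectral distance and an adversarial fine-tuning stage. The symbols are notation introduced here for illustration: x is an audio excerpt, z its latent code, q_φ the encoder, p_θ the decoder, and p(z) the prior.

```latex
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards faithful reconstruction of the input, while the second keeps the latent codes close to the prior. That regularity is what makes it meaningful to move around in the latent space and decode new sounds.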

For art

One reason for using RAVE is its compositional potential. It is a quick way to obtain high-fidelity sound from the grooves generated by the starvation synth circuit. Musical sequences and grooves can be designed around the master clock and square waves set by the resistors and capacitors in the starvation synth, and the resulting organic grooves can then be rendered as high-fidelity output: for example, low-frequency noisy pulses can be mapped to the sound of a jazz drum set, while more harmonious drone-like signals could be rendered as violin tones. In fact, optimal results often arise when the input and output audio features are dissimilar, allowing the model to fully explore its generative potential. The groove of the starvation synth can be "tuned" by adjusting circuit components, such as resistors R1, R2, R3 and capacitors C1, C2, C3, to better match the desired behavior of a specific RAVE model. This approach creates a meaningful bridge between hardware design and AI-driven audio synthesis. Furthermore, the unique, handmade characteristics of DSSC photoelectrodes introduce additional variables into the system. These optical and electrical variations enrich the sonic palette, enabling more complex and expressive results when combined with multiple modules or RAVE instances.
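To get a feel for how the resistor and capacitor values set the pulse rate, here is a minimal sketch. It assumes each square-wave stage behaves like a textbook Schmitt-trigger RC relaxation oscillator; the article does not give the schematic, so the function name, the threshold parameters, and the example values are all assumptions for illustration rather than a description of the actual circuit.

```cpp
#include <cmath>

// Hypothetical helper: frequency (Hz) of a Schmitt-trigger RC relaxation
// oscillator. ASSUMPTION: the starvation synth's square-wave stages behave
// like this textbook circuit; the real schematic may differ.
//   r       resistance in ohms (e.g. R1)
//   c       capacitance in farads (e.g. C1)
//   vdd     supply voltage in volts (whatever the solar cell delivers)
//   vtHigh  upper switching threshold in volts
//   vtLow   lower switching threshold in volts
double schmittOscFrequency(double r, double c, double vdd,
                           double vtHigh, double vtLow) {
  // Capacitor charges from vtLow up to vtHigh, heading towards vdd.
  double tCharge = r * c * std::log((vdd - vtLow) / (vdd - vtHigh));
  // Capacitor discharges from vtHigh down to vtLow, heading towards 0 V.
  double tDischarge = r * c * std::log(vtHigh / vtLow);
  return 1.0 / (tCharge + tDischarge);
}
```

With R = 100 kΩ, C = 1 µF, a 3 V supply, and thresholds at 2 V and 1 V, `schmittOscFrequency(100e3, 1e-6, 3.0, 2.0, 1.0)` comes out at roughly 7 Hz, and doubling R halves the rate. Because the supply voltage and thresholds are explicit parameters, values measured under different lighting can be plugged in to see how the groove shifts.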

RAVE provides access to control over the latent space. The encoder and decoder objects, when combined with streamlines in Max/MSP and Pure Data, can enable more complex synthesis by interacting with the unpredictable output of the starvation synth. This opens up possibilities for exploring more implicit patterns between light input and audio output, both electrically and sonically, which could be further developed through ongoing research.

For the philosophy of mind

Another motivation to implement generative AI for art sculpting may be more metaphysical and speculative. Some researchers draw analogies between human imagination and memory and the structure of variational autoencoders (VAEs), or attempt to conceptualize the "mind" through the architecture of generative AI. For example, in From Deep Learning to Rational Machines, Cameron Buckner reexamines the philosophical debate between Fodor and Hume through the lens of generative AI, drawing parallels between human cognition and machine learning systems. Similarly, in How Deep Is the Brain? The Shallow Brain Hypothesis, Mototaka Suzuki et al. propose an alternative "shallow learning" algorithmic model that is more powerful and energy-efficient than traditional deep learning architectures. These perspectives inspire my approach to bridging generative AI with artistic systems, particularly optical systems, and expand my interest in connecting philosophy of mind with phototronics.

How to use RAVE

There are two ways to run a RAVE model in your digital audio workstation (DAW). One is the nn~ object, which works in both Pure Data and Max/MSP. The other is to run the Neutone VST plugin inside Max/MSP. Other DAWs such as Ableton Live and Logic Pro are also supported, although I haven't tried them yet. Neutone is a platform and plugin system developed by Qosmo that allows various machine learning models, including RAVE, to be wrapped as real-time VST3/AU plugins for use in DAWs like Ableton Live, Logic Pro, or Reaper. Many ready-to-use models are freely available for download via the Neutone app.

In addition to the off-the-shelf models available on Neutone, creating your own RAVE model is also relatively accessible for artists and designers. You can train your own model using Google Colab, a cloud-based platform that provides an organized environment for writing and executing code. Colab offers a semi-graphical interface and supports collaborative coding, making it easier to run Python scripts, especially for machine learning and audio synthesis tasks, without needing to set up a local environment. A friend of mine, Jimi Mased, has written a clear step-by-step manual: File:ABAO-gets-3-hrs-of-audio-RAVE-going.pdf. However, the process is still a bit difficult for non-programmers.

nn~

To install nn~, please follow the instructions here.

Neutone

  1. To install Neutone, first download the app from the official site.
  2. After downloading, open Max/MSP and find the Neutone VST in the left sidebar menu. Drag the VST to the center of the patch window.
  3. Click the wrench icon at the top of the VST window to open the Neutone app interface.
  4. Install a pre-trained model in the VST by clicking the "USE" button in the Neutone interface.
  5. Connect adc~ and dac~ objects to the VST object.

Experiments

Four different setups were tested here. Videos 1 and 2: laser projection + DSSC + RAVE. Video 3: starvation synth circuit + crystalline silicon solar panel + RAVE. Video 4: LED + crystalline silicon solar panel.

Video 1 and 2. Laser projection x RAVE

The experimental setups in videos 1 and 2 both consist of a DSSC and a galvanometer laser projector with a 405 nm laser source. In the first video, the projected laser patterns are generated from noise in Max/MSP. The video compares the raw light-to-audio conversion results with those processed through RAVE.

In the second video, the laser pattern is generated from musical notes. The content of the laser projection was generated with Arduino code and is a passage from the melody of Bach's Canon a 2 per Tonos. Two notes are converted into frequencies and sent to the x and y galvanometers at the same time as a "polynote", so you can "see" and hear one polynote in one laser pattern simultaneously. The reason I chose Bach’s Canon a 2 per Tonos is that I appreciate how Douglas Hofstadter used this piece in his book Gödel, Escher, Bach: An Eternal Golden Braid (GEB) to illustrate the concept of a “strange loop,” which he proposed to explain the paradoxical structure of the mind. The Arduino code for the laser galvanometers can be found here.

For example, the behavior of the x and y galvanometers is defined in the Arduino code:

 // Define the main motif of the canon (C4 = 261.63 Hz)
 float notesX[] = {
   261.63, 293.66, 329.63, 261.63, 329.63, 392.00,
   349.23, 329.63, 293.66, 261.63, 329.63, 261.63
 };

 // Define corresponding Y-axis notes for harmony or static projection
 float notesY[] = {
   196.00, 220.00, 261.63, 196.00, 261.63, 293.66,
   261.63, 220.00, 196.00, 220.00, 261.63, 196.00
 };
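The values in the two arrays match equal-tempered pitches (C4, D4, E4, F4, G4 on the x axis; G3, A3, C4, D4 on the y axis). The sketch below shows one way such frequencies could be derived from note numbers and turned into per-axis drive samples. It is not taken from the linked Arduino code: the helper names, the sine drive waveform, and the 12-bit DAC range are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdint>

const double kPi = 3.14159265358979323846;

// Equal-temperament pitch of a MIDI note number (A4 = note 69 = 440 Hz).
double midiToFrequency(int midiNote) {
  return 440.0 * std::pow(2.0, (midiNote - 69) / 12.0);
}

// Hypothetical helper: one galvanometer drive sample at time tSeconds.
// ASSUMPTION: each axis is driven by a sine wave through a 12-bit DAC
// (codes 0..4095) centred on mid-scale; the real hardware may differ.
uint16_t galvoSample(double frequencyHz, double tSeconds) {
  double s = std::sin(2.0 * kPi * frequencyHz * tSeconds);  // range -1..1
  return static_cast<uint16_t>(std::lround(2047.5 + 2047.5 * s));
}
```

Calling `galvoSample(notesX[i], t)` and `galvoSample(notesY[i], t)` for the same `t` gives one (x, y) point; sweeping `t` traces a Lissajous-style figure whose shape depends on the ratio of the two note frequencies, which is why each note pair has its own visible pattern.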

Video 3. Starvation synth circuit x RAVE

This video demonstrates the raw audio output and the RAVE-processed output of a starvation circuit connected to a crystalline silicon solar panel. The amount of light reaching the solar panel is varied with hand shadows and additional light sources. In this test the starvation synth is tuned to lower frequencies with slower grooves, which suits the jazz drum set model.

Video 4. Photophone made of LED and solar panel

Besides a laser, an LED can also be used as a transmitter to send audio data. While preparing a solar workshop organized by Diana band 다이애나밴드 (the sound art duo Wonjung Shin 신원정 and Dooho Yi 이두호), Dooho made a rapid prototype that sends an audio stream with a blue LED; a conventional crystalline silicon solar panel serves as the receiver, together with an oscilloscope. No RAVE model is involved in this setup.
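An LED cannot emit "negative" light, so an audio signal has to ride on a constant brightness before it can be sent this way. The sketch below illustrates that idea for a microcontroller driving the LED with 8-bit PWM. It is not a description of Dooho's prototype: the function name, the PWM approach, and the `depth` parameter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: map an audio sample (-1..1) to an 8-bit PWM duty
// value (0..255) for an LED, centred on half brightness.
// ASSUMPTION: intensity modulation around a DC bias; `depth` (0..1) scales
// how far the brightness swings around that bias.
uint8_t audioToLedDuty(double sample, double depth) {
  double s = std::max(-1.0, std::min(1.0, sample));  // clip the audio
  double d = std::max(0.0, std::min(1.0, depth));    // clip the depth
  double level = 127.5 + 127.5 * d * s;              // 0..255 around mid
  return static_cast<uint8_t>(level + 0.5);          // round to nearest
}
```

On the receiving side, the solar panel's output follows the light intensity, so the constant bias shows up as a DC offset and the audio as the variation on top of it.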


References

  1. Caillon, Antoine, and Philippe Esling. 2021. “RAVE: A Variational Autoencoder for Fast and High-Quality Neural Audio Synthesis.” arXiv:2111.05011. Preprint, arXiv, December 15. https://doi.org/10.48550/arXiv.2111.05011.
  2. Buckner, Cameron J. 2023. From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence. 1st ed. New York: Oxford University Press. https://doi.org/10.1093/oso/9780197653302.001.0001.
  3. Suzuki, Mototaka, Cyriel M. A. Pennartz, and Jaan Aru. 2023. “How Deep Is the Brain? The Shallow Brain Hypothesis.” Nature Reviews Neuroscience 24 (12): 778–91. https://doi.org/10.1038/s41583-023-00756-z.
  4. Nyangoma, Kato Ainomugisha. n.d. 2024. Sonic Art: Exploring the Relationship between Sound and Visual Arts.