Compovision is a playful take on how we interact and co-create with machines, no words or clicks required.

Since the advent of modern artificial intelligence in the 2020s, the way we interact with these systems has been largely dominated by written language.

Compovision’s goal is to redefine how we communicate with artificial systems by creating a fully visual interaction where no words or prompt engineering is required.

By utilizing the visual world around us, we can compose and create entirely new worlds.

Concept and realization by Lionel Ringenbach, aka Ucodia.

Create it with your own hands

By pointing a set of cameras at places or objects from your daily life, Compovision instantly creates new visual compositions. This intelligent system understands what you want to create simply by observing the world you feed into it.

How it understands and creates

Under the hood, the system perceives the world you show it through an AI vision model, which translates each image into a textual description of what it sees. It then composes the descriptions of all the images into a larger imaginary scene. Finally, it uses an image generation system to render that scene, recomposing the world around you into a brand new world.
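
As a rough illustration of the middle “composition” step, the sketch below shows how one description per camera could be folded into a single scene request for the image generator. The helper name, prompt wording, and example descriptions are purely illustrative assumptions, not the exhibit’s actual prompts.

```python
# Illustrative sketch of the composition step only: folding one description
# per camera into a single request for an imaginary scene. The prompt wording
# and example descriptions are assumptions, not the exhibit's actual text.
def build_scene_prompt(descriptions: list[str]) -> str:
    observations = "\n".join(f"- {d}" for d in descriptions)
    return (
        "Combine the following observations into one vivid description of a "
        "single imaginary scene, suitable as a prompt for an image generator:\n"
        + observations
    )


if __name__ == "__main__":
    print(build_scene_prompt([
        "a ceramic mug of coffee on a wooden desk",       # camera 1
        "a tall houseplant in front of a bright window",  # camera 2
        "a red bicycle leaning against a brick wall",     # camera 3
    ]))
```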

The first prototype (May 2024)

The initial prototype was ideated in October 2023, developed in May 2024 and first exhibited at the Vancouver AI Community Meetup in May 2024.

The core of the prototype is a sequential pipeline of three AI models, each with a very specific task, executed in the following order:

  1. LLaVA, an “image-to-text” model
  2. Llama 3, a “text-to-text” model
  3. Stable Diffusion XL, a “text-to-image” model

In short, the pipeline acts as if it were an “images-to-text-to-text-to-image” model. This is different from an “images-to-image” model because the middle of the pipeline allows greater flexibility in how the final image is composed.
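
As a minimal standalone sketch of this “images-to-text-to-text-to-image” chain, the snippet below strings the three stages together outside of the exhibit’s actual TouchDesigner/ComfyUI setup. It assumes the Ollama Python client for LLaVA and Llama 3 and Hugging Face diffusers for Stable Diffusion XL; the prompts, device choice, and file paths are illustrative assumptions.

```python
# Minimal sketch of the images-to-text-to-text-to-image pipeline.
# The exhibited version runs inside TouchDesigner/ComfyUI; this sketch assumes
# the Ollama Python client (LLaVA, Llama 3) and Hugging Face diffusers (SDXL).
import ollama
import torch
from diffusers import StableDiffusionXLPipeline


def describe(image_path: str) -> str:
    """Stage 1, image-to-text: LLaVA describes one camera frame."""
    result = ollama.generate(
        model="llava",
        prompt="Describe the objects and scenery in this image in one sentence.",
        images=[image_path],
    )
    return result["response"]


def compose(descriptions: list[str]) -> str:
    """Stage 2, text-to-text: Llama 3 merges the descriptions into one scene."""
    result = ollama.generate(
        model="llama3",
        prompt=(
            "Combine these observations into a single vivid description of one "
            "imaginary scene, written as a prompt for an image generator:\n- "
            + "\n- ".join(descriptions)
        ),
    )
    return result["response"]


def render(scene_prompt: str):
    """Stage 3, text-to-image: Stable Diffusion XL renders the composed scene."""
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("mps")  # Apple Silicon; use "cuda" on an NVIDIA GPU
    return pipe(scene_prompt).images[0]


if __name__ == "__main__":
    frames = ["camera_1.jpg", "camera_2.jpg", "camera_3.jpg"]  # illustrative paths
    scene = compose([describe(f) for f in frames])
    render(scene).save("composition.png")
```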

Workflow integration

The pipeline was built and integrated entirely in TouchDesigner and ComfyUI. Three webcams and a single monitor were used to take in the live video feeds and display the resulting images. The entire system ran offline on a MacBook Pro with an Apple M1 Max chip.
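
The installation streams live video inside TouchDesigner, but as a rough stand-in for the intake step, the sketch below grabs one frame from each of three webcams with OpenCV; the camera indices and output paths are assumptions for illustration, not part of the actual integration.

```python
# Rough stand-in for the camera intake step (the exhibit itself uses live
# TouchDesigner video feeds, not OpenCV). Grabs one frame per webcam and
# saves it where the vision-model stage could pick it up.
import cv2


def grab_frames(camera_indices=(0, 1, 2), out_prefix="camera"):
    paths = []
    for i in camera_indices:
        cap = cv2.VideoCapture(i)   # open the i-th webcam
        ok, frame = cap.read()      # capture a single frame
        cap.release()
        if not ok:
            raise RuntimeError(f"Could not read from webcam {i}")
        path = f"{out_prefix}_{i + 1}.jpg"
        cv2.imwrite(path, frame)
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(grab_frames())  # e.g. ['camera_1.jpg', 'camera_2.jpg', 'camera_3.jpg']
```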