AI Photobooth Process

AI Photobooth

AI Photobooth is like a digital mirror, reflecting how Artificial Intelligence models “see” us.

Tech Stack

Frontend JavaScript with the browser face detection API, a Node server, and the OpenAI APIs: GPT-4o for captioning and DALL·E 3 for image generation.

How it Works

Once a face is detected, a pulsing button invites the user to press it.

The captured image is displayed on the screen and also sent to a Large Language Model to be “translated” into a caption.

This caption is used as a prompt for an image generator, which returns an AI representation of the person captured.

What’s Going On Behind the Scenes?

When the page is loaded and webcam permissions are granted, the face detection API is used to detect whether a face is in the frame. A bounding box is drawn around the face.
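A minimal sketch of that detection loop, assuming the experimental browser FaceDetector API (part of the Shape Detection API); the video and overlay element names are illustrative:

```javascript
const video = document.querySelector("video");
const overlay = document.querySelector("canvas");
const ctx = overlay.getContext("2d");
const detector = new FaceDetector({ fastMode: true, maxDetectedFaces: 1 });

async function detectLoop() {
  const faces = await detector.detect(video);
  ctx.clearRect(0, 0, overlay.width, overlay.height);
  if (faces.length > 0) {
    // Draw the bounding box around the first detected face.
    const { x, y, width, height } = faces[0].boundingBox;
    ctx.strokeRect(x, y, width, height);
  }
  requestAnimationFrame(detectLoop);
}
detectLoop();
```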

This is hidden from the user, but pressing “d” on the keyboard toggles a debug mode where you can see what the face detector sees.
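The toggle itself can be as simple as a keydown listener; debug and overlay here are hypothetical names for the mode flag and the bounding-box canvas:

```javascript
// Hypothetical debug toggle: pressing "d" shows or hides the detection overlay.
let debug = false;
document.addEventListener("keydown", (e) => {
  if (e.key === "d") {
    debug = !debug;
    overlay.style.visibility = debug ? "visible" : "hidden";
  }
});
```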

Once a face is detected, the “AI Capture” button becomes enabled. When a user clicks the button, the image from the face detector is captured using the bounding box coordinates and stored as a base64 string that encodes all the pixel data of the captured region.
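A sketch of the capture step: the frame is cropped to the bounding box on an offscreen canvas, then encoded as a base64 data URL. The captureFace helper name is illustrative:

```javascript
function captureFace(video, box) {
  const canvas = document.createElement("canvas");
  canvas.width = box.width;
  canvas.height = box.height;
  const ctx = canvas.getContext("2d");
  // Draw only the region inside the bounding box onto the canvas.
  ctx.drawImage(video, box.x, box.y, box.width, box.height,
                0, 0, box.width, box.height);
  return canvas.toDataURL("image/jpeg"); // "data:image/jpeg;base64,..."
}
```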

This base64 string is sent to the server with a POST request. The Node server processes this request, asking GPT-4o to describe the contents of the image.
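Server-side, the caption request might look like the following sketch, using the official openai Node SDK with Express; the /caption route name and prompt wording are assumptions:

```javascript
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json({ limit: "10mb" })); // base64 images are large
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post("/caption", async (req, res) => {
  // GPT-4o accepts a data URL directly as image input.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Describe the person in this image." },
        { type: "image_url", image_url: { url: req.body.image } },
      ],
    }],
  });
  res.json({ caption: completion.choices[0].message.content });
});
```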

Once the caption is returned from the OpenAI API, it is passed back to the frontend JavaScript. There it is first displayed on the screen, then sent back to the server via another POST request to get an image. The server sends the caption as a text prompt to the DALL·E 3 API.
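The second round trip is a sketch along the same lines; the /generate route name is an assumption, and DALL·E 3 accepts only one image per request:

```javascript
app.post("/generate", async (req, res) => {
  // Use the GPT-4o caption verbatim as the image prompt.
  const result = await openai.images.generate({
    model: "dall-e-3",
    prompt: req.body.caption,
    n: 1,
    size: "1024x1024",
  });
  res.json({ url: result.data[0].url });
});

app.listen(3000);
```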

Once the image has been generated, it is returned to the frontend JavaScript and displayed for the user. After a few seconds, a reset button appears, allowing the cycle to begin again.
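Put together, the frontend round trip might look like this sketch; showCaption, showImage, and showResetButton are placeholder UI helpers, and the routes match the hypothetical ones above:

```javascript
async function runPhotobooth(dataUrl) {
  // First request: base64 image in, caption out.
  const { caption } = await (await fetch("/caption", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: dataUrl }),
  })).json();
  showCaption(caption); // display the caption first

  // Second request: caption in, generated image URL out.
  const { url } = await (await fetch("/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ caption }),
  })).json();
  showImage(url); // then display the generated likeness
  setTimeout(showResetButton, 3000); // reset appears after a few seconds
}
```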

But Why?

This project arose as I was experimenting with LLMs as creative collaborators through a series of critical making experiments. I was curious how AI “sees” us, and whether this would reveal any biases or limitations of the models.

Most often the models return an “attractive” likeness, often reduced to standard stereotypes, e.g. male = facial hair. The models are reluctant to make assumptions about gender, race/ethnicity, and age, but can be coaxed with some prompting. This suggests that the model is being limited by the concerns of its creators, and not necessarily by a limitation of the model itself.

What is most interesting is when the model is unsure or struggles in some way. For example, if the caption expresses an inability to determine a person’s race, the image generator will respond with a blank-faced person.

Links

GitHub Repo

Live site