Zeta Comics: Blending AI & Art in Digital Stories
Originally published on GreenZeta.com February 28, 2024
In the ever-evolving landscape of digital art, the intersection of creativity and technology is opening up avenues I never before thought possible. Enter the Zeta Comic Generator, a project that marries human-drawn cartoons with the growing capabilities of artificial intelligence. The project is not just an art and tech demo. It’s also a personal journey that weaves together a love for drawing, programming, and a curiosity about the potential of AI. This article delves into the origin of the Zeta Comic Generator, exploring its conception, the software mechanics behind it, and the techniques that drive its fusion of hand-drawn art and AI-generated content.
The Idea
My engagement with cartooning can best be described as sporadic. Central to my experimentation as an artist is Alpha Zeta, a green alien who is a recurrent figure in my ventures. Conceived during my high school years, Alpha Zeta became a distinctive presence in my work as a software engineer. I always wanted to create an episodic series featuring a group of aliens, but always lacked a coherent narrative. I dabbled in it many times but never found a good story to pursue.
When I began doing talks for a local developer group, I started adding Alpha Zeta to my slides. It served as a visual embellishment, something to jazz up my presentation. I continued the practice in my articles and projects. It was fun drawing these one-off character poses with no context, so I kept doing it. Soon I had a small library of my own character art.
The advent of AI in art, particularly through GPT and Dall-E, marked a turning point. Initially, my gateway drug into AI was Dall-E 2 and Midjourney. Though Midjourney produced better results, I gravitated to Dall-E because it had a free tier. It led me to create background images that I could place my character poses on, like an animation cel. Dall-E 3 offered a huge upgrade in output quality. With ChatGPT integration, I now have the ability to create reference images on demand. I was sold on the subscription price.
Going back to the desire for an episodic series, I thought it would be great if an AI could just write one for me. It was a pipe dream. I had no illusions of what GPT is capable of. Thoughts like that, I think, are at the root of most people’s disappointment with AI. But I thought it might be just good enough for a really short story. GPT can output JSON, so hooking it into a JavaScript app is trivial. After some experimentation in ChatGPT, I was confident the concept would work.
The Technique
The Zeta Comic Generator manages a series of sequential calls to both GPT and Dall-E. The models work together in scriptwriting, background generation, and the integration of hand-drawn character art.
AI Script
Each comic strip begins with a premise, a seed for a story told through AI’s “imagination”. Everyone visiting the Comic Generator can enter their own premise on the site’s “Create” page. That premise is then inserted into a prompt for the AI. GPT, playing the role of a cartoonist and humorist, uses the premise to weave a narrative for a three-panel comic strip featuring Alpha Zeta. The script, output as a JSON object, outlines the scene and dialogue for each panel. The stage is set for the visual elements to come alive.
AI Background
With the script as the foundation, GPT’s next role is to conjure the visual settings for each panel. It crafts prompts for Dall-E, meticulously describing each backdrop while ensuring the focal point, Alpha Zeta, remains absent, reserved for the final touch of hand-drawn art. These prompts are sent off to Dall-E to generate background images for each panel of the comic.
Hand-Drawn Character
The Comic Generator offers a growing number of Alpha Zeta actions. Each one is a hand-drawn image, bringing consistency and emotion to the comic. Each action is represented as a word, e.g. standing, sitting, joyous, terrified. The script is sent back to GPT, which is asked to choose the action word that best describes what the character is doing in each panel. Its choice determines the image that appears in the panel.
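Since the model's choice picks the artwork, it helps to check that choice against the list of known actions before using it. The following is a minimal sketch of that idea, not the project's actual code. The action list is taken from the examples above, while the `standing` fallback, the function names, and the image path scheme are all assumptions made for illustration.

```javascript
// Hypothetical list of action words; the real list lives in the Comic Generator.
const ACTIONS = ["standing", "sitting", "joyous", "terrified"];
const DEFAULT_ACTION = "standing"; // assumed fallback, not from the article

// Normalize the model's answer and fall back when it is not a known action.
function resolveAction(choice, actions = ACTIONS) {
  const normalized = String(choice ?? "").trim().toLowerCase();
  return actions.includes(normalized) ? normalized : DEFAULT_ACTION;
}

// Map an action word to an image path; the naming scheme is an assumption.
function actionImagePath(choice) {
  return `/assets/character/${resolveAction(choice)}.png`;
}
```

With a guard like this, an unexpected answer from the model still results in a valid character image instead of a broken panel.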
The Process
The Comic Generator has a PHP back-end and a JavaScript front-end. The back-end handles all communication with the OpenAI API, while the front-end manages the sequential calls and responses, and ultimately composes the final result. For those interested in the inner workings of this project, the complete source code is available on GitHub.
The Backend
Given that OpenAI publishes an npm library for interaction with their API, PHP is not the ideal solution. However, my server runs a LAMP stack so, for me, it was the path of least resistance. Fortunately, their API has a REST interface making it accessible from PHP through curl. Using curl isn’t difficult and OpenAI’s documentation provides examples of each call. Since the only thing that changes is the prompt, each GPT call uses the same code. Only minor modifications are needed for Dall-E.
Example PHP:
$url = "https://api.openai.com/v1/chat/completions";
$prompt = "{GPT PROMPT HERE}";
$apiKey = "{OPENAI API KEY HERE}";
$ch = curl_init();
$headers = array(
    'Authorization: Bearer ' . $apiKey,
    'Content-Type: application/json',
);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Build the body with json_encode so quotes and newlines in the prompt are escaped.
$body = json_encode(array(
    "model" => OAI_MODEL, // model name constant, defined elsewhere
    "response_format" => array("type" => "json_object"),
    "messages" => array(
        array(
            "role" => "user",
            "content" => $prompt,
        ),
    ),
));
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Send the request; $response holds the raw JSON returned by the API.
$response = curl_exec($ch);
curl_close($ch);
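To illustrate how small the difference between the two kinds of call can be, here is a rough sketch of the two request bodies side by side. It is written in JavaScript only for brevity; the same fields would go into the PHP body. The image request targets OpenAI's `https://api.openai.com/v1/images/generations` endpoint. The function names, the `dall-e-3` model name, and the `1024x1024` size are assumptions for illustration, not values confirmed by the project.

```javascript
// Chat request body, mirroring the PHP example above.
function chatRequestBody(model, prompt) {
  return {
    model,
    response_format: { type: "json_object" },
    messages: [{ role: "user", content: prompt }],
  };
}

// Image request body for https://api.openai.com/v1/images/generations.
// The model name and size here are assumptions for illustration.
function imageRequestBody(prompt, model = "dall-e-3", size = "1024x1024") {
  return { model, prompt, n: 1, size };
}

// JSON.stringify escapes quotes and newlines in the prompt automatically.
console.log(JSON.stringify(imageRequestBody("An empty futuristic room"), null, 2));
```

The headers and the rest of the curl setup stay the same; only the URL and the body change.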
Example Prompt:
You are a cartoonist and humorist.
Write the script for a three-panel comic strip.
In the comic strip, our main character, a short green humanoid alien named Alpha Zeta, engages in the following premise: {Premise}
Include a detailed scene description and words spoken by the main character.
Write your script in the form of a json object. The json object has the following properties: `title` and `panels`.
The following is a description of each property value:
`title`: The title of the comic strip. Limit to 50 letters.
`panels` is an array of objects with the following properties: `scene` and `dialog`
`scene`: A description of the panel scene including all characters.
`dialog`: Words spoken by Alpha Zeta. He is the only character that speaks so there is no need to label with a name. This can be an empty string if the character is not speaking.
All prompts used in the Comic Generator are available to read on the About page.
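Inserting the user's premise into the prompt is a simple template substitution. The sketch below shows one way it might be done, using a shortened copy of the prompt above. The `buildScriptPrompt` name, the whitespace cleanup, and the 200-character limit are assumptions for illustration, not details from the project.

```javascript
// Shortened stand-in for the prompt shown above; {Premise} marks the insertion point.
const SCRIPT_PROMPT =
  "You are a cartoonist and humorist.\n" +
  "Write the script for a three-panel comic strip.\n" +
  "In the comic strip, our main character, a short green humanoid alien named Alpha Zeta, engages in the following premise: {Premise}";

const MAX_PREMISE_LENGTH = 200; // assumed limit, not from the article

// Collapse whitespace, cap the length, and substitute into the template.
function buildScriptPrompt(premise, template = SCRIPT_PROMPT) {
  const cleaned = String(premise).replace(/\s+/g, " ").trim().slice(0, MAX_PREMISE_LENGTH);
  if (cleaned === "") throw new Error("Premise must not be empty");
  // A replacer function keeps "$" sequences in the premise from being interpreted.
  return template.replace("{Premise}", () => cleaned);
}
```

Passing a replacer function instead of a plain string matters here because user text such as `$1` would otherwise be treated as a replacement pattern.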
Example Output:
{
  "title": "Alpha Zeta and the AI Artisan",
  "panels": [
    {
      "scene": "Panel 1 shows Alpha Zeta sitting at a sleek, futuristic computer console...",
      "dialog": "So this AI can draw comics? Let's test its sense of humor with an intro piece!"
    },
    ...
  ]
}
The JSON output from the model is sent as the response from the PHP page. Originally, this required a lot of extra work to extract and validate the JSON part of GPT’s output. Fortunately, the introduction of the `response_format` parameter now forces GPT to respond only with valid JSON. A makeshift REST API on my end allows the Comic Generator to select each prompt by sending a fetch request to a different URL endpoint, e.g. `/script`, `/backgrounds`, and `/actions`.
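The `response_format` setting guarantees syntactically valid JSON, but not that the object has the properties the prompt asked for, so a light shape check is still worthwhile. Here is a minimal sketch of such a check. The `parseScript` name and the error messages are hypothetical; the expected properties come from the example prompt above.

```javascript
// Parse the model's reply and check that it has the shape the prompt asked for.
function parseScript(jsonText, panelCount = 3) {
  const script = JSON.parse(jsonText); // throws SyntaxError on invalid JSON
  if (script === null || typeof script !== "object") {
    throw new Error("Script must be a JSON object");
  }
  if (typeof script.title !== "string") {
    throw new Error("Script is missing a string `title`");
  }
  if (!Array.isArray(script.panels) || script.panels.length !== panelCount) {
    throw new Error(`Script must contain exactly ${panelCount} panels`);
  }
  for (const [i, panel] of script.panels.entries()) {
    if (typeof panel?.scene !== "string" || typeof panel?.dialog !== "string") {
      throw new Error(`Panel ${i + 1} needs string \`scene\` and \`dialog\` values`);
    }
  }
  return script;
}
```

A failed check is a reasonable signal to retry the request rather than render an incomplete comic.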
The Frontend
The JavaScript front-end manages the entire process, starting with the user’s premise and making each back-end call in sequence. It talks to the PHP back-end through simple fetch requests, gathering the data it needs along the way. Here’s a glimpse of what the final data object looks like:
{
  "title": "Alpha Zeta and the AI Artisan",
  "panels": [
    {
      "scene": "Panel 1 shows Alpha Zeta sitting at a sleek, futuristic computer console...",
      "dialog": "So this AI can draw comics? Let's test its sense of humor with an intro piece!",
      "background": "A futuristic, minimalist room with sleek surfaces and ambient lighting...",
      "background_url": "...",
      "action": "sitting"
    },
    ...
  ]
}
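One way an object like this could be assembled is by awaiting each endpoint in turn and merging the replies into the panels. The sketch below is only an illustration. The endpoint paths come from the article, but the `postJson` and `buildComic` helpers, the request payloads, and the assumed reply shape (a `panels` array aligned with the script) are hypothetical and may differ from the project's real code.

```javascript
// POST JSON to a back-end endpoint and return the parsed JSON reply.
async function postJson(url, payload, fetchFn = fetch) {
  const response = await fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!response.ok) throw new Error(`${url} failed with status ${response.status}`);
  return response.json();
}

// Run the three calls in order, merging each reply into the panels.
// The request and response shapes below are assumptions, not the project's actual API.
async function buildComic(premise, fetchFn = fetch) {
  const comic = await postJson("/script", { premise }, fetchFn);
  const backgrounds = await postJson("/backgrounds", { script: comic }, fetchFn);
  const actions = await postJson("/actions", { script: comic }, fetchFn);
  comic.panels = comic.panels.map((panel, i) => ({
    ...panel,
    ...backgrounds.panels[i], // assumed: { background, background_url }
    ...actions.panels[i],     // assumed: { action }
  }));
  return comic;
}
```

Accepting `fetchFn` as a parameter is a design choice that makes the sequence easy to exercise with a stand-in function, without touching the network.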
Once the data object is complete, the final comic is assembled. To keep things simple, each comic has exactly three panels and each panel is a perfect square. The dialog balloons are rendered in a <canvas> element and output as an image. The three images (background, character, and dialog) are stacked on top of each other to form each complete panel.
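The layering itself can be done in more than one way, for example with stacked elements in CSS or by drawing everything onto a single canvas. The sketch below shows the canvas variant only as an illustration of the stacking order; the article does not say which approach the project uses, and the `drawPanel` name and the `layers` object shape are assumptions.

```javascript
// Draw the three layers of one square panel, bottom to top.
// `ctx` is a CanvasRenderingContext2D; each layer is anything drawImage accepts.
function drawPanel(ctx, layers, size) {
  const order = ["background", "character", "dialog"];
  ctx.clearRect(0, 0, size, size);
  for (const name of order) {
    const image = layers[name];
    if (image) ctx.drawImage(image, 0, 0, size, size); // skip missing layers
  }
}
```

In a browser this might be called as `drawPanel(canvas.getContext("2d"), { background, character, dialog }, canvas.width)` once all three images have loaded. A panel with no spoken words can simply omit the `dialog` layer.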
As AI tools evolve, so will the Zeta Comic Generator. It has already received improvements in quality through upgrades to GPT and Dall-E. My hope is that one day it will be up to the task of creating episodic storylines with rich continuity. Improvements in the comics won’t be limited to technology; Alpha Zeta’s abilities will also continue to grow as I add new action artwork to the list.
Try out the Zeta Comic Generator and see your story ideas unfold. As you explore this tool, you’re not just creating comics; you’re part of an exciting journey at the forefront of digital creativity. The future of AI and art is bright and full of potential. Zeta Comic Generator just scratches the surface of what’s possible. Let your imagination run wild and be a part of crafting tomorrow’s art landscape!