The first blog post, found here, was an introduction to the history of AI, the different types of AI and how they work, a breakdown of the main AI providers, a list of some of the more useful resources I’ve run across, and a few more things.
This blog post focuses on AI image generation: how it works, the main companies that provide it, and the other companies whose services are built on top of those providers.
First let’s go over some of the history of image generation. AI-generated art has recently become widespread and very popular, but the technology behind it existed long before it went mainstream.
The history of AI image generation and deepfakes dates back to the early 2010s, when researchers began experimenting with deep learning techniques to generate realistic images. The following is a brief timeline of some of the key developments in this field:
- 2012: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton publish a paper describing AlexNet, a deep convolutional neural network (CNN) that classifies images far more accurately than earlier methods. CNNs themselves were older, but this result forms the basis for many subsequent advances in image generation.
- 2014: Ian Goodfellow and colleagues publish a paper on generative adversarial networks (GANs), a type of deep learning model that can generate realistic images by training two neural networks to compete against each other.
- 2015: Researchers develop DeepDream, a visualization tool that uses CNNs to generate trippy, surreal images by enhancing patterns found in existing images.
- 2016: Leon Gatys and colleagues publish a paper on neural style transfer, a technique for transferring the style of one image to another using CNNs.
- 2017: A user on Reddit creates a pornographic video featuring the faces of celebrities superimposed onto the bodies of adult performers, sparking public concern about the potential for AI-generated “deepfakes” to be used for non-consensual purposes.
- 2018: Deepfakes gain widespread attention after being used to create realistic videos featuring politicians and celebrities saying or doing things they never actually did. Researchers begin developing techniques to detect deepfakes and to create “fake-proof” media.
- 2019: OpenAI releases a paper on GPT-2, a large-scale language model that can generate coherent and realistic text. The company initially withholds the release of the full model over concerns about its potential misuse for generating fake news and propaganda.
- 2020: NVIDIA researchers publish StyleGAN2, an improved version of their StyleGAN model, which uses GANs to generate high-quality images that can be controlled by adjusting features such as facial expression or hair style. These images are used in applications such as fashion design and virtual try-ons.
Then, as January 2023 rolled around, there was a huge explosion in the field of AI image generation that took the world by storm (and still is).
There are a lot of AI image generators out there, so let’s touch on a bunch of them before I dig into some of my favorites (and the most popular ones).
Here is a list of assorted ones to give you an idea of what is available.
- DALL-E 2: Developed by OpenAI, DALL-E 2 is a powerful image generation model that can generate high-quality images from textual descriptions. It was designed to be capable of creating a wide variety of images, from photorealistic to surreal and abstract.
- StyleGAN: StyleGAN is a popular generative adversarial network (GAN) model for creating realistic images that can be controlled by adjusting various features such as hair style, facial expression, and more. It has been used in applications such as virtual try-ons and fashion design.
- DeepArt.io: DeepArt.io is a web-based service that allows users to transform their photos into artistic styles using neural style transfer techniques. Users can choose from a variety of styles, including famous art styles and custom styles uploaded by other users.
- Artbreeder: Artbreeder is a web-based service that allows users to create and evolve digital art using GANs. Users can generate new images by combining existing ones and evolving them using genetic algorithms.
- RunwayML: RunwayML is a platform that allows users to experiment with and create various machine learning models, including image generation models. It offers a variety of pre-trained models as well as tools for training and deploying custom models.
- NVIDIA GauGAN: NVIDIA GauGAN is a real-time image generation tool that allows users to create photorealistic landscapes and scenes by drawing simple sketches. It uses GANs to generate detailed and realistic images based on user input.
- Artomatix: Artomatix is an AI-based tool for creating and modifying 3D models and textures. It uses a combination of GANs and other machine learning techniques to generate realistic textures and materials for use in video games, film, and other applications.
- DeepDream: DeepDream is a visualization tool developed by Google that uses CNNs to generate trippy, surreal images by enhancing patterns found in existing images. It has been used in art and creative projects as well as for scientific visualization and analysis.
- PaintsChainer: PaintsChainer is an online service that allows users to colorize their black and white line art using neural networks. It uses deep learning techniques to generate color schemes based on the input image and user preferences.
- Pikazo: Pikazo is a mobile app that allows users to turn their photos into digital art using neural style transfer techniques. It offers a variety of styles and filters that can be customized and combined to create unique and creative images.
- DeepFaceLab: DeepFaceLab is a free and open-source software tool for creating deepfakes, or realistic face swaps. It uses GANs and other machine learning techniques to generate high-quality face swaps that can be used in film, video, and other applications.
- AI Painter: AI Painter is a web-based service that allows users to turn their photos into digital paintings using neural networks. It offers a variety of painting styles and filters that can be customized and applied to the input image.
- Prisma: Prisma is a mobile app that uses neural networks to apply artistic styles to photos in real time. It offers a variety of styles inspired by famous artists such as Van Gogh, Picasso, and more, as well as custom filters created by other users.
- Artisto: Artisto is a mobile app that allows users to apply artistic styles to videos in real time using neural networks. It offers a variety of styles and filters that can be customized and applied to the input video.
- IBM Watson Visual Recognition: IBM Watson Visual Recognition is a cloud-based service that allows users to train custom image recognition models using deep learning techniques. It can be used in applications such as object detection, image classification, and more.
- DeepFaceDrawing: DeepFaceDrawing is a web-based service that allows users to create face sketches using neural networks. Users can draw a simple face sketch and the system generates a photorealistic image based on the input.
- AI Gahaku: AI Gahaku is a web-based service that allows users to turn their photos into classical paintings using neural networks. It offers a variety of painting styles inspired by famous artists such as Leonardo da Vinci, Rembrandt, and more.
Now I want to focus on the main ones currently on the market (and the ones that I like the most). These are also the ones gaining the most popularity. Some are repeats of the above, but I want to revisit the relevant ones in a lot more detail.
Midjourney: Midjourney is an AI program developed by Midjourney, Inc., capable of generating images based on natural language prompts. It is similar in functionality to OpenAI’s DALL-E and Stable Diffusion. The tool provides a creative way to generate diverse images from text descriptions. This one specifically has been at the forefront of recent… “popularity”. Because of the rise of deepfakes (specifically the viral photos of Trump getting arrested), they temporarily disabled free generations and blocked the word “arrested”. Overall it is still one of the more popular mainstream systems.
Stable Diffusion: Stable Diffusion is an open-source machine learning model used to generate images from text, modify images based on text, and enhance low-resolution or low-detail images. It has been trained on billions of images and produces results comparable to DALL-E 2 and Midjourney. The technology is known for its versatility in image generation. It is one of the most popular among Reddit users and other users overall. It wasn’t originally as mainstream as Midjourney, but it has a rabid fanbase (rightfully so). They’ve cranked out some beautiful images.
Gabby (Gab Social): Gabby is an image-generating AI bot used by Gab Social. Users can interact with Gabby by sending direct messages to the @AI account on Gab, providing text prompts for image generation. The quality and accuracy of the generated images depend on the user’s prompts, and Gabby offers a user-friendly way to create custom images. It’s not anywhere near as advanced as the others, but it’s free, it’s iterative, and it’s integrated directly into the chat. It’s cranked out some decent images for me. They aren’t as high quality as some of the others, but it’s still fun to play around with. They continue to improve it over time, so there’s no telling what the future will hold. You can register for an account, then visit that link and message that account, and it’ll iterate the images for you.
DALL-E 2: DALL-E 2 is an AI model developed by OpenAI, known for generating high-quality and creative images from natural language text prompts. It is an iteration of the original DALL-E model and is capable of combining various concepts and styles to produce photorealistic images. DALL-E 2 has gained recognition for its ability to create original and visually engaging images from a wide range of user-provided text prompts. It sits at the forefront of image creation technology, and its API powers A LOT of other services and systems. You get a lot for free, but you can pay for more. The image creation is amazing, and they improve it iteratively all the time.
Bing Chat has integrated image generation into its platform, based on an “advanced version” of DALL-E 2 according to the release notes. It works really well and it’s iterative, which is a bonus. This is one of my favorites so far. They have an entire image creator, but iteration also works amazingly in the chat. It still has a few bugs to work out, but it’s really well integrated right into the chat bot.
Playground AI – This is my favorite one on the market. Absolutely AMAZING images. There are several models to choose from, and it generates multiple images at a time with a lot of flexibility on controls and settings. You can pay to get access to a LOT more, including exclusive models. So far, besides Bing Chat and DALL-E 2, this has been one of my favorites.
There are some… topics that have been in the news a lot. Here are a few of them, and my thoughts surrounding them.
Copyright – People are on both sides of the fence in regards to copyright law. Most of the image generation services say you can use their created images for commercial or personal use and that you own everything they output. At the same time, most of them are trained on real images from people online, so there have already been a few lawsuits over generated images that were close to something already existing. Overall it’s a difficult topic, but eventually there will be copyright updates to help handle the influx of AI generated images.
Job loss – Deepfakes. Deepfaked videos and images have risen sharply, and that has concerned a lot of people. There have also been scams where someone will call you on FaceTime with a deepfake video that looks and sounds just like someone you know. Eventually legislation will limit how the graphics can be used, but I think that’s a natural part of the process as we continue growing into the future.
Within the series there are currently 3 parts.
Part 1 can be found here.
Part 3 can be found here.