Generative AI (GenAI): Demystifying the Phenomenon

Waleed Ghalwash
5 min read · Sep 29, 2023


Image generated using Stable Diffusion as a GenAI example

Ever heard of GenAI? It’s a term that’s been buzzing around in the tech world a lot lately. Simply put, it’s about computers making new stuff on their own — from text to images and more. In this article, I’m going to break down what GenAI is, how it grew over time, and how it connects to other tech ideas like Machine Learning, Deep Learning, and big models that can chat or write like humans.

To truly appreciate its evolution, we must first journey through the milestones of Artificial Intelligence (AI) and deep learning.

Artificial Intelligence: The Birth of Machine Thinking

AI, at its core, deals with the theory and methods to design machines that can think and act like humans. This field of computer science aims to recreate human-like intelligence — systems that can reason, learn, and function autonomously.

Machine Learning: The Catalyst

Emerging as a pivotal subfield of AI, Machine Learning (ML) is the magic that allows computers to learn from data without being explicitly programmed. ML programs train a model using input data to make predictions or decisions without human intervention.

In ML, there are two predominant learning paradigms (both sketched in code below):

  • Supervised Learning: Uses labeled data to make predictions. For instance, predicting the tip at a restaurant based on the total bill.
  • Unsupervised Learning: It’s about discovery. Models look at raw, unlabeled data and identify patterns or groups, like clustering employees based on tenure and income.
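
To make the distinction concrete, here’s a minimal sketch of both paradigms using scikit-learn. The bill, tip, tenure, and income numbers are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised: labeled data (total bill -> tip) trains a predictor.
bills = np.array([[10.0], [20.0], [30.0], [40.0]])  # inputs (total bill)
tips = np.array([1.5, 3.0, 4.5, 6.0])               # known labels (tips)
reg = LinearRegression().fit(bills, tips)
print(reg.predict([[25.0]]))  # ~[3.75]

# Unsupervised: no labels; the model discovers groups on its own.
employees = np.array([
    [1, 40_000], [2, 45_000],    # short tenure, lower income
    [10, 90_000], [12, 95_000],  # long tenure, higher income
])  # columns: [tenure in years, income]
print(KMeans(n_clusters=2, n_init=10).fit_predict(employees))  # e.g., [0 0 1 1]
```

Notice that the supervised model needed the tips as answers during training, while the clustering model was given nothing but the raw employee data.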

Deep Learning: A Leap Forward

Deep learning, a further subset of ML, changed the game. Deep learning models, particularly artificial neural networks, can handle far more complex patterns than traditional ML. These models are designed with layers of interconnected nodes or neurons, much like the human brain.

In deep learning:

  • Artificial Neural Networks (ANNs): Inspired by human neural structures, ANNs consist of interconnected nodes that process information and learn from it (a minimal network is sketched below).
  • Semi-Supervised Learning: It’s a blend. Neural networks are trained on a mix of labeled and unlabeled data, which improves generalization to new, unseen data.
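
As a toy illustration of those layers of interconnected neurons, here’s a minimal sketch in PyTorch. The layer sizes are arbitrary, chosen only for illustration.

```python
import torch
import torch.nn as nn

# A tiny feed-forward network: each Linear layer is a set of
# interconnected "neurons"; ReLU adds the non-linearity that lets
# the network learn patterns too complex for traditional ML.
model = nn.Sequential(
    nn.Linear(4, 16),   # input layer: 4 features -> 16 hidden units
    nn.ReLU(),
    nn.Linear(16, 16),  # a second hidden layer
    nn.ReLU(),
    nn.Linear(16, 2),   # output layer: scores for 2 classes
)

x = torch.randn(8, 4)  # a batch of 8 examples, 4 features each
logits = model(x)      # forward pass through all the layers
print(logits.shape)    # torch.Size([8, 2])
```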

Historical Milestones in Deep Learning

  1. Pre-2012: The concepts behind deep learning and neural networks had existed for decades, but limited computational power and data availability held back their potential.
  2. 2012 Breakthrough: In 2012, a deep learning model (AlexNet) won the ImageNet Large Scale Visual Recognition Challenge, a pivotal moment. The model significantly improved image recognition accuracy, signaling deep learning’s vast potential.
  3. Post-2012: The success in 2012 led to a surge in interest in deep learning. Tech giants like Google, Facebook, and Microsoft began investing heavily. Breakthroughs in natural language processing, computer vision, and speech recognition soon followed.

The Evolution of AI Programming to Generative Models

Tracing the evolution of AI programming:

  1. Traditional Programming: Developers hard-coded rules for a specific task, say for distinguishing a cat based on attributes like type, legs, ears, and behavior (a toy example of such rules follows this list).
  2. Wave of Neural Networks (~2012): Instead of relying solely on human-coded rules, we provided networks with pictures of cats and dogs and trained the system to distinguish between the two.
  3. Generative Wave: This wave brought machine-generated content capabilities across text, images, audio, and more. For instance, models like PaLM (Pathways Language Model) and LaMDA (Language Model for Dialogue Applications) are trained on vast data from across the internet to build foundation language models, which can answer queries based on the extensive information they’ve been trained on.
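
To make the contrast with the later waves concrete, here’s a toy example of what a first-wave, hard-coded rule might look like. The attributes and thresholds are invented for illustration; real rule-based systems were far more elaborate.

```python
# Traditional programming: a human writes every rule explicitly.
# The attribute names and checks here are purely illustrative.
def is_cat(animal: dict) -> bool:
    return (
        animal.get("legs") == 4
        and animal.get("ears") == "pointy"
        and animal.get("sound") == "meow"
    )

print(is_cat({"legs": 4, "ears": "pointy", "sound": "meow"}))  # True
print(is_cat({"legs": 4, "ears": "floppy", "sound": "woof"}))  # False
```

The second wave replaced these brittle hand-written rules with patterns learned from example images; the generative wave goes further and produces entirely new content.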

Generative AI: The Definition

Generative AI, a subset of the deep learning domain, has been making waves in the tech world, producing content ranging from text and images to audio and synthetic data. Given a prompt, a GenAI system uses its learned model to predict an expected response and generate new content. GenAI grasps the underlying structure of its training data and then generates new samples resembling the data it’s been exposed to. For example, a generative language model can produce novel combinations of text resembling natural language.

These generative models have different facets:

  • Text-to-Text Models: Take a natural language input and produce a text output, e.g., translating from one language to another (sketched in code after this list).
  • Text-to-Image Models: Trained on images, each with a text description, they can generate images based on textual prompts.
  • Text-to-Video and Text-to-3D: These models can generate dynamic visual content based on textual inputs.
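
As a concrete taste of the text-to-text facet, here’s a minimal sketch using the Hugging Face transformers library; t5-small is just one small, publicly available model that supports English-to-French translation, chosen purely for illustration.

```python
# A minimal text-to-text example: natural language in, text out.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("Generative AI creates new content from prompts.")
print(result[0]["translation_text"])  # a French rendering of the sentence
```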

Generative AI: The New Frontier

Generative AI fits snugly in this continuum. While discriminative deep learning models classify and predict from data, generative models create new data instances. They don’t just recognize a dog in an image; they can generate a completely new dog image.

The evolution has been as follows:

  • Discriminative Models: They learn the boundary between different classes of data points, for instance identifying whether an email is spam or not.
  • Generative Models: They produce new data points, like composing a new song or generating new text (both kinds are sketched in code below).
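
Here’s a miniature side-by-side sketch: a discriminative spam classifier built with scikit-learn, and a toy generative model (a one-word Markov chain standing in for a real language model). All of the data below is invented for illustration.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Discriminative: learn a boundary between labels (spam vs. not spam).
emails = ["win money now", "meeting at noon", "free prize win", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(emails), labels)
print(clf.predict(vec.transform(["free money prize"])))  # likely [1] -> spam

# Generative: model the data itself, then sample something new.
corpus = "the cat sat on the mat the dog sat on the rug".split()
chain = {}
for a, b in zip(corpus, corpus[1:]):
    chain.setdefault(a, []).append(b)  # word -> possible next words
word, out = "the", ["the"]
for _ in range(5):
    word = random.choice(chain.get(word, corpus))
    out.append(word)
print(" ".join(out))  # e.g., "the cat sat on the rug"
```

The classifier only ever emits a label; the chain, crude as it is, produces a brand-new sequence each run, which is the essence of the generative distinction.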

In the AI ecosystem, Generative AI stands out when the output isn’t just a label or a probability but something more tangible — like text, audio, or images.

Foundation Models and Their Post-2012 Ascendancy

Foundation Models Era: As we moved into the late 2010s and early 2020s, the concept of ‘foundation models’ emerged. These are large-scale models pre-trained on vast amounts of data that serve as a foundational layer. Instead of building models from scratch, researchers and developers can fine-tune these foundation models for specific tasks, making them more efficient and accurate.
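
Here’s a minimal sketch of that “fine-tune, don’t retrain” idea using the Hugging Face transformers library. bert-base-uncased is one publicly available checkpoint; the two-label classification head is an assumed toy task, not a prescribed setup.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load pre-trained weights and bolt on a fresh 2-class head;
# only this comparatively small fine-tuning step remains, instead
# of pre-training the whole model from scratch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("Fine-tuning reuses what the model already knows.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```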

Examples of foundation models include:

  • GPT (Generative Pre-trained Transformer) series by OpenAI, which revolutionized natural language processing.
  • BERT (Bidirectional Encoder Representations from Transformers) by Google, which set new standards for NLP tasks.
  • DALL·E by OpenAI, a GPT-3 variant designed to generate images from textual descriptions, bridging the gap between NLP and computer vision.

These foundation models have several advantages:

  • Efficiency: Rather than training a model from scratch, which requires massive computational resources, one can fine-tune a foundation model, saving time and energy.
  • Versatility: A single foundation model can be adapted to numerous tasks, from text generation to image recognition and beyond.
  • Improved Performance: Due to the vast amounts of data they’re trained on, foundation models often outperform task-specific models.

However, they also bring challenges:

  • Bias and Fairness: Being trained on vast datasets means they may internalize and perpetuate biases present in those datasets.
  • Environmental Concerns: The training of such large-scale models requires significant computational power, leading to concerns about energy consumption.
  • Over-reliance: As foundation models become commonplace, there’s a risk of stifling innovation, with researchers choosing to fine-tune existing models rather than creating new architectures.

Generative AI’s Role in the New Landscape

With the emergence of foundation models, Generative AI has found new avenues of application. These models are now not just about generating data but also understanding, modifying, and enhancing it in nuanced ways that were previously unthinkable.

For instance, foundation models in the Generative AI space can:

  • Create realistic virtual environments or characters for video games.
  • Generate artwork or music, collaborating with human artists to push the boundaries of creativity.
  • Understand and generate natural language at a sophisticated level, enabling more advanced conversational AI systems.

Written by Waleed Ghalwash

Serial entrepreneur in tech; founded Code95, MerQ & Hoods. Specializes in ML and product management. Passionate about data analytics. Cyclist 🚴 and squash player.
