Generative AI 12.09.2023

A Brief History of Generative AI – Which Milestones Made Today’s Tech Possible and What Do They Have In Common?

Arttu Närhi


From the birth of speech recognition systems to the dawn of conversational AI, the journey of generative AI is nothing short of a technological odyssey. Let's walk down memory lane to revisit the milestones that have shaped today's AI landscape.

Generative AI has exploded onto the scene over the last few years. This year alone, the pace of change has been exceptionally fast, and players all across the market are looking to benefit from the new technology. Most of the newest applications and announcements have something to do with content generation: whether it is producing written content or images, these have been the best way for all users to get a grip on the possibilities.

The development has been rapid, and we can’t blame anyone for losing track of the major milestones. So, in chronological order, we present the developments that have captured public interest. Looking at what has happened so far, we might get a clue about what the future will hold…

Carnegie Mellon University: Xuedong Huang Develops Sphinx-II Speech Recognition System – 1992

Photo: Wikimedia Commons (CC BY-SA 4.0)

Speech recognition has been around since at least the 1970s, and whether all of it counts as artificial intelligence is still up for debate. But there is a seminal moment in its history when the technology exceeded human capability and became what could be considered a forebear of the modern speech recognition software used by AI assistants and other computers.

This development was known as Sphinx-II. It was developed by Xuedong David Huang at Carnegie Mellon University (CMU). Sphinx-II was the first application capable of speaker-independent, large-vocabulary, continuous speech recognition, with a vocabulary larger than that of the average human being. Huang went on to join Microsoft in 1993 to develop language processing and artificial intelligence technologies. Currently, he is the CTO of Zoom Video Communications. Information on Sphinx-II is scarce, but an archived version of the original research publication can be downloaded via this link. Based on the original research, an open-source project called PocketSphinx is still in active development.

IBM: Deep Blue Beats Chess Champion Garry Kasparov – 1997

Photo: Jim Gardner/Flickr

Deep Blue was an IBM supercomputer created for one purpose only: to beat the world’s best chess players at their game. Deep Blue began its development at CMU with students Feng-hsiung Hsu and Murray Campbell. The two were later hired by IBM and joined the team of engineers who would eventually create the computer that achieved an incredible milestone.

The ’97 match was not the first time Kasparov faced Deep Blue: the first match took place in 1996, and Kasparov won it by switching his strategy in the middle of the game, a move chess grandmasters could anticipate but one that proved impossible for that version of Deep Blue to overcome. Kasparov had also competed and won against Deep Blue’s predecessor, Deep Thought, in 1989, making the saga an interesting example of how much AI progressed in a short period.

Deep Blue vs. Kasparov was widely publicized at the time and became a benchmark for demonstrating AI capabilities with games. Processing power was limited back then, and the world had yet to learn about the advantages of working with big data sets. What Deep Blue did teach the team, though, was that human and machine problem-solving could complement each other greatly. The success led IBM to pursue new avenues of AI research, paving the way for many future innovations.

ActiveBuddy: SmarterChild, an Instant Messaging Bot, is Launched – 2001

Photo: Arthit Suriyawongkul/Flickr

SmarterChild was among the first user applications that performed rudimentary tasks similar to what Alexa and Siri do today. The bot was available in AOL Instant Messenger and MSN Messenger (later renamed Windows Live Messenger): users added it as a contact and chatted with it through the messenger interface.

The bot couldn’t browse the web, but it had licensed databases from services like IMDb, the Weather Channel, the Yellow Pages, and Sony, enabling it to answer plenty of the questions users asked. SmarterChild also had a personality: it would demand an apology if someone cursed at it and would give the user the silent treatment until they said they were sorry.

ActiveBuddy (later renamed Conversagent and then Colloquis) was acquired by Microsoft in 2006. The company’s natural language processing technology became an important part of Microsoft’s efforts in the field. While it might be disingenuous to draw a direct connection from SmarterChild to products like Siri, it’s clear that users whose first experiences with chatbots started with SmarterChild would find parallels with modern AI assistants. Shawn Carolan of Menlo Ventures pointed out this similarity himself in a 2011 interview.

Apple: Siri is Released with iPhone 4S – 2011

Photo: Wikimedia Commons (CC BY-SA 4.0)

Probably among Apple’s best-recognized products, Siri became an instant hit when it was made available on the iPhone and later on almost all Apple devices. Apple acquired the original company, Siri Inc., for an undisclosed amount in 2010. The venture was a spinoff of SRI International, and its work, like Sphinx-II, was supported by DARPA, the US Department of Defense’s R&D wing.

Siri’s success as an AI assistant is undeniable, and it made the concept familiar to the world. Iconic features like the phrase “Hey Siri!”, the animated orb, and the display of inputs in chat format are not only a testament to Apple’s design integrity but have also influenced the AI applications that have come since. Siri also pressured Amazon, Microsoft, and Google to launch their own assistants: Alexa, Cortana, and Google Assistant.

IBM: First Commercial Application of Watson – 2013

Photo: Atomic Taco/Flickr

IBM’s victory over Kasparov fueled its supercomputer development for years to come. The next public defeat of a champion came in 2011: IBM Watson, developed to compete on the game show Jeopardy!, beat two champions on the show and won the $1 million jackpot (which IBM donated to charity).

However, unlike Deep Blue, Watson was not retired after the monumental victory, and its development was pushed further. What made Watson a great Jeopardy! player was its unprecedented ability to understand questions and produce answers humans could understand. This made it well suited to processing large data sets, including unstructured ones, and outputting information useful to researchers. In 2013, IBM started making Watson-based applications; the first was created for Memorial Sloan Kettering Cancer Center to help decide on treatment for lung cancer patients.

Microsoft: Turing Natural Language Generation (T-NLG) is Introduced – 2020

Photo: Microsoft

By 2020, OpenAI had released its GPT-2 language model, and NVIDIA was working hard on Megatron-LM. While GPT-2 had an impressive 1.5 billion parameters and Megatron-LM boasted 8.3 billion, T-NLG blew both out of the water with 17 billion parameters, making it the largest language model at the time.

Such NLP models were perhaps the most important breakthrough that modern generative AI applications needed to become reality. The ability of machines to understand and act on open-ended text written by humans removed many limitations for users everywhere. People no longer needed to learn the intricacies of individual databases or software applications: new programs could understand what their users wanted and generate innovative outputs from highly unstructured input.

OpenAI: DALL·E is Released – 2021

Photo: OpenAI

The new decade kicked off a torrent of new generative AI applications. At the time of writing, we are still in the middle of it – new applications are released every day, and we probably cannot even imagine what the future of generative AI will be.

Among the first practical demos of the newly unlocked capabilities was OpenAI’s DALL·E, the first text-to-image generator to go viral on Reddit and Twitter. Based on GPT-3, the first DALL·E convincingly demonstrated the potential of the technology to the average user. Since the first release, DALL·E has been updated to its second version and, like other image generators, can now produce frighteningly convincing images, so much so that OpenAI imposes restrictions on what can be generated.

Other text-to-image generators followed quickly, with Midjourney launching in summer 2022 and the initial release of Stable Diffusion becoming available to researchers that fall. Just two years after DALL·E, we have plenty of choice, with companies like Adobe and Canva incorporating generative tools into their products.

OpenAI: ChatGPT free preview is launched – 2022, December

Photo: Wikimedia Commons (Public Domain)

If there is one moment that will go down in history as the day the world changed, the release of ChatGPT is sure to be up there. The app's overnight success took its creators at OpenAI by surprise, too. Social media was quickly filled with people demonstrating the power of their text prompts and the creative results the app generated. It did not take long for the first book written by ChatGPT to be published. 

ChatGPT is widely credited with kicking off a generative AI arms race. Microsoft was quick to incorporate the technology into Bing Search in early 2023. Moreover, the release famously caused a “Code Red” alert at Google as executives feared the first serious challenge to their search tool in a long time.

It has not even been a year since this seminal moment. Experts are still working to keep up with the rapid pace of change that OpenAI unleashed on the world. Alongside the great benefits, creators and users alike have called for better regulation and for closer examination of the ethics of such technologies. What is clear, however, is that generative AI is here to stay.

NVIDIA and Oracle: Extensive Cloud Services for Enterprise Generative AI are Announced – 2023

Photo: Wikimedia Commons (CC BY-SA 4.0)

Enterprise and B2B applications of new technology famously lag behind consumer apps. While consumers and researchers have enjoyed the fun and productivity of generative AI tools for years, only now are the first applications becoming available for enterprises and large companies.

This summer, NVIDIA and Oracle announced their suite of products and services for corporations and other commercial organizations to develop their own generative AI applications. The potential is there, and without a doubt, we will begin to see new generative AI solutions transform global business and production.

So what’s the common denominator? And what does the future hold?

Behind a great majority of these success stories have been partnerships between growing innovators and large incumbents with decades of experience. Such relationships have enabled a revolution in AI technology, not forgetting the important input of public spending on critical projects.

We at Combient Foundry strongly believe the future is created together. And with the greatest technology trend to hit the market in years, we want to ensure no company with the best solutions will miss out on the spoils. It’s time to usher in the era of industrial generative AI.

To do this, we will soon launch new opportunities for startups to partner with the Foundry companies in areas focused on creating new generative AI solutions and technologies. Watch this space for more information!

ChatGPT did not write this article, but it gave me notes on my draft and a good idea for the introduction. Cover image generated by Canva.