The AI Revolution: How We Got Here & What We Need to Know

In 1950, Isaac Asimov published a collection of nine short stories under the collective title I, Robot. The work is held up as some of the earliest and most influential writing to shape our understanding of, and guiding principles for, designing intelligent systems. Asimov envisioned a world with advanced robots, what we would now more abstractly call "artificial general intelligence" (AGI). It's a fascinating exploration of human ethics and moral conflict set against three core rules, now generally known as "Asimov's Three Laws of Robotics":

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. 

The stories bear no resemblance to the Hollywood film that co-opted their name. They center on conflicts between direct orders and those governing laws, range in scale from small, personal interactions to cataclysmic global conflict, and reveal some very serious challenges that must be faced in a world adopting AGI.

Contrary to alarmist publications that claim AI is alive, we are not yet in that world. But the recent and extremely rapid development of artificial intelligence across multiple verticals means that we must be prepared to face those challenges now, and that our solutions will need to exceed the limits of Asimov's thinking. These concerns have been echoed and amplified as fear through pop-culture representations of technology, even where those representations bear little to no resemblance to the real world. Without concrete examples of safe computing to back up our global adoption of technology, those fears have allowed advanced computing to become a sort of societal bogeyman. Movie franchises like The Matrix and The Terminator spring to mind, but there are plenty more.

Artificial general intelligence is generally considered hypothetical. It's defined as an independent computing system capable of performing any intellectual task a human can do: it would be able to decide on its own to look at data, and then decide independently what, if anything, to do about that data. AGI is considered to be adaptable in its behaviors. The AI we know today, more specifically "generative AI", is a far narrower technology that produces new outputs from patterns learned in existing inputs. Generative AI can mimic human interaction, but it is not sentient; it cannot become "self-aware" and launch a robot apocalypse.

Generative AI is built on years of mathematical and statistical modeling, tracing its roots to Markov models, which assume that the probability of a future state depends only on the current state. Markov models rely on input parameters (current states) to generate outputs (future states), which can be either deterministic or probabilistic.

If the transition from each state to the next is fully determined (nothing is left to chance), we get outputs in the form of Markov chains that are said to be "deterministic", as in: 1 + 2 = 3. There's never a time when 1 plus 2 equals another value. It's always just 3. A simplified example of a deterministic chain is the current amount of money in your bank account: you know what you've deposited, and you have an assurance from the bank of a given interest rate. Between the two, you can calculate exactly how much money will be in your account at any given time.
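Here's a minimal sketch of that kind of deterministic chain in Python. The starting balance, interest rate, and annual compounding are made-up assumptions for illustration; the point is that each year's balance follows directly from the previous one, with no randomness anywhere.

```python
# A deterministic "chain": each year-end balance follows directly from the
# previous one, with nothing left to chance. The starting balance and
# interest rate are made-up example values, compounded once per year.

def project_balance(balance: float, annual_rate: float, years: int) -> list:
    """Return the balance at the end of each year, compounded annually."""
    history = []
    for _ in range(years):
        balance = balance * (1 + annual_rate)  # next state depends only on the current state
        history.append(round(balance, 2))
    return history

print(project_balance(1000.00, 0.05, 5))  # five years of 5% growth on a $1,000 deposit
```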

From an IT perspective, a more relevant example might be a Fibonacci sequence. If you're not familiar with the concept, a lot of us old IT guys used to rely on the pseudo-random nature of Fibonacci sequences in early random-number generators. They were never really random, though: the Fibonacci rule states that each number is the sum of the two preceding numbers. In a pure sequence, the first two numbers are 0 and 1, but the neat thing about Fibonacci, and what makes it useful for pseudo-randomness, is that you can start with any two input numbers.

So while the baseline is 0,1,1,2,3,5,8,13,21,34,…

The values could just as easily be 4,7,11,18,29,47,76,123,199,322,…
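Here's a rough sketch of how that property was put to work in early generators. It's an illustration only, not a production-grade random-number generator; the seeds and the modulus (which just keeps the values bounded) are arbitrary assumptions.

```python
# A toy Fibonacci-style pseudo-random generator: each new value is the sum
# of the two preceding values, wrapped by a modulus so the numbers stay
# bounded. Seeds and modulus are arbitrary illustration values.

def fib_prng(seed_a: int, seed_b: int, modulus: int = 100):
    """Yield an endless stream of values from two starting seeds."""
    a, b = seed_a, seed_b
    while True:
        a, b = b, (a + b) % modulus  # the Fibonacci rule, kept bounded
        yield b

gen = fib_prng(4, 7)
print([next(gen) for _ in range(10)])
# Without the modulus, seeds 4 and 7 would continue the 11, 18, 29, ... sequence above.
```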

When I say this is not new, I mean it’s really not new: Fibonacci lived from 1170 to 1240 AD.

In contrast to a deterministic sequence, a probabilistic sequence relies on…probability. We still know the current states, but we’re relying on either relative frequency, equal probabilities, or suggestive guessing to determine the future state.

A simple example of this would be rolling a die. There are x sides, and each has an equal probability of landing face-up, so the probability of any particular value on the next roll is p = 1/x. This works whether you're talking about a 20-sided die or a coin toss. It's scalable and linear and pretty easy to understand.
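A quick sketch of that equal-probability case in Python, using nothing more than a uniform random pick over x sides:

```python
import random

# Equal-probability outcomes: every face of a fair x-sided "die" has the
# same chance of landing face-up, p = 1/x.

def roll(sides: int) -> int:
    """Simulate one roll of a fair die with the given number of sides."""
    return random.randint(1, sides)

for sides in (2, 6, 20):  # a coin toss, a standard die, a d20
    print(f"{sides} sides: p(any one value) = {1 / sides:.3f}, sample roll = {roll(sides)}")
```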

Relative frequency is a bit more interesting, and benefits from having more “current state” data, or parameters. This is where we get into the language modeling side of things, and in fact an alphabetical parameter set is a perfect example.

The following table shows an example subset of letter-value parameters: for the letter in each row, the probability that the next letter in a word will be the letter in the corresponding column. So, for instance, when the letter "B" is encountered, there's a far greater chance that the next letter will be A, E, or I than any consonant on the chart. A model like this defines a set of transition probabilities, and while some transitions are very infrequent (0.0x), others approach a probability of 1. Think, for instance, of the letter "Q". In English, it's almost always followed by a very predictable next letter.

        A     B     C     D     E     F     G     H     I
  A   0.10  0.20  0.30  0.20  0.14  0.20  0.20  0.07  0.20
  B   0.25  0.10  0.05  0.12  0.40  0.06  0.07  0.08  0.30
  C   0.30  0.05  0.20  0.08  0.30  0.07  0.08  0.09  0.30
  D   0.13  0.12  0.11  0.10  0.11  0.08  0.09  0.10  0.11
  E   0.14  0.40  0.30  0.40  0.20  0.23  0.15  0.08  0.12

While the numbers in the chart above were randomly generated (sorry!), the real models were fed heaps and heaps of books, articles, and websites as training data to learn the very structure of our language.

We’ve relied on this modeling for the past decade or so to handle auto-correct on our phones. It’s really that simple. Predictive outputs based on Markov-modeling your inputs.
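Here's a toy sketch of that prediction step in Python. The transition probabilities below are invented for illustration, just like the chart above, and real systems work with far larger parameter sets, but the mechanic is the same: look up the current state's row and sample the next state from it.

```python
import random

# A toy next-letter predictor: given the current letter, sample the next
# letter from that letter's row of transition probabilities. These
# probabilities are invented for illustration, not measured from real text.

transitions = {
    "q": {"u": 0.98, "a": 0.01, "i": 0.01},
    "t": {"h": 0.35, "o": 0.20, "e": 0.15, "a": 0.15, "i": 0.15},
    "h": {"e": 0.50, "a": 0.25, "i": 0.15, "o": 0.10},
}

def next_letter(current: str) -> str:
    """Sample the next letter from the current letter's transition row."""
    row = transitions[current]
    return random.choices(list(row.keys()), weights=list(row.values()), k=1)[0]

# Chain the predictions: the same lookup-and-sample step, applied repeatedly.
fragment = "t"
while fragment[-1] in transitions and len(fragment) < 5:
    fragment += next_letter(fragment[-1])
print(fragment)  # e.g. "the" -- whatever short fragment the sampled chain produces
```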

And because we can define any input as a parameter, be it a bit, a byte, a word, a phrase, or whatever, and then because we can perform iterative or hierarchical runs, auto-correct can become auto-complete can become auto-fill. Suddenly our phones seem really smart. Suddenly voice-enabled assistants become possible, and we feel like wizards running around the house re-ordering paper towels and telling the TV what we want to watch.

But again: not new. Just new to us, the consumers. It turns out all that probability forecasting has been helping us pick the right jacket or umbrella for decades. Yep: this is how weather forecasting works. Take a big enough set of parameters (temperature, precipitation, humidity, wind) over a big enough cross-section of space and index it against a big enough span of time, and you can reasonably predict what tomorrow and the next day will be like. In Ye Olden Dayes of computing, this is why weather models had to be run on supercomputers: the dataset and calculations were just too big to run on standard compute.

The very real and neat thing about using these models to predict weather is that, almost necessarily, the forecasts improve with time, if for no other reason than that the sampling set gets bigger with time. I've noticed in my lifetime that forecasters have gone from saying that anything beyond day 3 is pure conjecture to confidently laying out a 2+ week forecast. AccuWeather.com now shows a day-by-day 6-week forecast. In that time they've collected almost 50 more years of observations, adding nearly half again to the data they already had, and layered in new datasets to improve the accuracy of storm tracking. Weather data may be one of the most complete datasets on the planet, with ocean surface-temperature logs going back hundreds of years.

And while we all love to hate on the weatherman, there’s no pop culture bogeyman. There’s no sub-plot of Neo & Morpheus trying to decide which black leather coat to wear. Cloudy With a Chance of Meatballs? The movies, maybe—certainly not the book.

So we had phenomenal capabilities to assess data YEARS (DECADES!) ago, but none of it rose to the level that would enable the tools we're working with in 2024. What changed? Attention.

Join us for part 2, which will dive into how attention revolutionized the revolution.

 


 

If your organization wants to improve productivity by using Microsoft Copilot, Synergy Technical can help. Our Microsoft Copilot for Microsoft 365 Readiness Assessment will validate your organization’s readiness for Copilot and provide recommendations for configuration changes prior to implementation. We’ll help you make sure that your team’s data is safe, secure, and ready for your Copilot deployment.

 
