We were trying to teach an agent using a robotic hand to pick things up. Just a simple reach-and-grasp problem — like I'm doing with this glass of beer. When the object got far enough above the table, say 10 centimeters, it would get a reward, and we hoped that just from that sparse reward signal it would learn how to pick up the glass — it was a block, actually, that it was trying to pick up. But what it actually learned, when we came back in the morning and saw the results of the experiment, was that it had learned to simply flick the block high into
the air,
out of reach,
and not grasp it at all.
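The failure here lives in the reward specification itself. A sketch of such a sparse reward (purely illustrative, not DeepMind's actual code): it pays out whenever the block is high enough, and never checks how the block got there, so flicking scores exactly as well as a careful grasp.

```python
# Illustrative sparse reward for "lift the block 10 cm above the table".
# Note that it never checks HOW the block got there.
def reward(block_height_m: float, is_grasped: bool) -> float:
    return 1.0 if block_height_m > 0.10 else 0.0

# Both behaviours earn the same reward, so the agent can learn either one:
careful_lift = reward(0.12, is_grasped=True)   # grasp and lift
flick = reward(0.50, is_grasped=False)         # flick it into the air
```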
The solution that it finds is not the one we want; it kind of finds flaws in the way you've defined the task. It's super clever.
It works out how to get around it. Hello, and welcome to this final installment of Pint of Science podcast series one.
It's been a pretty spectacular ride in which Jim and I have traveled the length and breadth of the UK in search of wisdom,
knowledge and a nice,
cozy pub.
Don't despair that the podcast is ending, though — rejoice, because today marks the beginning of Pint of Science Festival 2019.
Absolutely right — there are events about to kick off all over the country,
with pubs and cafes hosting amazing talks from fascinating researchers covering every possible kind of science you can think of.
Make your way to pintofscience.co.uk to view our entire programme, or pintofscience.com if you're listening from further afield. Now, in today's episode we were lucky enough to catch up with Dr Raia Hadsell. Raia is a senior research scientist with the world-renowned artificial intelligence research company DeepMind, who describe their mission as being to push the boundaries of AI, developing programs that can learn to solve any complex problem without needing to be taught how. Artificial intelligence is an increasingly important part of all of our lives.
And whatever your feelings on it,
it's likely it will be affecting more and more parts of our lives over the next few decades.
So we were pretty chuffed that Raia was up for a chat.
Originally from California,
Raia's undergraduate degree was in religion and philosophy, but she made the transition to computer science at PhD level, with a thesis entitled Learning Long-Range Vision for Off-Road Robots,
which is pretty darn cool.
She worked as a postdoc at Carnegie Mellon University and as a research scientist at SRI International, both in the US, before moving to London in 2014 to join the DeepMind team. As a research area fraught with science-fiction myths, not to mention quite a few anxieties around job security and the rise of the machines, we decided to use today to demystify the subject and get a better insight into what day-to-day AI research actually looks like for those carrying it out.
We'll be back to chat a little more about our podcast adventures after today's episode.
But for now,
enjoy the final episode of series one of the Pint of Science podcast. This podcast was made possible with help from our sponsors,
Brilliant.org — a great place to head if you want to learn something new every day. Brilliant teach you science from the ground up by setting questions and challenges every day and then explaining the science behind them. Brilliant.org's new feature, Daily Challenges, helps make learning a daily habit.
Every day they publish several problems that provide a quick and fascinating view into maths,
logic,
science,
engineering or computer science.
So if you're inspired by what you hear today and want to learn about the science behind it yourself,
check out Brilliant.org or download the app.
There's a link in the description,
and the first 200 people to subscribe will get 20% off the premium plan.
Let's start by getting our audience up to speed: what is DeepMind?
So artificial intelligence is a term that's been thrown around for many decades, since Marvin Minsky and others first started thinking about this back in the sixties.
Really, AI was meant to be any computer program that sort of emulates some of the decision processes of a human. That resulted in a lot of products that were very specific — more sort of expert decision systems that would, for instance, run your dishwasher and make it more efficient. Now we're starting to think about how we can automate that entire process of coming up with a solution, and make it more general, so that there's one algorithm that can be deployed on lots of different problems and come up with a solution that's much more general, in the same way that biological intelligence is general.
Whether you're an octopus in the sea, or a fish, or an insect, or a human, we learn to optimize for our environment. So that's a little bit more what artificial general intelligence is supposed to be.
DeepMind was a startup company back pre-2014; then it was acquired by Google, and it's now part of the Alphabet umbrella of different companies.
So we work a lot with Google,
but we're separate from Google.
We act a lot as an independent research institute.
We do a lot of blue-skies research, specifically trying to solve the problem of intelligence, and to figure out new problems to solve once we have figured out intelligence.
So the majority of staff there are research staff?
I mean, we have almost 1,000 people now just at DeepMind, and I would say that we're about half core researchers — research scientists, research engineers — and then half are a little bit more applied, in different areas, working on different products.
And does what you do there feed into other Google products?
So some of the applied group works with other Google groups on problems like, you know, YouTube recommendations, things like this, data-centre optimization. The work that I do is really just in fundamental research.
So on their website, DeepMind describes the company as a world leader in AI research and its application for positive impact. How does the company kind of measure whether it's having this positive impact?
Yeah, that's a really good question: what's the metric for that? I don't think we have any clear metric. It's a little bit of "we know it when we see it", and we hope that the rest of the world agrees with us. But we really would like to solve new big problems that the world acknowledges need to be solved, that we don't have good solutions for — so things like some of the incurable diseases that are out there. We really see artificial general intelligence as being a really powerful tool that could help with curing some diseases, at least, or modeling the climate in new ways. Work that we've done recently has been on how Google data centres optimize their use of power and cooling to bring down that energy bill, which helps Google in terms of their wallet, and it also helps the world in terms of not creating more heat.
And you can roll that out across more than just Google as well, once you've got a solution.
Exactly.
So, DeepMind's higher-profile kind of research projects: one of the ones I think I probably came to know them for was AlphaGo. So this is the first computer program to defeat a professional human Go player. First of all, what actually is Go, and why use Go instead of, I suppose, chess, which is the famous one that springs to mind? Why was Go a superior choice?
Well,
it's just a lot harder.
So go is an ancient game,
and I am not the expert in this area.
So I will just say that it is centuries,
if not millennia, old. A simple game,
involving putting down black or white stones and trying to capture territory.
And the interesting thing there is that there are two people playing back and forth.
If there's just one person playing — if you tried to play Go solitaire — it'd be really easy: you'd just sort of capture all of the territory and be done. Because there are two players, there is this very complex strategy that happens.
And it's a very open game, in that you can play anywhere you want on this big board.
It's 19 by 19 squares,
and that creates this explosion of possible futures, of different ways that the game can go. And I don't have at hand the relevant comparison, the intuitive example of that — it's something like: if you took each grain of sand on the Earth and that was its own world... It's far larger than the number of atoms in the universe, I believe, in terms of the number of possible ways that the game can be played.
So it's really hard to come up with a solution if you're writing a computer program, and the same thing exists in chess: chess is hard to sort of automate, hard to win at with a computer program, for some of the same reasons. But Go is just much more extreme.
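The "more than atoms in the universe" comparison is easy to check with a crude upper bound: each of the 361 points on a 19×19 board is black, white, or empty. (The true number of legal positions, roughly 2 × 10^170, is smaller than this bound but still astronomically large.)

```python
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80   # common order-of-magnitude estimate

# Loose upper bound on Go board configurations:
# each of the 19*19 points is black, white, or empty.
board_states = 3 ** (19 * 19)

bigger_than_universe = board_states > ATOMS_IN_OBSERVABLE_UNIVERSE
digits = len(str(board_states))           # on the order of 10**172
```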
OK.
And so we approached this: instead of starting with rules and sort of trying to program in what a Go player thinks about, we just learned the whole thing from scratch, using this AGI approach of having a neural network that learns directly from the data that it sees and the experience of playing the game.
And you ended up beating the human.
Yes,
and so we — a team led by David Silver at DeepMind — worked on this for years, and then eventually said: OK, we're ready to try to beat the best player in the world. And so we went to South Korea and played against Lee Sedol in a match.
Was this made like a kind of broadcast thing for all the DeepMind staff? Did you all sit there watching this kind of moment of truth?
So, at four in the morning, yes. Those of us that were still in London for this — a lot of us came into work in the middle of the night, because that's when it was being played in Korea, and we watched the game. And we came into work because, if we came into work, we could see sort of the dashboard of what AlphaGo was thinking, what its expectations were. We got to see a little bit more information, and it was quite edge-of-the-seat.
That is very, very exciting. Yeah, you're there watching the computer, like, sweating. But what exactly does an AI look like when you're viewing it on a screen?
Nowhere near as exciting as in the movies, unfortunately. It really looks like a set of graphs, of plots over time, which reflect the AI's expectation of winning — what it thinks its own probability of winning or losing is at any point. And we also see its predictions for what its opponent is going to play next. So we get to say: it thinks that the best move for Lee Sedol is to play there, and then we see what Lee Sedol will actually play. So you sort of get an idea as to how well AlphaGo understands Lee Sedol and that game — is it correctly predicting, which of course is a huge part of doing well in Go or chess-related games — and then also just how well it thinks it is doing. So when we saw it start to take a nosedive, in the one game Lee Sedol actually ended up winning, we knew that there was something going wrong, and indeed it lost.
It lost its confidence — that's a crazy thing. That's probably how a human brain could also look; a representation of Lee Sedol's brain would definitely show that. So was there a follow-up kind of success story after AlphaGo, then? And I know this isn't your area, so we'll move through it relatively fast. AlphaGo Zero was the kind of follow-up project that was even more of a big deal. Why was AlphaGo Zero a bigger deal?
Well, there was AlphaGo Zero, and then there was actually AlphaZero, and they were both exciting because they removed some of the assumptions, they removed some of the engineering elements, and they made it into more of a pure learning algorithm. So the initial AlphaGo started its learning process by watching a lot of expert games, a lot of human games of Go being played, and just as an observer watched these games going back and forth and learned from that how to predict what was being played and what a good next move was. With AlphaGo Zero, we took that away and learned directly through experience from the beginning. And with AlphaZero, we took away anything that was specific to Go at all: we used exactly the same algorithm, pretty much exactly the same code, to become superhuman at chess and a game called shogi as well.
Wow. OK, so in terms of a non-expert getting it: AlphaGo kind of watches lots of games of Go being played, learns from them, then becomes very, very good at Go. AlphaGo Zero does it by playing against — what, itself? Just by itself, playing self-play, and becomes incredible. And then AlphaZero — and this is the one I'm finding the hardest to get my head around — how did that then become great at three games using the same algorithm?
So that just took away anything that was sort of hard-coded in that was specific to Go, and it just made it more general. There's sort of a list of things in the paper — things that were specific in terms of how the game starts, how many different moves you can take, things like this — and it made them more generally applicable. We always considered Deep Blue, the sort of AI that IBM used to beat chess champions, to be very specific: it was completely designed and engineered, every part of it was written in order to play chess, right? And that same algorithm could not learn to play noughts and crosses. And so that's what we think of as a limited AI, versus something that is general. Something that is general should be able to learn to play any of these games well and, you know, do just as well.
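AlphaZero itself pairs a deep network with Monte Carlo tree search, but the core self-play idea — one learner improving by playing both sides and reinforcing the moves of whoever won — can be sketched on a toy game. The version below uses Nim (take 1 to 3 stones; whoever takes the last stone wins) and a lookup table in place of a network; the hyperparameters are arbitrary choices for illustration:

```python
import random

N, ACTIONS = 12, (1, 2, 3)   # Nim: 12 stones, take 1-3, last stone wins

def best_move(Q, n):
    legal = [a for a in ACTIONS if a <= n]
    return max(legal, key=lambda a: Q.get((n, a), 0.0))

Q = {}
for _ in range(20000):
    n, history = N, []                 # history of (stones_left, move)
    while n > 0:
        if random.random() < 0.2:      # occasional random exploration
            a = random.choice([x for x in ACTIONS if x <= n])
        else:                          # otherwise both sides play the best known move
            a = best_move(Q, n)
        history.append((n, a))
        n -= a
    # The player who took the last stone won: reinforce their moves (+1)
    # and punish the loser's moves (-1), stepping Q towards the outcome.
    for i, (s, a) in enumerate(reversed(history)):
        r = 1.0 if i % 2 == 0 else -1.0
        Q[(s, a)] = Q.get((s, a), 0.0) + 0.1 * (r - Q.get((s, a), 0.0))
```

After training, the table encodes the well-known Nim strategy of always leaving the opponent a multiple of four stones, even though no rule about multiples of four was ever written in.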
So, I spotted on the DeepMind website when I was researching this episode, actually, that one of the recent pieces of news is that they've just developed AlphaStar. So this is a project which is an AI that can master the real-time strategy game StarCraft, which I have played, and it's quite complicated, quite hard to be good at. First of all, is this one of the specific builds, or is this a general algorithm that's been applied to StarCraft too?
A little bit of both, definitely. We took a lot of inspiration from the algorithms that were used in AlphaZero in developing AlphaStar, in developing the StarCraft player. But this agent is different. Instead of looking at sort of the positions of pieces on a board, it needs to look at the StarCraft, you know, information, which is just this screen of figures moving around over time. It's more of looking at a video game.
Is that harder or easier than looking at a board?
It's harder.
Probably the most important difference is that it's incomplete information.
So in Go and in chess,
when you look down at the board,
you see the whole thing.
It's all there. There's no uncertainty as to where any of those pieces are — you know exactly where they are, you know the full state of the game.
Starcraft is what we would call a partial information game so you only can see part of it.
You can't see everything that's going on across the whole board at once.
You can only see where you're looking,
okay,
Plus,
there's this fog of war so you can't see some parts of the board.
So you're operating and you're making decisions from a position of uncertainty, fundamentally — which humans are comfortable with, but AIs have a harder time with.
In Go or chess there's a definite end point, whereas StarCraft — is that open-ended? I've never played it.
It's a good game.
Because, I'm guessing, if you were doing an open-world thing, there isn't any sort of goal to work towards?
That's another problem: if you don't know what you're going to optimize, then how do you ever learn? So in a world where you're doing exploration — I think about Minecraft, when you're in creative mode. I ask my son, what are you doing? I know that our algorithms would actually have a very hard time doing anything productive or interesting in that sort of an open world where there's not a clear objective. In StarCraft, you've got a winner and a loser at the end of the day, so we can try to maximize the chance that the agent is going to win. But it's a hard learning process. So it is still a general algorithm, but it is different from AlphaZero in certain ways.
I just had my first tangent alarm go off when you said about creativity, and it got me thinking. I know that there's art that has been generated by AI before — at least, like, very impressive pictures that are developed completely from no human starting point. But I guess in that circumstance the computer's still being given an objective, even though, arguably, art is about being creative for the sake of creativity — and I'm sure people would disagree with me on various parts of that. But how do you program an AI to produce something that's purely creative?
Yeah, it's really hard to do. So there are algorithms you can use that try to mimic curiosity, and that will result in more exploration of a world like Minecraft in creative mode, where you're trying to find things that surprise you. Every time the AI that's in the game has something happen that it didn't predict, then it will be surprised, and that gives it a little bonus — a happy little bonus — for finding anything that surprises it. And so that means you're going to naturally try to seek out new areas: I already know what's going to happen here, I know what's going to happen if I interact with this part of the game, so I'm going to try going over here and pushing this button, or stacking this block, or doing whatever you do. Those are not as well developed, in terms of algorithms, as just learning to maximize an objective, like a win.
Right, yeah. Are you a gamer outside of work, or is this something that came to you as a researcher — you suddenly found yourself interested in video games?
So no, I'm not a gamer, although I've played StarCraft II — I think it's important for, you know, background research purposes. I am a puzzler. I've always done puzzles. My parents won a brand-new car by doing puzzles, in a family competition, a national puzzle competition in the US, when I was about 10 years old.
What, like crossword puzzles?
It was a bunch of different categories of puzzles, from word puzzles to more complex things. Some of them were in computer games, some of them were not. But logic puzzles, word puzzles.
OK — it's different, it's puzzles... because we are both quite geeky deep down. I mean, from my perspective as a human, and a not particularly skilled video gamer, games were hard when I was 10 and they're still quite hard now. But presumably the artificial intelligence going on behind the scenes in the gaming world has actually come really quite a long way?
I mean, I think that it has. Interestingly, a lot of these ideas about agents learning within a game existed quite a while ago. The founder of DeepMind is Demis Hassabis. He originally started a gaming company, and he developed a game called Black & White, where you interact with the game and the way in which you interact with it drives what happens in the game. Everything else is driven by that interaction, that feedback loop between you and the game: the game evolves because of how you treat it, sort of like a dog that you would, you know, give treats to and treat well or treat poorly. And I do play Subnautica.
Oh right — I don't know that one.
I actually don't know it either. I could go off on a massive tangent about video games — I'll avoid that; I'm not sure my girlfriend needs me to find another one. Thinking about the puzzles and stuff: is that what pulled you into the computer sciences?
It is,
I would say.
I mean,
my parents very much told me to maximize for that curiosity reward.
They said,
Go find the thing,
Go out into the world,
go to college,
find the thing that you know least about that is the most interesting to you and learn about that.
And I ended up with a degree in philosophy and religion,
which I'm not sure they actually predicted.
And then I was actually starting my graduate studies in philosophy.
And I said,
Wait a minute.
This is still interesting,
but I'm not curious enough that I want to spend a career in the world of philosophy.
And so I sort of retreated back to things that I knew that I was good at,
like solving puzzles,
thinking creatively,
and I learned how to code.
I took a class in Java back in the early two thousands and then got hooked.
Got a degree in computer science,
went into the PhD in computer science,
happened to do the PhD with Yann LeCun, who just received the Turing Award — or a third of a Turing Award — for his work in sort of fundamental computer science and AI.
So I learned a lot about neural networks,
and then the deep learning revolution happened,
and a lot of people became really excited in neural networks.
deep learning, these topics.
Was it a master's degree you got in computer science when you made that transition from philosophy?
Yeah, I spent the rest of that year of philosophy study actually taking computer science classes, and then I did a master's.
OK. And then, from there — I think that would be quite inspirational for a lot of people who finish their undergrad degrees and have that panic of, like: I enjoyed that, but I'm not sure I want it to be my degree.
Yeah, and I think that's great. I think that doing your undergraduate degree should be all about learning to critically read, critically think, be creative, be a problem solver. Those things apply in just about any field you can think of, and it's very easy to catch up on skills that you don't have once you're really motivated: OK, now I know why I want to learn this.
People have to choose stuff so young as well — that's the thing. Sometimes it takes a degree to realize, like you say: you studied philosophy and thought, this is still really cool, but actually, now I'm a bit older and can see, like, a career — is that what I want? I feel people have to make very big decisions at a very young age; it's kind of crazy. From religion and philosophy to computer science is quite a jump, though. Do you use any of the philosophy and religion? Does it come into your research in any way?
It doesn't, for the most part, no. I occasionally read a book on AI ethics that gravitates more towards the philosophy side, and I'm happy that, OK, I understand the way this argument is progressing. I do think it is always helpful if people have a diversity of backgrounds when they get to the point of trying to solve larger problems; that diversity of experience is always relevant and helpful.
You are going to have more and more of these ethical questions cropping up as we use AI for more and more things, I suppose, so it will surely be useful to be able to engage with that. If you're doing it purely on a victory condition of "AI research must progress", you could lose sight of some of those more scary areas where AI could come into play, I suppose. So do you think your philosophy degree might become more useful further down the line, so you can engage with those debates a bit more — the concerns the public have?
I mean, in my position as a research scientist, I do fundamental research. I'd rather leave the AI ethics, for the most part, to the people who did a whole PhD in AI ethics, or, you know, ethics and technology, things like this — I don't feel like I'm the expert in this field. I do think it's important to think about ethics, and to think about the implications of the work that we're doing and be prepared to engage with that. We have a group at DeepMind called DeepMind Ethics & Society, and they are really actively thinking ahead, thinking about areas like fairness and bias and those sorts of important subjects. I think it's really important; it's just not my research.
There is a whole department! I suppose I would maybe have assumed, before you said that, that it was implicit within your role — everyone was thinking about it — but now you say it, it's obvious there'd be a department dedicated to that.
Yeah, well, from the beginning, when Google wanted to acquire DeepMind, DeepMind said that one of their conditions for being acquired by Google was that there be an independent ethics board that would review things. And Google said: oh, you're really serious about this, aren't you? Yeah — AI is something that we do think will be developed and will bring a lot of questions with it. As the tools get more powerful, how are they used? What are the types of guarantees and promises that we need to make, and how do we ensure that those are followed?
And how do you find working at DeepMind, then, just as a kind of day-to-day job? Is it a fun place to work?
Google's known, I think, for being the best employer in the world by whatever rating, pretty consistently, and for good reason — I mean, they're a really good employer. I enjoy my job because I have a lot of freedom in terms of the subjects that I want to research, and I work with a great team of people. It's creative, it's fun, it's challenging.
We covered AlphaGo and we looked at ethics, neither of which are your actual area. Your actual area — is that neural networks?
Neural networks, and in particular deep reinforcement learning. Deep reinforcement learning is when we bring together neural networks — artificial neural networks. Just think about one as a computer program where you get some input, then you have a lot of mathematical transformations of that input, and then you have an output.
It's just a big,
complex function with an input and output and based on how good or bad that output is,
whether or not it's in error, you can change all of the parts of that function — so those would be all of the individual neurons. And in the neural networks that we use, they can have a million neurons, so there's a very large number of connections that are being adapted and changed based on the actual input and output of the network.
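That "change the parts of the function based on the error" step can be shown with the smallest possible network: a single neuron with two parameters, fitted to a made-up target function y = 2x + 1. (The learning rate, data, and target are all invented for the example; real networks do this across millions of parameters at once.)

```python
import random

# One "neuron": y = w*x + b. We nudge w and b to shrink the output error,
# which is what happens, at scale, across millions of connections.
random.seed(0)
w, b, lr = random.random(), random.random(), 0.05
data = [(x, 2 * x + 1) for x in (-2, -1, 0, 1, 2)]  # examples of the target

for _ in range(500):                 # sweep over the data many times
    for x, target in data:
        y = w * x + b                # forward pass: input -> output
        err = y - target             # how wrong the output is
        w -= lr * err * x            # adjust each parameter against the error
        b -= lr * err
```

After training, `w` has been pushed towards 2 and `b` towards 1, purely from the feedback signal.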
When we talk about deep reinforcement learning,
we're talking about using a neural network,
and the output that it creates is an action in the world.
So I can think about this.
I like the video game example.
If we want to train a neural network to play,
for instance,
Pong,
which is an old video game where all you have to do is move a paddle
left or right in order to hit a ball.
And so the input to the neural network is the pixels on the screen — what it looks like, the same thing that you or I would see looking at the screen. And the output is an action: it says move left, move right, or stay in the same position. So you see the pixels on the screen.
You propagate that information through this complex function, which is the neural network,
and the output comes and it says,
move left.
So you move the paddle left by sending that signal to the joystick,
and eventually something happens in the game.
Either you lose a point, or you gain a point, or you end the game altogether, and you send back either a positive or a negative feedback signal through the network.
And so you're just trying to get better and better at the game.
So you use the winning or losing, or the points in the game that you get, as a reward, and keep on trying to get better.
So you want to increase that reward.
And if you simply do this,
you start out with that neural network producing random outputs.
Go left,
go right,
go right,
go right left,
left,
left,
left,
left.
And it's not meaningful.
It's just random actions.
But eventually it starts to get data, and from those results — sometimes it gets a lucky hit on the ball — it learns and gets better. And what we see is that within about an hour of this neural network and this reinforcement learning playing the game of Pong, it gets good enough that its actions are almost always optimal, and it can win against the computer.
So this is a bit similar to before, when you were saying about your agent exploring, your kind of curiosity AI. In that situation you've got a little agent exploring, and every time it discovers something novel it gets its little beep of "that's a good thing", that positive reinforcement. Is that the same thing? Is that deep reinforcement learning but encouraging a kind of curiosity?
Encouraging curiosity, right — where now the rewards are coming because of surprises, because of things that were predicted but came out differently. As opposed to the simpler case of just trying to get a better score on Pong, where the initial exploration is just random and then that positive reinforcement comes from the ball returning.
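A minimal version of that surprise bonus: keep a small predictive model of the world and pay the agent in proportion to its prediction error. The one-line "model" here is a plain dictionary — far simpler than the learned models used in practice — but the incentive structure is the same: novelty pays once, familiarity pays nothing.

```python
model = {}   # forward model: state -> predicted next state

def curiosity_bonus(state, next_state):
    predicted = model.get(state, state)     # default guess: nothing changes
    surprise = abs(next_state - predicted)  # prediction error = surprise
    model[state] = next_state               # learn from what actually happened
    return surprise                         # paid to the agent as extra reward

first_visit = curiosity_bonus(3, 7)    # model is wrong -> positive bonus
second_visit = curiosity_bonus(3, 7)   # model now predicts correctly -> zero
```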
So that's where my research has been, then, for the last five years: developing new algorithms and new ways to use these algorithms, and sort of exploring what happens when we use these types of algorithms in different types of domains.
And is there still a focus on robotics and navigation, or is that something you left behind at that kind of early stage of your career?
No — so, I worked on robotics and navigation during my PhD, and then... it's hard to set down a puzzle once you've really gotten into it. So I still work on robotics and navigation, and continual learning; they're sort of the three areas of research.
And how does one use neural networks and deep reinforcement learning if we're talking about navigation and mobility? Which I think is what your original thesis was literally on — off-road robots, essentially, wasn't it? Navigating these complicated environments. So was it an application of deep reinforcement learning in navigation?
Navigation is a really fun problem.
I mean,
if you can look at the whole world from up above and see the whole thing — make it fully observed — then it's not hard to plan a path from where you are to where you want to get to. Over to St Pancras, right, to catch a train.
But we don't navigate that way. We just see the world in front of us; we don't get to see the whole thing.
So that means that when we move through the world,
we have to explore,
we have to use our memory,
and then we need to do planning.
And so doing all of these things — training an agent to do these things — is very interesting.
So we developed some of these different environments,
one that looks like a computer game,
but you're down in a 3D maze and you're trying to find your way to a goal location.
Oh, the one with the apples?
This is the one, exactly. And the apples are there as sort of exploration bonuses: you haven't found the goal yet, but you find an apple and get a point, and when you actually find the goal you get 10 points, and then you get teleported somewhere else and need to try to find your way back through this maze. And again, it's not a hard problem if you could look at the whole thing at once — but you can't. You have to take just a little bit of information in at a time and turn that into sort of an internal model, an internal map. There are classical approaches to this that involve building an actual map in the computer program — things like SLAM, simultaneous localization and mapping. We're interested in how you would do this if you can't write down a map: you just have to use that same neural network to somehow store the information and understand the spatial organization of the world, and then use that internal representation to plan a path to the goal, and get better and better at that.
It's funny when you boil down video games like that — they're just, like, a little set of rules. I feel really proud of myself.
So you've got a problem and you've got a computer.
How do you teach one to solve the other?
Raia knows all about the challenges we face in the developing world of machine learning,
and if you'd like to learn more as well,
Brilliant.org is a great place to start.
Brilliant Dog is a website and app,
which teaches you science from the ground up by saying daily challenges than explaining science behind them.
Every day they published challenges to provide a quick and fascinating view into maths,
logic,
science,
engineering or computer science.
Each problem provides you with skills and framework.
You need to tackle it so you learn the concepts by applying them.
They're quizzes.
If you want to learn more in a community of fellow problem solvers,
if you get stuck on dhe brilliant or half a course on machine learning.
I really think I should take that, actually.
So if you'd like to learn more about today's topic,
there's a course that's perfect for you.
And here's something else to help your knowledge evolve: we've put a link to Brilliant.org in the episode notes of this podcast, and the first 200 people to sign up through that link will get 20% off their premium plan.
I wanted to ask about neural networks as a concept, I guess, and their relationship to actual neurons. Are they designed to look like brains because brains are the most efficient, or the only, way of processing information? Or is it just that the brain is what we're trying to simulate, and they don't necessarily look the same? Is there any sort of connection?
We take a lot of inspiration from neuroscience. The human brain and other nervous systems in the biological world are the only examples that we have of a general intelligence, so we take that inspiration, but we want to make it very efficient to run on modern computers, which means that it looks different. There are people that study very realistic neural networks, where each neuron is a spiking neuron and there's the equivalent of neurotransmitters and different sorts of chemicals and things like this. We don't try to get to that level of fidelity. This is more of an abstraction of the processing that we think happens
roughly in the brain.
I almost think there's so much confusion within neuroscience. You never speak to a neuroscientist who tends to be that confident that we know; there's so much more to learn in neuroscience itself. It would almost be difficult to design AI with neuroscience as the start point, because it's not like we've fully mapped the human brain and now it's time to start emulating it. It's like we don't understand either
fully.
Right. Neuroscientists are so amazingly conservative about making any claims; they don't like to put anything down concretely, and I guess with good reason. The brain is very complex, and neuroscientists have to look at it through these different mechanisms: the psychophysics of doing tests on how people see things, where they look, these different sorts of things; or looking at people that have had some injury to the brain, and what can we infer about things that way; or looking, of course, at the actual brains of rats and mice. But even there, they're looking at small populations of neurons. They're not getting sort of the whole picture.
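For readers who want the abstraction mentioned above made concrete: the "neuron" in an artificial neural network is usually just a weighted sum of inputs passed through a nonlinearity, with no spikes and no neurotransmitters. A minimal sketch, with numbers invented purely for illustration:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, squashed by a sigmoid:
    # a crude stand-in for a biological neuron's firing rate.
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Three arbitrary inputs and weights, purely illustrative.
out = neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], bias=-0.1)
print(round(out, 3))   # a value between 0 and 1
```

Stacking many such units in layers, and adjusting the weights by gradient descent, is the whole of the abstraction; everything else about real neurons is deliberately left out for computational efficiency.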
Is there any potential to do work the other way around? Is there research that focuses on how what you create in an artificial neural network could be applied to the brain, in areas where neuroscientists don't quite know what's going on?
Well, we're just starting to have research that can feed back. We've done work in navigation with representations that look like grid cells, which exist in the brain for navigation in rodents: we see things that look just like grid cells, and act like grid cells, inside of the neural networks that we're training, if we train them in a particular way that's in some ways similar to how a mouse learns to navigate around the world. That was a paper that was published in Nature the year before last, and the computational neuroscience community was really excited about this result, I think because it gave some external validation of different hypotheses they had, and it also opened up new avenues for exploration, new avenues for research.
When I was looking at your TEDx talk, actually, I had another thought about how potentially we could learn from AI, as opposed to the other way around. You showed that clip of, and we'll put this on the blog so people know what I'm talking about, a really hilarious clip of a sort of artificial intelligence agent. We should define "agent", but the agent is essentially the AI's kind of manifestation within this video
game world. I mean, it's the neural network that produces actions, and we call it the agent because it has autonomy. It's not just a program: it has autonomy, it makes decisions. And so "agent" is the common way it's
referred to. So, yeah, you showed this clip of this agent, which was controlling a humanoid figure. It looks like a man who is moving in the most hilarious and bizarre way, kind of using its arms to stabilize its movement. I think the purpose of the AI was to see if it could learn how to use a human body, how it could learn a normal gait and move through an environment using all the same human joints, because obviously the body is extremely complex. Now, that looks kind of ridiculous, but as you say in your talk, it comes from a place of efficiency: it's waving its arms about like crazy to kind of stabilize its body. Is it possible that, you know, we as humans are the ones who actually aren't moving as optimally as we could? If sportspeople were moving at full efficiency, is it possible they'd be moving like that?
Well, the agent moves that way because that's all it has ever needed to do: run through these obstacle courses that we've created in this little simulated environment, and that's not a lot of variety compared with what we have to do. Plus, that's not a real human body by any stretch. It's a gross simplification,
23 joints.
And it's not particularly trying to conserve energy, for instance; it can use as much energy as it wants. We're not going to throw out our arms with each step we take, because we don't need to in order to maintain balance, and people would laugh at us, which is a powerful motive. So even a professional sports player is still going to do a lot of other things with their body, and that sort of acts to regularize their actions, etcetera. But you do see that people who are incredibly good at a sport will optimize to the point where it becomes something that's quite extreme.
I would think that, to some extent, AI research, maybe not for the likes of me, would play into those very high-level sports. Surely part of the research that goes into deciding how, say, Usain Bolt should launch himself off the start block is the potential for the use of AI in that
kind of field. Well, I think that our models of the human body would need to get much, much closer to the actual human body and the constraints that we have, because right now there's no penalty for overextending your joints, things like that. There's no damage, there's no pain, and there's no energy conservation. But if we took this to a point where the models that the agents were controlling were a lot closer to real human bodies, then maybe. I mean, in the case of AlphaGo, the games that have been played by AlphaGo have really inspired the Go professionals, and the same thing in chess. People look at these games and they say: ah, here is this superintelligent player that is managing to be creative and make moves that we never thought about. That's been sort of some of the excitement over that line of research.
So what other kind of key areas are we seeing AI applied in, in our normal day-to-day lives?
It all depends on how you define AI. I think that even my refrigerator claims that it's AI. More realistically, I think that anything that involves media is seeing change very quickly, in terms of understanding photographs, understanding video, coming up with, you know, something that will summarize your video for you, give you a running caption of what's happening in a video, for instance. Those are things that we definitely could not do before we had very large neural networks that could learn to solve these important tasks, like captioning your videos. In more serious cases, we're starting to see changes in the medical world, in terms of being able to automatically analyze and interpret medical scans. This is really important for places where there are only a few doctors in the field, or in areas of the world where those doctors are not very available, or simply to come up with an initial assessment of a condition.
Doctors must have such massive fatigue if they're under a huge workload, looking at MRI scan after MRI scan. If you could have a machine that had learned to spot the early signs of, like, a cloud on a lung or a tumour in the brain, you could kind of have that first sifting stage. We talked to Sue Black, a forensic anthropologist, in episode five of the podcast, and had a really interesting chat where she talked about how police have to examine really horrendous footage of a lot of these awful crimes, but they're hoping to develop software that can spot clues that might lead to identification of a suspect. An AI could do that kind of trawling stage, looking through that kind of quite disturbing footage, and then you could bring in a person to validate what the AI spots. A lot of knowledge and a lot of expertise, yours included, goes into creating these machines that can learn extremely efficiently. Can some of that learning, in how to create a very efficient teaching method, be put into practice in the non-AI sphere? As far as teaching goes, for example, do you think lessons learned from the world of, like, programming these deep neural networks could be put into use in the world of teaching?
You know, there have been so many people that have thought about things in this direction, in terms of individualizing, personalizing education per student, because all students are going to learn differently, and having a sort of curated curriculum for each individual student could be really valuable. I think that that's certainly possible. I also think that the role of human teachers is very, very important. I mean, you say that AIs can learn more efficiently, but that's not really true. They're actually quite inefficient learners in some ways, but they can learn in very focused ways on a lot of data, and they don't need to sleep or eat. Humans are much better at, more efficient at, learning a curriculum of knowledge: going through a math course and learning the fundamentals and applying those, then learning more things, learning tools, and putting it all together. We don't have machines that can learn that way.
What I like about reading about machine learning research is that you boil down what teaching and learning and task-solving are to their most fundamental components. And it's actually testament to how you are on this podcast: you're very good at explaining things very clearly. Not everyone has that ability. I feel there's something in that that could be used in the training of teachers, for example. You know, what is knowledge? What is the most effective way to convey a concept, or help someone complete a task? I suppose it's very wishy-washy. I think
it's a great idea. I can't think of anything that uses that premise, but I don't see why it shouldn't hold in some cases. There could be something there that's quite interesting.
That actually leads on to my next question, which is: your day-to-day job is all about kind of teaching machines how to perform tasks. So do you find yourself, when you're, like, you know, teaching your children, being excessively efficient?
All right. Well, their nickname for me is robo-mom, so I think
that's more about the fact that I manage a team at DeepMind. There are probably many managers out there in other fields that also, when they come home to their families, have to take off the manager hat and put on the parent hat. I don't know. We certainly have very interesting discussions at home about, for instance, the ways in which my agents will cheat
When you try to teach them a problem, they find, like, the easiest way of doing the task?
Exactly, you could put it that way. So we were trying to teach an agent to navigate through the streets of New York, and if you've been to New York, you may have noticed that almost every street is a one-way street. There are some avenues that go both ways, but for the most part they are one-way streets. And this was using Street View, you know, Google Street View; I'm sure you've used it, so you get sort of this first-person view of the world, as if you're down on the street. We trained an agent with this view of the world to follow directions, so we would feed in a text instruction, turn left, turn right, go straight, these sorts of simple directions, and hope that it learned how to interpret that language and turn the right way. And it did really well on the task. Then we discovered that it was cheating. Can you guess how it was cheating?
Going the wrong way down a one-way street?
Well, we were telling it to follow directions that we got from Google Maps, and Google Maps knows that you're not supposed to turn the wrong way onto a one-way street. So the agent, rather than interpreting the text instructions, the language instructions, would just, when it got to the next intersection, go whichever way the traffic was going. Just playing follow-the-leader. And if by mistake it went the wrong direction and had traffic coming at it, then it would say oops
and turn around.
So how did you figure it out?
We finally worked it out when we tried to debug what was going on with this agent: we gave it directions that specifically went against traffic on one-way streets, and it wouldn't follow them.
I'm not sure if that's really smart or really stupid.
I read about a similar one recently with the huskies. There was an AI designed to spot huskies. It was fed lots of images of husky dogs, and then it was, like, put to work identifying huskies versus wolves. And it was right most of the time, except on this one image. They were looking at this image of a wolf, like, why could it not work this one out? And it was because there was snow in the background of the shot. Basically, rather than learning a lot of its rules from, like, the cues in the facial structure of a husky versus a wolf, it had just noticed that a lot of huskies have snow in the background. And this one wolf happened to have snow in the background.
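The husky/wolf failure is easy to reproduce in miniature. Below is a hedged, fully synthetic sketch (invented features, not the real study): each "image" is just two numbers, a noisy face-shape cue and a snow flag that happens to correlate perfectly with the husky label in the training set. A simple perceptron learns to lean on the snow flag, and then mislabels a wolf photographed in snow.

```python
import random

random.seed(0)

# Synthetic training set: (face_shape, snow) -> label (+1 husky, -1 wolf).
# face_shape is a weak, noisy cue; snow is a perfect *spurious* cue.
train = []
for _ in range(200):
    husky = random.random() < 0.5
    face = (1.0 if husky else -1.0) + random.gauss(0.0, 2.0)
    snow = 1.0 if husky else -1.0
    train.append(((face, snow), 1 if husky else -1))

# Train a perceptron until it separates the training data.
w = [0.0, 0.0]
for _ in range(100):
    for (face, snow), y in train:
        if y * (w[0] * face + w[1] * snow) <= 0:
            w[0] += y * face
            w[1] += y * snow

# A wolf-shaped face with a snowy background: the snow cue wins.
face, snow = -1.0, 1.0
pred = 1 if w[0] * face + w[1] * snow > 0 else -1
print(pred)   # +1: wrongly labelled "husky" because of the snow
```

Note that the classifier is essentially perfect on its own training set, which is exactly why this kind of failure is so easy to miss.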
The AI had been taking that shortcut the whole time. No wonder; a rule that efficient.
But it's almost, the way it kind of finds flaws in the way you've defined the task, it's super clever. It works out how to get around it. I think
it comes from how you define the objectives. If you give these really sparse reward functions, where I'm just going to give you a point if you do exactly this, then there are lots of different ways to interpret that. You know, even just saying: tell me when you see a husky, tell me when you see a wolf. We know what those words mean, but a machine, a program that's just looking at images, is going to say: well, I'm just going to look at the statistics of this and try to tease them apart, unless you tell me something else about
it. It goes back to the abstract knowledge that people have, right? I was gonna ask, do you ever, like, anthropomorphize your agents? Do you get, like, particularly attached to individual ones? Or are there even individual
agents?
I once did an interview with a master's student who was studying artificial intelligence. She came from the college that I went to, so I said that I would do an interview with her. And the first question she asked me was: what is the name of your AI, and is it female or male? I really did not understand. Well, I would say that we do see a lot of behaviors, and one of the most interesting things is understanding how these agents are solving these different problems.
What are the strategies that they have?
What are they seeing that I'm maybe not seeing?
What are the ways in which they are finding tricks and what sorts of interesting behaviors emerge?
How does an agent use a neural network as a memory bank to solve a navigation task? Things like this.
And sometimes we're mistaken. We think: oh, this agent is really smart, it's really solved the problem. And it hasn't. And we find that out through doing careful analysis, additional experiments, ablations of experiments, things like this, in order to really get at what is going on.
I suppose otherwise you've got that sort of confirmation bias thing going on there.
Absolutely. We still have to be good scientists.
Like you said, naming it probably isn't a good idea! How much of your code do you share? A lot of it? Or is there stuff that's so proprietary you want to keep it to yourselves?
When we develop things, when we write code at Google, it's hard not to use a lot of different Google tools that are proprietary. So for us, open-sourcing is harder than if you're in an academic department and everything that you're using is already open source. But we open-source things where we see that there's a lot of value: we try to make libraries public, we try to make environments public. So the Street View based environment that I mentioned for navigating, that's an open-source environment, made public.
Progress must be so fast when it's computers and code. You could presumably have everyone across the world literally publishing exactly what they did, to the point that someone else could pick it up
and run with it. That's also a good way to reproduce bugs and pass those on, though. If I have results from my code and then I publish my code, and you run that code and you get the same result, all it shows is that we ran the same code. It doesn't mean that the underlying algorithm is, for instance, bug-free, that there are no issues.
Isn't that what the review process
is for? Well, yes, but there can be things. There are two different things that you want to do. You want to open your code so that people can build on it quickly, so that as a community we're sort of growing, scaling. But you also want other labs to try to reproduce the result, because you can get a lot of information out of having that disagreement between results when you thought that you had the same implementation. Yeah, you learn
from that. It's frustrating, exactly, but you learn from that. So there are two different things there that are not always aligned; it's not like one is purely better.
Usually we find that when we publish a paper that the community is excited about, then, regardless of whether or not we put code out there, there will be an implementation of it on GitHub in a matter of weeks, and sometimes that will be a better implementation than what we have. GitHub is just a big repository of code, right? You can contribute to projects with a thousand different coders contributing to them, or just have your own individual code up there. It's just a way to share code.
Sure. Now, you say at the start of your TEDx talk that we're actually quite a long way from producing human-level intelligence at this stage, which is good to see: people being honest, as opposed to, like, headline-worthy stuff. But what do you think the other major obstacles are, at this stage, to producing that human-level intelligence? If we're looking at a long game over the next, I don't know... well, first of all, how many years do you think it will be?
And is that even a desirable thing to do?
Right.
There's a lot of discussion about whether or not we're trying to make tools to solve problems that humans can't solve or whether we're trying to replicate human intelligence.
And I tend to think it's the former, not the latter.
We're trying to develop a I as a general problem solving approach,
so that then we can push it into the problem of modeling the climate,
give it lots of data,
let it chew on that data for a long time and then come out with a better model than we can with other methods.
Same thing for a disease.
Lots of data on this disease,
but no real understanding of it.
Let's see if this approach will help,
But along the way we're inspired a lot by human intelligence, so we use that. There are a lot of obstacles to getting anywhere close to human intelligence.
Humans are amazingly powerful computing machines that have long-term memory, have language,
have abstractions,
have concepts,
have a lot of things that we have not yet come close to cracking.
I think more about mouse level, you know. Mouse-level AI is
something that we... Well, that's interesting. I wrote down, next to one of the answers you were giving earlier, "AI animal modelling", because obviously in the biomedical world, animal research is not the most desirable way of doing things, but it's the best way we can make rapid advancements in medicine. Everyone always says that one day maybe you could simulate a mouse, and all of these kind of early-stage experiments that lead to clinical trials could be done on an AI. But generally, even when I ask people about that, the feeling is that we're a long way off. So, we hear a lot about the success stories of AI. What sort of challenges do you think AI is the least suited to addressing? Where does that human abstract, high-level concept thinking make it really difficult to apply?
Right?
I see that when we get to humans that are making decisions where they're taking in lots of abstract,
vague may be inconsistent information and trying to come up with a,
say,
a policy.
It's not a black or white decision,
but they're trying to come up with something more nuanced and this sort of accumulation of information,
something like what a politician does.
Oddly enough,
a politician looks at these different populations, these different groups,
and tries to come up with consensus, tries to build consensus, in terms of producing legislation or something, and that sort of bringing together of all these different types of information.
They might have individual information from meeting with individual people that they know by name.
Then they've got statistics over many people.
They've got sort of groups that they have worked with,
and they have their own background,
their own history,
and they need to somehow bring that together and then produce a convincing argument, and build consensus with other people.
And that's the sort of complex, human-focused, very rich sort of thinking and behavior that, well, I wouldn't have any clue where to start. I think that we're a very, very long
way from it. It's almost like the government as a whole works as a giant AI, in that way.
It makes me think about the individual things that we do that involve a lot of person-to-person interaction and a lot of bringing together uncertain information, and then trying to come up with something that's not just a black-and-white decision but is more of an "I'm going to now behave in this"
way.
Yes, yeah, it's hard to do. It's a very complex victory condition, that one, isn't it?
We do have the time, so can we ask about catastrophic forgetting? I was worried that would be something you'd be like, "that's half an hour", about.
So sometimes people will ask me because I'll say,
Well,
I have research in navigation and robotics and in these different areas,
including continual learning and they'll say,
Well,
what's the most important thing to solve?
And I actually tend to say continual learning.
This is the problem of learning across a lifetime of experience,
having the knowledge grow over time.
We start out as a baby.
We learn about many,
many different things.
We end up as somebody who has, hopefully, wisdom,
a lot of capabilities,
and that's very,
very far away from what we can do right now with neural networks,
which still tends to involve taking a big data set, sampling from that data set, and learning from it.
That process of learning,
sort of all at once from a data set,
is completely different to how we learn.
We read a book from beginning to end.
We don't sample random pages until we know the whole thing.
So not only is continual learning something that's hard for neural networks and for AI systems in general, but I think that it's absolutely critical that we be able to learn in stages and be able to grow the knowledge and the capabilities of a system.
If we could do that,
we could begin to solve other problems, because then we could craft the curriculum.
We could craft the school that would educate that a I to start with small problems and then grow towards larger problems.
Why can't computers hold that information? Why do computers forget?
Well, it's called catastrophic forgetting, and what we see is just that learning tends to involve changing every neuron just a little bit. So you've got a million neurons in a network, and I learn one thing, and then I want to learn a second thing after that, and that learning process changes every neuron in that network just a little bit. And when you do that, you end up not being able to do the first task anymore. Neuroscientists know that humans and animals do a much better job of solving this problem: they do things like reduce the plasticity of different parts of the brain, so that learning one thing doesn't wash out another thing. So we have algorithms that we've developed that take some inspiration from that process of reducing plasticity after you've learned something, so that it sort of fixes it in place. But they don't come anywhere close to the type of abilities that humans show.
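The weight-interference mechanism described here can be shown with a deliberately tiny example. This is a hedged sketch with a one-weight "network", and the anchor penalty below is an invented quadratic stand-in loosely in the spirit of reduced-plasticity methods, not an implementation of any published algorithm:

```python
# A tiny illustration of catastrophic forgetting with a one-weight
# "network" fitting y = w * x. The anchor penalty is an invented,
# quadratic stand-in for reduced-plasticity ideas.
def sgd(w, data, lr=0.05, epochs=200, anchor=None, penalty=0.0):
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            if anchor is not None:
                grad += 2.0 * penalty * (w - anchor)  # pull w towards anchor
            w -= lr * grad
    return w

def err(w, data):
    return sum((w * x - y) ** 2 for x, y in data)

task_a = [(x, 2.0 * x) for x in (-2, -1, 1, 2)]    # task A: y =  2x
task_b = [(x, -2.0 * x) for x in (-2, -1, 1, 2)]   # task B: y = -2x

w_a = sgd(0.0, task_a)              # learn task A: w ends up near 2
w_b = sgd(w_a, task_b)              # then learn task B: w swings to near -2
w_c = sgd(w_a, task_b, anchor=w_a, penalty=5.0)    # B with reduced plasticity

print(err(w_a, task_a) < 1e-6)                 # True: A was learned
print(err(w_b, task_a) > 100.0)                # True: A catastrophically forgotten
print(err(w_c, task_a) < err(w_b, task_a))     # True: anchoring retains some of A
```

The single weight plays the role of the million neurons: training on task B drags it wherever task B needs it, and task A's solution is simply overwritten unless something reduces how freely the weight can move.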
It's called catastrophic because it means that you really drop off a cliff.
Because if you keep asking, how well are you still doing on this first task that you learned, as you're learning the second one, then you just very quickly fall off a cliff. It just cannot even interpret the game anymore, the task or whatever
it is. It's funny, very different from how it is for humans. All of this stuff does make me kind of marvel at people, at the end of it all. Like just then, talking about the way humans remember stuff: you remember weird things from your childhood, and certain skills seem to stick that come into play at later times.
Yeah, exactly. That's the other side of it. You don't want to catastrophically forget the things that you learn; you actually want to be able to transfer them forward, to be able to take skill A and skill B and put them together to solve a new problem. And we also have a hard time developing networks that can do that as fluently.
That was amazing. Thank you so much. You've been a superb guest.
Brilliant. We got there!
There were a few questions at the end there that I could have kept on chatting about.
So,
this podcast is made possible by Brilliant.org, a great resource if you want to learn something new every day. Brilliant.org teaches you science from the ground up by setting questions and challenges every day and explaining the science behind them. Brilliant.org's new feature, Daily Challenges, makes learning a daily habit. Every day
they publish several problems that provide a quick and fascinating view into maths, logic, science, engineering or computer science. And Brilliant.org have a course on machine learning, so if you'd like to learn more about today's topic, there's a course that's perfect for you. You're bound to love it, especially since using the link in the podcast description will get the first 200 users 20% off their premium plan.
That concludes the final episode of the Pint of Science podcast. We'd now like to introduce you to someone very special, who has made the podcast what it is. He's played a big role. So introducing our producer, Sam.
Hello! I'm gonna jump in sooner, before you have more nice things to say, to stop you making me feel uncomfortable with compliments.
Let's instead round off by thanking someone who is now gone, which is, of course, Raia Hadsell.
She had a lot of good stuff.
Yeah,
that was really incredibly interesting, and, basically the same as every episode, we could have carried on talking for so much longer.
Exactly. Especially in terms of AI being something that's going to change the world in probably unimaginable ways. It's interesting to know that someone working at very senior levels of AI research isn't actually at all taken in by that sci-fi AI hype, but is so unbelievably able to communicate it to three people who know absolutely nothing. No offence, guys; I'm projecting: three guys who know absolutely nothing about it.
Here's somebody right in the core of it who was able to just explain: okay, well, this is what we're doing, and this is why we're doing it, and who made it so accessible.
Yeah, it's science communication at its best. If she's listening back to this, she'll be blushing.
Well, yeah, I do feel thoroughly, like, de-sci-fi'd. Every time I watch CSI now, I'm gonna be like: bullshit, bullshit, bullshit.
You were supposed to do that before today! Episode five should have hammered that one home. Every time they say "enhance": totally legit.
Another weekly thank-you is, of course, due to the pub that we're currently sat in, the Lincoln Lounge, which is probably the most atmospheric-looking pub we've been in since, well, we'd offend every other pub. Wait a second: where does it sit in the league table of beautiful pubs?
This is, I would say, the top of the league. Automatic promotion territory.
Like United.
Thanks very much.
So, I mean, I just wanted to quickly, probably from a purely selfish perspective, revisit some of the highlights of this series, as it is the end of the series. So, yeah, I mean, wrapping up this podcast: as I said, we've been on the search for some of the best researchers in the UK and some of the loveliest pubs, and we have been successful on both counts.
If we just hark back to that very first episode: with Steve Haake in episode one, we learned about the science behind Olympic gold medals. In episode two, we learned about the neuroscience of love. Episode three: how to build your own rift basin in a kitchen. That was a good one; I really should try it at home. Episode four: theories as to what killed off those pesky dinosaurs. In episode five, we learned about the ways that we can learn about life from the bodies of the dead; that was Sue Black. Episode six: honey, and all the small questions bees themselves face every day. In episode seven, of course, we learned about the unusual and, frankly, slightly disgusting world of faecal transplants. In episode eight, we learned about all the energy sources of the future. And episode nine, as if we didn't know them already:
The simple rules of physics that govern all of life and existence, that is. And we've enjoyed them all over some delicious pints. It's been pretty, pretty pint-y.
It's been a special, special podcast, guys.
Now, if you haven't already heard all these episodes, the first thing I would say is go back to episode one and start listening. As you go through them, at the end of each episode I would like you to go out there and use the hashtag #pintcast19 on your social media. Tell your friends and family, who you just know will love Pint of Science already; the podcast offers another way to access all that fantastic Pint of Science knowledge and goodness. Also,
of course, as we discussed at the start, we are now on day one of the festival. So if you have any plans for this evening, tomorrow evening or Wednesday evening, get yourself down to your local. Well, get on pintofscience.co.uk and just double-check first, but we almost certainly will be there. Get yourself down to an event and you can experience what we get to experience on this podcast for yourself: being in the pub with great researchers and great beer.
Exactly.
That's about the last of us for 2019. If you want memories more permanent, by the way, here's an idea, which will help you sell the podcast to your friends as well. Get yourself a cheap white T-shirt and a Sharpie (other permanent markers are available) and just write down your favourite thing about every episode on the T-shirt. Then wear it in public and show all your friends, so they know they need to talk to you about it.
Okay, thank you very much, Pint of Science podcast fans. Pint of Science fans!