Dario Gil, SVP and Director, IBM Research

Good morning, good afternoon, and good evening. I'm Dario Gil, director of IBM Research, and it's really great to be with all of you today.
You know, I was thinking about last year, when I got a chance, like so many of you, to be there in person. I remember thinking how much I really love the buzz and the energy of this event.
And of course, we're all painfully aware that COVID-19 has changed all of that now. I think we're all excited about the fact that now we have several vaccines, and with them, hope. And the race to develop a new vaccine is really a testament to the fact that the urgency of science has never been greater.
It's not just about the current pandemic, which will inevitably end. There are, and there will be, other global challenges we may confront in the future, from possibly other pandemics to climate change to food shortages and energy security. What we really need to do is to accelerate the rate of discovery to solve some of the world's most pressing problems, and the good news is that we now have the technologies and the will to address challenges of this scale better than ever before. I'm going to explore some of these technologies as we go along: the world of artificial intelligence and quantum computing and high performance computing and hybrid cloud. Using them, we can really and radically change the process of how we conduct discovery.
We can accelerate and supercharge the traditional scientific method.
And just think for a minute about how much we have relied throughout our history on serendipity. To take the example of materials and materials discovery: this is how we discovered Teflon and Velcro and Vaseline, and more recently graphene, through happy accidents. Materials are typically discovered by trial and error. It's a slow, linear, and iterative process of mixing different compounds and testing the result.
So you know this is not new.
This is how the ancient Greeks learned how to make concrete.
As an example, they were mixing lime and water and
pieces of broken pottery.
But of course we know also that with time this
process has become more scientific.
You ask a question, you then do some research and
form a hypothesis.
Next, you run some experiments and you test it,
and during that time you systematically observe,
measure and replicate the results.
If needed you can modify the hypothesis and repeat the
process again and again and again.
This traditional approach, as we all know, is expensive and typically takes many, many years.
Now, this approach has been evolving.
Of course, science has been evolving too: from empirical science, all about observing nature and measuring, to theoretical science, where researchers come up with theories and use observations to validate or refute hypotheses. Then, from about the 1950s, with the advent of computers, we really saw a boost to the scientific method. Continuous advances in high performance computing have enabled modeling and simulation of complex molecules, to keep with the theme of materials. And that allowed us to speed up the cycles of hypothesis and testing, leading to today's big, data-driven science. But still, even with high performance computing, the traditional scientific method is really slow, and that's a challenge when we're confronting some of the pressing problems that we face as a society.
Now, enter the world of AI. And I'm sure I'm speaking for the entire tech community when I say that today's AI is becoming increasingly powerful. It's really enabling unprecedented levels of speed and automation and scale, helping us solve ever more complex problems. And it is AI that can help us usher in this new era of accelerated discovery.
It can help us supercharge the scientific method to turn
it from a linear process into a closed loop.
What do I mean by a closed loop?
Well, imagine that we want to create a new plastic with specific properties: extremely inflexible but also lightweight, and able to fall apart into its original monomers for recycling, so we can have a full life cycle.
First we outline these properties.
Then we can use AI to sift through the existing
knowledge on polymer manufacturing to see all the previous research
and patents and fabrication attempts and form a knowledge base.
In the next step, high performance computers, or in the future quantum computers, would conduct simulations to augment the knowledge base with important data on molecular properties. Then generative models in AI would identify knowledge gaps and propose possible candidate molecules for this new polymer. In other words, they're going to help us generate new hypotheses. Relying on a desired property of a not-yet-existing material, the machine generates the chemical composition and structure of candidate molecules, based on data and examples of past chemical reactions, and the process will culminate with new knowledge that invokes a new question, and the loop starts again. So this would be a continual loop of discovery, increasingly automated and increasingly autonomous. I'm going to get back to this a little bit later in more detail.
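To make the shape of that closed loop concrete, here is a minimal sketch in Python. Every function in it is a toy placeholder, standing in for deep search, simulation, candidate generation, and the robotic lab; it is not an actual IBM API, just the skeleton of the loop.

# A self-contained toy version of the closed discovery loop described above.
# Each step is faked with random numbers; only the shape of the loop matters:
# knowledge -> hypothesis -> experiment -> new knowledge, repeated until a
# candidate meets the target.
import random

def deep_search(topic):                 # stand-in for literature ingestion
    return [{"molecule": f"known-{i}", "score": random.random()} for i in range(5)]

def simulate(knowledge):                # stand-in for HPC / quantum simulation
    return [dict(k, score=min(1.0, k["score"] + 0.1)) for k in knowledge]

def generate_candidates(knowledge):     # stand-in for a generative model
    return [{"molecule": f"candidate-{random.randint(0, 999)}", "score": random.random()}
            for _ in range(3)]

def test_in_lab(candidates):            # stand-in for an autonomous robotic lab
    return [dict(c, score=min(1.0, c["score"] + random.uniform(0.0, 0.3)))
            for c in candidates]

def discovery_loop(target_score=0.95, max_cycles=20):
    knowledge = deep_search("new recyclable polymer")
    best = None
    for cycle in range(max_cycles):
        knowledge = simulate(knowledge)
        results = test_in_lab(generate_candidates(knowledge))
        best = max(results, key=lambda r: r["score"])
        if best["score"] >= target_score:
            break                       # the loop closes: target property reached
        knowledge += results            # otherwise, results become new knowledge
    return cycle, best

print(discovery_loop())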
And we already have powerful technologies to make this a reality, to accelerate this process: supercomputers, quantum computers, and AI systems, and, tying it all together, the hybrid cloud. It's really remarkable, if we reflect back, how much computers, the world of bits, have advanced in 60 years. We now have high performance processors, like those in the z15 mainframe systems that IBM builds, with over 9 billion transistors in a single processor. And just remember that each one of these transistors is about a quarter the size of a human virus.
Now you can build these machines to have seven nines of reliability. What that means in practice is that you can have an average unplanned downtime of a system like this of just three seconds per year, and a single machine, for example, can process a trillion web transactions in a day. Or take supercomputers, like the ones we design and build to equip the national laboratories of the United States. If you take IBM Summit, for example, which resides at the Oak Ridge National Lab, it is capable of processing 200,000 trillion calculations per second. Think about that number; it's just mind-blowing. And these are some of the most reliable, scalable, and secure computing systems on the planet, and we can put them to use to help us accelerate how we do science.
But even with all their might, there are problems that supercomputers simply cannot solve, because the number of possibilities they need to explore to find the solution grows exponentially with the size of the problem. Quantum computers will change this. They offer a powerful alternative because they combine physics with information to compute in a fundamentally different way. Quantum computers rely on principles of quantum mechanics. Quantum algorithms encode the problem into entangled superpositions of all possible states of the quantum bits, or qubits, which are the beating heart of a quantum computer. They use quantum operations to interfere all of those states and increase the probability of the ones that contain the solution, through what's called constructive interference. This way the system gives the correct answer faster, and more accurately, than a classical computer ever could.
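For readers who want to see that idea in code, here is a tiny sketch using IBM's open-source Qiskit library. It is not a real quantum algorithm, just the bare mechanism: a superposition is created, one state is marked with a phase, and a second operation interferes the amplitudes so the marked state becomes certain.

# Minimal interference demo with Qiskit (illustrative only).
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(1)
qc.h(0)   # Hadamard: equal superposition of |0> and |1>
qc.z(0)   # phase flip "marks" the |1> component
qc.h(0)   # second Hadamard interferes the amplitudes constructively on |1>

print(Statevector.from_instruction(qc))   # all probability ends up on |1>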
We expect quantum computers to answer technologically relevant questions that are simply out of reach of the supercomputers I was just speaking about, and they are going to accelerate some of the most challenging future computational tasks. For example, they should be able to simulate, say, new molecules that classical computers never could. Now, quantum computers will work together with AI, and we need AI to handle bigger and bigger amounts of data. Inspired by the world of neurons, and combining the fields of biology with information, we now have new models of computation. We know that they are based on neural nets that are highly integrated into today's state-of-the-art computers, particularly accelerators. Lately, there have been impressive advances in the capability of AI systems to learn from large labeled datasets, to interpret and to analyze data, and to perform cognitive tasks.
And in the compute efficiency of AI hardware, we've seen major advances. So, for example, trading numerical precision for computational efficiency, we have now proven that we can train deep neural networks with only four bits of precision. This could help boost the efficiency of training systems by more than seven times over the best commercially available systems today, cutting energy and cost and paving the way to bringing training closer to the edge.
And this will be a major advancement for privacy and
the security of AI models.
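To give a feel for what training with only four bits of precision involves, here is a small illustrative sketch of symmetric 4-bit quantization of a weight tensor. This is a generic toy scheme for rounding values onto 16 levels, not IBM's actual 4-bit training algorithm.

# Toy symmetric 4-bit quantization: map float weights onto the signed int4
# range (-8..7), then scale back for use downstream.
import numpy as np

def quantize_4bit(x):
    levels = 7                                  # largest positive int4 value
    scale = np.max(np.abs(x)) / levels          # simple max-abs scaling
    q = np.clip(np.round(x / scale), -8, 7)     # round and clamp to int4 range
    return q * scale                            # dequantized 4-bit approximation

w = np.random.randn(4, 4).astype(np.float32)
print(np.abs(w - quantize_4bit(w)).max())       # quantization error stays small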
And we've also been working on improving their performance with analog AI hardware, where compute and memory are combined in a single location to eliminate the movement of data between logic and storage, what's known as the von Neumann bottleneck.
Now these types of algorithms and hardware innovations will allow
us to create,
deploy, and scale increasingly complex models and networks.
Which brings us to hybrid cloud, a way to deploy these technologies and to make them available across the globe and across institutions. A flexible hybrid cloud platform is critical to support experimentation at scale. Only then will we be able to run workloads on the best-suited hardware, be it classical computers, quantum computers, or purpose-built AI hardware. Hybrid cloud will unify these technologies to give us virtually limitless pools of computing power and capability, with secure use and sharing of private and public systems and data.
So: quantum computing, AI, high performance computing, and hybrid cloud. These four next-generation technologies are bound to supercharge the scientific method and to enable something we call accelerated discovery. This new method should give us the discovery-driven enterprise.
Now, we know we've all been working really hard to enable the data- and AI-driven enterprise. And over the years, growing vast volumes of data have allowed businesses to produce consumer-led innovations. This need to deal with the deluge of data has driven the huge advances in AI and automation that we're all witnessing. Now is the time for the discovery-driven enterprise, built on top of these advances: one equipped with the platforms and tools to allow it to handle increasingly complex, flexible cloud workloads and to adapt to a changing environment.
I mean, it should be self-evident that if you're dealing with something like a pandemic, the future is not just about analyzing the past; it's about discovering what is coming next in a changed environment. So that's this era of discovery. And we're not talking here about a distant future; it is something that is happening right now.
So let me show you accelerated discovery in action.
Consider the world of chemistry. It's vital, as we all know, for virtually every industry: for manufacturing, for medicine, for anything where you need to create new materials. And it is really crucial to develop technology responsibly and sustainably. Well, if we want sustainable products and solutions for a sustainable planet, we need to start from sustainable advanced materials.
So take microelectronics, the world of semiconductors, for example. They touch every aspect of our lives, right? I mean, what do I need to say to this audience, right? Our phones or watches or cars, even our homes, have been turning into computing platforms, and we need to ensure that not only their energy consumption is sustainable, but also the materials and processes that we use to build them. At IBM Research, we aim to develop sustainable materials for microelectronics while maintaining or even improving their performance.
Today we make computer chips using UV light to create 3D patterns in a photosensitive material called a photoresist, which is a crucial enabler of computer chip manufacturing. That pattern defines the transistors and the interconnecting wires that make up our chips. Now, early photoresist materials used incoming light to do chemistry directly, by creating or breaking chemical bonds. And in the early 1980s, IBM researchers invented a new type of photoresist material, the chemically amplified photoresist, that dramatically increases the resolution and detail of the pattern. That's been a key enabler of Moore's law.
Now, the key component in chemically amplified photoresists is something called a photoacid generator, or PAG. It's this photoresist that harnesses the light and translates the 2D optical image into the 3D pattern that is the basis of our chips, and this is, you know, how we make all our semiconductors. So this is an innovation that has for decades allowed our chips to keep shrinking while doubling the number of transistors, according to the density scaling component of Moore's Law. And that's what makes our gadgets slimmer and more powerful.
Now, consider that on average it takes about 10 years to discover a new material and bring it to market, at an estimated cost of at least $10 million; it can range from 10 to 100 million dollars. For example, it took 10 years from the beginning of research to the discovery and first use of nylon in a toothbrush. Now, with the new accelerated discovery approach, we could cut that down to one year and $1 million. We want to cut the cost down by 90%. So let's look at the process of accelerated discovery for new materials, and let's bring it to life.
Now, with the approach of accelerated discovery, what we're hoping is to create new materials, like the photoacid generators, the PAGs, that I was just describing, more efficiently and sustainably than is currently done. It's really about complementing and scaling human expertise. So what we're really going to do is to help my colleague Teo Laino, who is shown here in the picture. So, recall the supercharged scientific method I outlined before.
Let's break it down into the components of our materials discovery workflow. It integrates deep search, AI-enriched simulation, novel generative models to automatically create hypotheses, and automated experimentation using robotic labs. So first, our AI software, called deep search, sifts through the existing literature. Using deep search typically speeds up this process by 1,000 times, as the AI can ingest and process about 20 pages per second per processing core. Human readers of technical literature, on the other hand, typically need between one and two minutes per page.
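A quick back-of-the-envelope check of those figures, taking the midpoint of the human reading speed:

# 20 pages/second/core vs. roughly 90 seconds/page for a human reader
ai_pages_per_second = 20
human_seconds_per_page = 90
print(ai_pages_per_second * human_seconds_per_page)   # ~1800x per core, same order as the ~1000x quoted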
Then, AI-enriched simulations will screen experimental parameters at least twice as fast as regular simulations, and up to 40 times faster. Next, AI generative models fill in the gaps. We have seen a 10x acceleration using generative models to identify gaps and create materials concepts to test. And finally, we would test the results in an AI-driven lab, achieving 100x faster synthesis than with traditional methods.
Each of these steps is powered by the latest technology, so let's take a little bit of a deeper look at how the accelerated discovery process uses these components, deep search, AI-enriched simulation, generative models, and cloud-based autonomous labs, to solve a complex materials discovery problem. In this case, we're going to use it to synthesize a new PAG, a new photoacid generator, of the kind that is the basis of semiconductor manufacturing.
So the traditional way of discovering a new molecule would go like this: scientists would search the published literature and use what they could find, plus their own knowledge, to design a molecule and target the properties needed. They would then go through iterative cycles of synthesis, characterization, and testing until they reached a satisfactory compound. That is an extremely long process. Even using computers to run advanced simulations, the overall process is long and slow and expensive, and not just because solving chemistry is in itself hard, but also because the set of all possible molecules and chemical compounds is astronomically large.
You're looking here at a chemical space representation for PAGs. Every dot represents a single molecule, and they are grouped in color clusters, where each color indicates a distinct class and family of PAGs. Another way to represent this information is using a circular dendrogram, where the PAG families are sorted into a phylogenetic tree by chemical similarity.
So first, deep search: the AI will collect what is known about the PAGs. In general, the amount of documented knowledge is too large and growing too fast to be handled by humans. To put it in perspective, in 2018 alone more than two million science papers were published. Our new technology, the IBM Corpus Conversion Service, can ingest the full PAG corpus of about 60,000 pages in less than an hour on a multicore server. The information was organized in a knowledge graph with 2.2 million nodes and 38 million edges of known materials.
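To picture what "organized in a knowledge graph" means, here is a minimal sketch using the networkx library. The entities and relations below are made-up illustrative examples, not data extracted from the actual PAG corpus.

# A tiny materials knowledge graph: nodes are materials, classes, and
# properties; edges carry the relation type. The real graph has 2.2M nodes
# and 38M edges; this one has a handful.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("triphenylsulfonium triflate", "photoacid generator", relation="is_a")
kg.add_edge("photoacid generator", "chemically amplified photoresist", relation="component_of")
kg.add_edge("triphenylsulfonium triflate", "lambda_max", relation="has_property")

print(kg.number_of_nodes(), "nodes,", kg.number_of_edges(), "edges")
for source, target, data in kg.edges(data=True):
    print(source, "--", data["relation"], "-->", target)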
So look at the dendrogram, where the PAG families are sorted by chemical similarity. Notice, if you look at that little figure, all the empty space around the outside ring. That is because important data on the properties of PAGs for most of the compounds of interest was completely absent from the literature. For some properties, the available data was so sparse, or so noisy and unreliable, that it was almost useless. We had to augment this data set with enough data on predicted properties to train an AI model. Here we used AI-enriched simulation to provide quantitative values for important properties for the PAGs in the data set.
Purple shows the computed values for one property associated with toxicity, here labeled LD50, in the known PAG families. Blue shows the computed values for one property associated with environmental persistence, biodegradability. And the outer ring with the green data shows the computed values for one property associated with PAG photochemistry, lambda max, the wavelength at which the material has its strongest photon, or light, absorption. Once we had the resulting augmented data set, we built an AI model to predict thousands of potential PAGs with better sustainability attributes.
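As an illustration of the kind of model this step produces, here is a sketch that trains a simple property regressor on molecular fingerprints using RDKit and scikit-learn. The molecules and property values below are placeholders, not the real augmented PAG data set, and a random forest is just one reasonable choice of model.

# Train a toy property predictor: molecules -> fingerprints -> regression.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

smiles = ["c1ccccc1", "CCO", "CC(=O)O", "c1ccccc1O", "CCN"]   # toy molecules
lambda_max_nm = [254.0, 180.0, 205.0, 270.0, 190.0]           # toy property values

def fingerprint(smi):
    mol = Chem.MolFromSmiles(smi)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024))

X = np.array([fingerprint(s) for s in smiles])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, lambda_max_nm)
print(model.predict([fingerprint("c1ccccc1C")]))              # predict for a new molecule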
You can see how some of the output of the generative model fits into the PAG design space in the dendrogram. The white elements of the inner ring are novel photoacid generator structures created by AI, by the generative model.
Let's step back for a moment and take a closer
look at the power of generative models.
They're moving AI from serving simply as a classifier, as a discriminator, to becoming a generator. Discriminative models just classify data; neural networks and deep learning have brought important new capabilities for tasks that require discrimination, like image recognition or natural language processing and speech transcription. But generative modeling goes beyond discrimination, and it's already starting to produce impressive results at scale.
What generative models do is they create. Deep generative models were used to create the first AI-generated portrait sold at Christie's. They are being used to create extremely realistic faces of people that don't exist. Pre-trained transformers combined with deep generative models are used to automatically generate natural language text. One recent example is the GPT-3 model. Given just a one-sentence prompt, it can generate 200- to 500-word essays, and some people cannot discern whether the text was generated by a computer or written by a human.
Now, from new molecules to creating stories and arguments, and even images of people, places, and objects that have never existed, and even generating new software, new code: that is the power of generative models. They can provide a powerful new tool to search for candidate materials and generate hypotheses that expand both the discovery space and the creativity of scientists. Used in materials design, AI generative models learn to represent materials data and to generate hypotheses based on the data that was extracted and augmented. This then provides a representation that AI architectures not specialized in materials can use.
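Here is a deliberately tiny illustration of that classifier-versus-generator distinction in code. Real materials generative models (variational autoencoders, transformers over molecular strings, graph generators) are far more sophisticated; this only shows the contrast in miniature, with made-up "materials".

# A discriminator labels existing data; a generator proposes new data.
import itertools, random

known = ["AB", "AC", "BC"]                              # pretend known materials
fragments = sorted({ch for m in known for ch in m})     # "learned" building blocks

def discriminate(candidate):
    return candidate in known                           # classification only

def generate(n=3):
    novel = [a + b for a, b in itertools.product(fragments, repeat=2)
             if a != b and a + b not in known]          # fill gaps in the space
    return random.sample(novel, min(n, len(novel)))

print([discriminate(m) for m in ["AB", "CA"]])          # [True, False]
print(generate())                                       # e.g. ['CA', 'BA', 'CB']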
OK, so back to our materials example. Now that we understand a little bit of what generative models do, we're going back to PAGs. So after applying the generative model to fill in the gaps, we turn to expert-in-the-loop technologies. What we mean here is that we're using the knowledge of a subject matter expert to select the best candidates that might be suitable for experimental validation. Those few candidates move forward to the next step: synthesizing the compound to validate its sustainability qualities and performance. Synthesis is the most demanding task in making materials, and it requires substantial human effort.
But with recent innovations, cloud-based, AI-driven autonomous labs called IBM RXN for Chemistry and IBM RoboRXN, we can increase reliability, decrease time, and achieve scalable autonomous chemical synthesis. We use these tools to design the best synthetic routes and implement them remotely in a robotic lab. So here's a short video of the IBM RoboRXN lab at work. It is synthesizing a PAG of a class that was down-selected from the generative model output, from the AI.
To recap: deep search formed the knowledge base of PAGs. AI-enriched simulation augmented this with important data on sustainability and performance properties. A generative model proposed potential candidates with targeted properties. Expert-in-the-loop technologies helped prioritize candidates. And finally, the autonomous labs designed and executed the correct synthetic procedure to produce the targeted PAG. That is how we achieved, last November, the first material created using accelerated discovery. Here you see its measured mass spectrum. It brings with it the promise of a world of many, many more accelerated discoveries to come.
IBM RXN is available now. Over 21,000 people and experts have used it to predict more than two and a half million chemical reactions. 458 syntheses have been completed since October of last year alone, and this just shows how hybrid cloud is becoming a key technology cofactor to revolutionize several fields. Anywhere you have an Internet connection, you have a chemical lab in your hands; that's how powerful it is.
So accelerated discovery will define the most innovative businesses of tomorrow. Already today, discovery is an integral part of such innovative, forward-looking businesses. We can think about them in three categories, with 2.4 trillion dollars of annual expenditure worldwide. The first one is companies that derive their products from the application of discovery, such as those in the life sciences, chemicals, and materials. That is scientific discovery as a business. The second one is businesses that rely on discovery. These are sectors like transportation, healthcare, and utilities that gain competitive advantage from the application of science. And the third one is information- and discovery-driven enterprises that gain their competitive advantage from using data, experimentation, and learning.
Just look at the numbers to see the immense opportunity we have here. These are numbers that were taken on November 5th, prior to the vaccine announcements, an event that changed the market. 52 trillion dollars of total revenue: that's the impact of science and discovery in business. Industries with discovery as a business represent a total revenue of 6.5 trillion dollars, with 12 trillion in market cap. The colors here show the change in market capitalization year over year: the greener the color, the better; the redder, the worse. The size of the bar represents the revenue. Businesses that rely on discovery add up to a total of 26 trillion, with a market cap of around 33 trillion. And information- and discovery-driven businesses represent a total market cap of 35 trillion dollars, with 20 trillion in revenue.
I am certain that tomorrow's most innovative businesses will be
discovery driven enterprises.
They will seek the platforms,
tools and technologies that will allow them to accelerate the
discovery process that gives them their competitive advantage.
But that requires a serious investment in science and R&D.
It is one that IBM is making and we hope
you will too because we have no time to waste.
How much does it matter to you that the development and production of the COVID-19 vaccine took months instead of years? If we had proceeded the traditional way, starting from scratch in 2020 when COVID hit, we would have a vaccine by 2033. For some diseases, that time could be never. Fast-tracking steps, and starting early based on the compounding returns of past research, would give us a vaccine in little more than a year. We got it even sooner: from 14 years on average to less than 14 months for a vaccine.
Think about how much this matters to all of us.
And how much does it matter to you that we may be able to accelerate our ability to address climate challenges? How about more granular predictions to meet future demands? It is possible; it is happening.
To address our biggest challenges, we need to discover faster. We need to unleash the power of accelerated discovery, and we need to do it with purpose: not just digital innovations for digital products and services, but let's also direct our digital powers to improve our physical world.
Digital for physical, too. I invite you to learn more by visiting research.ibm.com, where you can download our just-published, just today, 2021 Science and Technology Outlook, focused on the new era of accelerated discovery.
Thank you very much.


 
