A review of the AI technology breakthroughs of 2018

Posted by punzalan at 2020-03-11
By Anne and Xia Yi, from Aofei Temple. Produced by Qubit | Official account: QbitAI

2018 was another exciting year for AI.

The year was a watershed for NLP research, with one breakthrough after another. Computer vision also shone: compared with four years ago, the fake faces generated by GANs are now astonishingly realistic. New tools and frameworks also give the field a future worth looking forward to.

Recently, Analytics Vidhya released a report summarizing AI technology in 2018 and forecasting trends for 2019, originally written by Pranav Dar. Qubit has kept the structure of that report while re-editing and supplementing its content.

The report reviews the major progress in the main AI fields over the year and gives the addresses of relevant resources for further use and reference.

The report covers five main parts:

Natural language processing (NLP)

Computer vision (CV)

Tools and libraries

Reinforcement learning

AI ethics

Next, let's take stock and look ahead, one field at a time. Here we go~

Natural language processing (NLP)

There is no doubt that 2018 holds a special place in the history of NLP.

According to the report, this year was a watershed for NLP. Breakthroughs came one after another in 2018: ULMFiT, ELMo, and most recently the much-discussed BERT.

Transfer learning has become an important driving force in the development of NLP. Starting from a pre-trained model and adapting it to new data offers enormous potential; some even say that "NLP's ImageNet moment has arrived".


■ ULMFiT

The abbreviation stands for "Universal Language Model Fine-tuning" and comes from the ACL 2018 paper "Universal Language Model Fine-tuning for Text Classification".

This paper fired the first shot in this year's NLP transfer learning carnival.

The paper has two authors. One is Jeremy Howard, founder of an online deep learning course, who has extensive experience in transfer learning; the other is Sebastian Ruder, a doctoral student in natural language processing whose NLP blog is read by almost all of his peers.

Combine the expertise of the two and you get ULMFiT. To solve an NLP task, you no longer need to train a model from scratch: take ULMFiT, fine-tune it on a small amount of data, and it can achieve better performance on the new task.
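To make the "fine-tune instead of training from scratch" idea concrete, here is a deliberately tiny sketch. It is not the ULMFiT procedure: the one-parameter model, the numbers, and the function names are all hypothetical, chosen only to show why a few update steps from a pre-trained starting point can beat the same few steps from zero.

```python
def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, data, lr=0.01, steps=5):
    """Run a few gradient-descent steps on the small target dataset."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical setup: the target task follows y = 2.0 * x, and
# "pre-training" on a related task has already left us with w = 1.8.
target_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_pretrained = fine_tune(1.8, target_data)  # start from the pre-trained weight
w_scratch = fine_tune(0.0, target_data)     # start from zero

loss_pretrained = mse(w_pretrained, target_data)
loss_scratch = mse(w_scratch, target_data)
```

With the same small dataset and the same training budget, the run that starts from the pre-trained weight ends much closer to the target than the run that starts from zero.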

Their method outperformed the state-of-the-art models on six text classification tasks.

Details can be found in their paper:

The training scripts and models are released on their website:


■ ELMo

The name, of course, does not refer to the Sesame Street character but to "Embeddings from Language Models". It comes from the paper "Deep contextualized word representations" by the Allen Institute for Artificial Intelligence and the University of Washington, which was named one of the outstanding papers of NAACL HLT 2018.

ELMo uses a language model to obtain word embeddings, and it also takes the context of the sentence and paragraph into account.

Such contextualized word representations can capture the complex characteristics of a word's syntactic and semantic usage, as well as how that usage varies across contexts.
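The difference between a static and a contextualized word vector can be illustrated with a toy sketch. ELMo derives its vectors from a trained language model; the neighbour averaging below is only a stand-in for that, and the two-dimensional vectors and the function name are hypothetical.

```python
# Hypothetical static embeddings: one fixed vector per word.
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def contextual_vector(tokens, index, window=1):
    """Average the static vectors of a token and its neighbours.

    A crude stand-in for a language model: the result depends on
    the words around the token, not only on the token itself.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    vectors = [STATIC[t] for t in tokens[lo:hi]]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


river_bank = contextual_vector(["river", "bank"], 1)
money_bank = contextual_vector(["money", "bank"], 1)
```

The static table gives "bank" a single vector, while the contextual function returns different vectors for "river bank" and "money bank".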

ELMo also performed strongly in experiments. Adding ELMo to existing NLP models improves performance on a variety of tasks. For example, on the machine question answering dataset SQuAD, using ELMo improves the previous best model's score by 4.7%.

Here is more information and resources about ELMo:


■ BERT

It is no exaggeration to say that BERT was the hottest NLP model of 2018. Some even call it the beginning of a new era for NLP.

It was released by Google. Its full name is Bidirectional Encoder Representations from Transformers, and it is a method for pre-training language representations.

In terms of performance, no model could compete with BERT: it achieved top results on 11 NLP tasks. At the time of writing, only one of the top 10 entries on SQuAD 2.0 is not a BERT variant:

If you haven't read the BERT paper, you really should catch up on it before the end of 2018:

In addition, Google has officially open-sourced the training code and pre-trained models:

If you are a PyTorch user, don't worry. Here are the officially recommended PyTorch reimplementation and conversion scripts:


■ PyText

After BERT, what other surprises could the NLP community reap in 2018? The answer: a new tool.

Just last weekend, Facebook open-sourced PyText, the NLP modeling framework its own engineers have been using. The framework handles more than 1 billion NLP tasks every day for various Facebook applications, making it an industrial-grade toolkit.

(Facebook open-sources a new NLP framework: simplified deployment, ready for large-scale applications)

Built on PyTorch, PyText speeds up the path from research to application: going from model research to full deployment takes only a few days. The framework also includes some pre-trained models that can be used directly for tasks such as text classification and sequence labeling.

Want to try it? The open-source address is here:


■ Duplex

If the research above feels too abstract, Duplex is the most vivid example of progress in NLP.

Unfamiliar name? You have certainly heard of the product, though: it is the "phone-calling AI" that Google demonstrated at its 2018 I/O developer conference.

It can call hair salons and restaurants on its own initiative to make appointments, and it converses fluently throughout the call, realistically enough to pass for a human. John Hennessy, chairman of Google's parent company Alphabet, later called it "an extraordinary breakthrough", adding: "In the area of appointments, this AI has passed the Turing test."

Duplex's ability to understand multi-turn conversations and the naturalness of its synthesized voice both reflect the current state of NLP.

If you haven't seen the video yet

■ 2019 Trend Outlook

What will happen in NLP in 2019? Let's borrow the outlook of Sebastian Ruder, co-author of ULMFiT:

Pre-trained language model embeddings will be everywhere: top-level models trained from scratch, without pre-trained models, will be very rare.

Pre-trained representations that encode specialized information will appear, complementing language model embeddings. We will then be able to combine different types of pre-trained representations according to the needs of the task.

There will be more research on multilingual applications and cross-lingual models. In particular, building on cross-lingual word embeddings, deep pre-trained cross-lingual representations will appear.

Computer vision (CV)

This year, a large number of new studies were published in both the image and video fields, and three major results set off a wave of excitement across the CV community.


■ BigGAN

In September, BigGAN appeared in a paper under double-blind review for ICLR 2019, and experts were astonished: the images hardly looked GAN-generated at all.

In the history of computer-generated images, BigGAN's results are far better than those of its predecessors. For example, trained on ImageNet at 128 × 128 resolution, it achieves an Inception Score (IS) of 166.3, more than three times the previous best of 52.52.
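For readers unfamiliar with the metric, the Inception Score is commonly defined as the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y). The sketch below computes it from hand-written probabilities; in a real evaluation these would come from an Inception classifier, and the function name and toy inputs here are assumptions for illustration.

```python
import math


def inception_score(probs, eps=1e-12):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of lists, one probability distribution over classes per image.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    mean_kl = 0.0
    for p in probs:
        # KL divergence between this image's p(y|x) and the marginal p(y).
        mean_kl += sum(
            pj * (math.log(pj + eps) - math.log(marginal[j] + eps))
            for j, pj in enumerate(p)
            if pj > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Toy inputs: confident and diverse predictions versus uninformative ones.
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

score_confident = inception_score(confident)
score_uniform = inception_score(uniform)
```

The score rises when each image is classified confidently and the images together cover many classes; with three classes it ranges from 1 (uninformative predictions) up to 3.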

Besides handling small 128 × 128 images, BigGAN can also be trained directly on 256 × 256 and 512 × 512 ImageNet data to generate even more convincing samples.

In the paper, the researchers reveal that BigGAN's striking results come at a real cost. Training used up to 512 TPUs, at a cost that could reach 110,000 US dollars, or about 760,000 yuan.

Not only does the model have many parameters, its training scale is also the largest in the history of GANs: it has 2 to 4 times as many parameters as its predecessors and 8 times the batch size.

Related links

Research paper: https://openreview.net/pdf?id=B1xsqj09Fm

Extended reading

Surprise! The best GAN ever: ultra-realistic AI fake photos have the experts buzzing

Training the best GAN ever took 512 TPUs; the author admits: this is not algorithmic progress, it is computing-power progress

The strongest GAN ever: 100,000 in training costs, now free to try, with a realistic painting style

■ Training on all of ImageNet in 18 minutes

How long does it take to train a model on the full ImageNet? The big companies keep breaking records.

However, there are also budget versions that burn far fewer computing resources.

In August this year, Jeremy Howard, founder of an online deep learning course, and his students used rented Amazon AWS cloud computing resources to train an image classification model on ImageNet to 93% accuracy in 18 minutes.

In total, the team used only 16 AWS cloud instances, each carrying 8 NVIDIA V100 GPUs. The result is 40% faster than Google's TPU Pod on Stanford's DAWNBench test.

Such a top-ranking result cost only about $40, which the blog describes as something "everyone can achieve".
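The scale behind these figures is easy to work out. The short calculation below uses only the numbers quoted above and assumes, as a simplification, that the $40 covers exactly the 18-minute run on all 16 instances.

```python
INSTANCES = 16            # AWS cloud instances used
GPUS_PER_INSTANCE = 8     # NVIDIA V100 GPUs per instance
MINUTES = 18              # wall-clock training time
TOTAL_COST_USD = 40.0     # reported cost of the run

total_gpus = INSTANCES * GPUS_PER_INSTANCE        # 128 GPUs in total
gpu_hours = total_gpus * MINUTES / 60             # 38.4 GPU-hours
cost_per_gpu_hour = TOTAL_COST_USD / gpu_hours    # roughly 1.04 USD
```

Under that assumption the run amounts to 128 GPUs working for 38.4 GPU-hours, at roughly one dollar per GPU-hour.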


Blog post:

Extended reading

$40 and 18 minutes to train on all of ImageNet! Everyone can do it

224 seconds! A new record for ResNet-50 training on ImageNet, set by Sony

■ vid2vid technology

In August this year, a research team from NVIDIA and MIT unveiled an AI that generates ultra-realistic HD video.

Given only a dynamic semantic map, it can produce video that looks almost identical to the real world. In other words, as long as you sketch out the scene you have in mind, movie-quality video can be produced automatically, with no real shooting required:

Besides street scenes, it can also generate faces:

The vid2vid technology behind this is a new method within the generative adversarial learning framework: carefully designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective.

The method can produce high-resolution, photorealistic, temporally coherent video from input formats such as segmentation masks, sketches, and human poses.

The good news: NVIDIA has now open-sourced vid2vid.

Related links

Research paper:

GitHub address:

Extended reading

Terrifyingly real! NVIDIA and MIT create a "Ma Liang's magic brush"

Study in depth for half a year

■ 2019 Trend Outlook

Analytics Vidhya expects that in computer vision next year, more research will go into improving and enhancing existing methods than into creating new ones.

In the United States, government restrictions on drones may be slightly loosened and become more open. Self-supervised learning, which was hot this year, may also be applied in more research next year.

Analytics Vidhya also voices a hope for the vision field. At present, the latest research published at top international conferences such as CVPR and ICML sees little adoption in industry; the report hopes to see more research on real-world scenarios in 2019.

Analytics Vidhya expects that visual question answering (VQA) and visual dialogue systems may make their debut in a variety of practical applications.

Tools and libraries

Which tool is the best? Which framework represents the future? These questions can be debated forever.

What is beyond dispute is that, whatever the outcome of the debate, we need to understand and master the latest tools, or risk being left behind by the industry.

This year, machine learning tools and frameworks continued to develop rapidly. Below is a summary of this area and an outlook for it.

■ PyTorch 1.0

According to the 2018 annual report released by GitHub in October, PyTorch ranked second among the fastest-growing open-source projects, and it was the only deep learning framework on the list.

As TensorFlow's biggest "rival", PyTorch is actually a newcomer: it was officially released on January 19, 2017. In May 2018, PyTorch merged with Caffe2 to become the new-generation PyTorch 1.0, making it more competitive still.

By comparison, PyTorch is fast and very flexible. More and more open-source code on GitHub uses the PyTorch framework, and it is safe to predict that PyTorch will become even more popular next year.

How should you choose between PyTorch and TensorFlow? In an article we published earlier, many prominent figures sided with PyTorch.

In fact, the two frameworks are growing more and more alike. Denny Britz, a former deep learning researcher at Google Brain, believes that in most cases the choice of deep learning framework matters little.

Related links

PyTorch official website:

Extended reading

PyTorch or TensorFlow? Here is a guide for beginners

An essential companion for trying out PyTorch 1.0

Is TensorFlow's throne in danger? PyTorch is about to overtake it in ICLR submissions


■ AutoML

Many people call AutoML a new way of doing deep learning and believe it changes the whole system. With AutoML, we no longer need to design complex deep learning networks by hand.

On January 17 this year, Google launched the Cloud AutoML service, offering its AutoML technology through its cloud platform. Even people who do not understand machine learning can use it to train a customized machine learning model.

But AutoML is not exclusive to Google. Over the past few years, many companies have entered this field, such as RapidMiner, KNIME, and DataRobot.

Besides these companies' products, there is an open-source library worth introducing:

Auto-Keras!

This is an open-source library for performing AutoML tasks. It aims to let more people do machine learning, even without an expert background in artificial intelligence.
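The core idea of AutoML, letting a program try model configurations and keep the one that validates best, can be sketched in a few lines. The example below is not how Auto-Keras or Cloud AutoML work internally: it is a deliberately simple stand-in that selects the k of a nearest-neighbour classifier by leave-one-out validation, with hypothetical data and function names.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on a list of (x, label) pairs."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == label
    return correct / len(data)


def auto_select(data, candidates):
    """Try every candidate k and keep the one with the best validation score."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical 1-D dataset with one noisy label at x = 0.3.
DATA = [(0.0, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.6, 0),
        (1.4, 1), (1.6, 1), (1.8, 1), (2.0, 1)]
CANDIDATES = [1, 3, 5]

best_k, scores = auto_select(DATA, CANDIDATES)
```

The user supplies only the data and a list of candidates; the search loop makes the modelling choice. Real AutoML systems apply the same principle to far larger search spaces, such as entire network architectures.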

The library's authors are Xia Hu, an assistant professor at Texas A&M University, and two of his doctoral students, Haifeng Jin and Qingquan Song. Auto-Keras takes direct aim at three shortcomings of Google AutoML:

First, you have to pay.

Second, because it runs in the cloud, you have to configure Docker containers and Kubernetes.

Third, Google cannot guarantee the security and privacy of your data.

Related links

Official website:


Extended reading

Understand AutoML, the new king of deep learning

The open-source "Google AutoML killer" is here

Google's big move! Fully automatic AI training with no code, using the newly released Cloud AutoML

■ TensorFlow.js

TensorFlow.js was officially released at the TensorFlow Dev Summit 2018 at the end of March this year.

It is a machine learning framework for JavaScript developers. Models can be defined and trained entirely in the browser, TensorFlow and Keras models trained offline can be imported for prediction, and WebGL is supported seamlessly.

Using TensorFlow.js in the browser opens up more application scenarios, including interactive machine learning and scenarios in which all data stays on the client.

In fact, the newly released TensorFlow.js is built on the earlier deeplearn.js, which has simply been integrated into TensorFlow.

Google also provided several example applications of TensorFlow.js, such as playing the classic game Pac-Man using your camera.

Related links

Official website:

Extended reading

Have a laptop? Then you can play motion-sensing games! A hands-on tutorial on motion sensing with TensorFlow.js

Google's AI magic mirror: it watches you dance and pulls up 80,000 photos to mimic you

I'm not a pervert, I'm just looking for the original meme

■ 2019 Trend Outlook

Among tools, AutoML is the hottest topic, because it is a core technology that truly changes the rules of the game. Here we quote the outlook of the guru Marios Michailidis (KazAnova) on the AutoML field for next year:

Help describe and understand data through intelligent visualization, insights, and so on

Discover, build, and extract better features for datasets

Quickly build stronger and smarter predictive models

Bridge the gap left by black-box modeling through machine learning interpretability

Help move these models into production

Reinforcement learning

Reinforcement learning still has a long way to go.

Apart from occasional headlines, the field still lacks real breakthroughs. Reinforcement learning research depends heavily on mathematics and has yet to yield real industrial applications.

I hope to see more practical RL use cases next year. For now, I keep a close eye on the progress of reinforcement learning every month, to see what may happen in the future.

■ OpenAI's introduction to reinforcement learning

Even people with no machine learning background at all can now get started with reinforcement learning quickly.

In early November, OpenAI released Spinning Up, an introductory tutorial on reinforcement learning (RL). From a set of key concepts, to implementations of a series of key algorithms, to warm-up exercises, every step is clear and concise and written from a beginner's perspective.

According to the team, there has been no general-purpose teaching material for reinforcement learning, so only a small number of people have been able to enter the RL field. This needs to change, because reinforcement learning is genuinely useful.

Related links

Tutorial portal:

GitHub portal:

Extended reading

How to get started with reinforcement learning? Reading this article is enough

Anyone can get started: OpenAI launches a beginner-friendly reinforcement learning course | The code is simple and easy to understand

Introduction to Q-learning: teach a computer to play a "catch the cheese" game

■ Google's new reinforcement learning framework, Dopamine

Dopamine, an open-source reinforcement learning framework released by Google in August this year, is built on TensorFlow.

The new framework is designed to be clear and concise, so the code is relatively compact: about 15 Python files. Based on the Arcade Learning Environment (ALE) benchmark, it integrates DQN, C51, a simplified version of the Rainbow agent, and the Implicit Quantile Networks presented at ICML 2018.

To let researchers quickly compare their ideas against existing methods, the framework provides complete training data for DQN, C51, the simplified Rainbow agent, and Implicit Quantile Networks on 60 Atari games under the ALE benchmark.

In addition, there is a set of Dopamine teaching Colab notebooks.
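The agents listed above all build on the Q-learning idea: nudge the estimated value of a state-action pair toward the observed reward plus the discounted value of the best next action. The sketch below shows that update in its simplest tabular form. It does not use Dopamine's API, and the states, actions, and parameter values are hypothetical; DQN replaces the table with a neural network trained toward the same kind of target.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q: dict mapping state -> dict mapping action -> estimated value.
    """
    # Value of the best action in the next state (zero if the episode ended).
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state table.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}

new_value = q_learning_update(q_table, "s0", "right",
                              reward=1.0, next_state="s1", done=False)
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, so with a step size of 0.5 the estimate for ("s0", "right") moves from 0.0 to 1.4.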

Related links

Dopamine post on the Google blog:



Game training visualization page:

■ 2019 Trend Outlook

Xander Steenbrugge, a speaker at DataHack Summit 2018 and founder of Arxiv Insights, is also an expert in reinforcement learning. Here are his summary and outlook.

1. Sample complexity will continue to improve as more and more auxiliary learning tasks augment sparse extrinsic rewards. This works very well in environments where rewards are very sparse.

2. Because of this, training directly in the physical world will become more and more feasible, replacing current methods that mostly train in a virtual environment first. I predict that 2019 will bring the first robot demo that is trained purely with deep learning, involves no human intervention, and performs well.

3. After DeepMind extended the AlphaGo story into biology (AlphaFold), I believe reinforcement learning will gradually create real business value outside academia, for example in new drug discovery, electronic chip architecture optimization, and vehicles.

4. Reinforcement learning will see an obvious shift: testing an agent on its training data will no longer be considered acceptable. Generalization metrics will become central, just as in supervised learning.

AI ethics

AI abuse was frequently in the news in 2018: Facebook's AI was reported to have helped Trump become president of the United States, Google worked with the U.S. military on AI weapons, and Microsoft provided cloud computing and face recognition services to Immigration and Customs Enforcement (ICE).

Every incident sets off a new round of discussion about AI ethics, and some Silicon Valley technology companies have drawn up corporate AI guidelines in response.

According to Analytics Vidhya, AI ethics is still a gray area with no framework for everyone to follow. In 2019, more enterprises and governments will draw up relevant regulations.

The development of AI ethics has only just begun.

Extended reading

Google will ban its AI from being used in weapons, amid the "oppose cooperation with the military" incidents involving Fei-Fei Li and others

Just in: Google releases seven AI principles: no weapons development, but it will continue to work with the military

Did AI help Trump win? The biggest data abuse scandal in Facebook's history is exposed, implicating an ACL Lifetime Achievement Award winner

The authors are contracted writers for NetEase News and the NetEase Hao "Different Attitudes" program.

- End -
