Summary and Practice of Knowledge Graph Construction Technology

Posted by tetley at 2020-03-23

Preface

A knowledge graph is a special kind of semantic network. Using entities, relations, and attributes as its basic units, it describes concepts in the physical world and the relationships between them in symbolic form. Why is a knowledge graph so important for information retrieval, recommendation systems, and question answering? Let's illustrate with an example.

Suppose that in a search scenario we type a query like "Can I take a bath during postpartum confinement (zuo yuezi)?" into the search box.

As you can see, this query is a complete question. If the retrieval system has a large question-answering corpus (as in an FAQ scenario) or a large enough article database (with high title coverage), then using semantic matching to compute the similarity between the query and FAQ questions or article titles may be a fairly good solution (see the previous article). But reality is rarely so accommodating: the content to be retrieved is seldom complete and well-formed in its language. So, for a case like this, what does the search baseline look like?

In a traditional search pipeline, we first segment the query, so the original sentence becomes a sequence of words.

The articles in the database are indexed in advance with an inverted index; the words in the segmentation result are then used for recall, and the recalled documents are ranked with the BM25 algorithm.
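To make the baseline concrete, here is a minimal sketch of inverted-index recall plus BM25 ranking. The documents, tokenization, and parameters are toy placeholders, not our production pipeline:

```python
# Minimal inverted-index + BM25 baseline (illustrative only; real engines such
# as Lucene implement the same idea with many refinements).
import math
from collections import Counter, defaultdict

docs = {
    "d1": ["postpartum", "confinement", "daily", "taboos"],
    "d2": ["bathing", "safety", "for", "infants"],
}

# Build the inverted index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, tokens in docs.items():
    for term, tf in Counter(tokens).items():
        index[term][doc_id] = tf

avgdl = sum(len(tokens) for tokens in docs.values()) / len(docs)

def bm25(query_terms, k1=1.5, b=0.75):
    """Score every document that shares at least one term with the query."""
    scores = defaultdict(float)
    n_docs = len(docs)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log((n_docs - len(postings) + 0.5) / (len(postings) + 0.5) + 1)
        for doc_id, tf in postings.items():
            dl = len(docs[doc_id])
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(bm25(["postpartum", "bathing"]))
```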

Of course, we can build a business dictionary for the needs of the scenario and prepare some business keywords; for example, "postpartum confinement" in the example is a compound noun. We can also discard stop words such as "can" when indexing, and then adjust weights by part of speech (for instance, giving disease terms more weight than other words) in the hope of improving the relevance of the final article ranking.

But so far, the search is still centered on keywords: the words "postpartum confinement" and "bathing" must literally appear in the article content. Meanwhile, because stop words and verbs were dropped, the query also loses its semantics: "postpartum confinement" and "bathing" become two isolated concepts.

By contrast, as humans we understand this query roughly as follows:

The whole sentence is understood as "whether a woman in postpartum confinement can do everyday activities such as bathing", and we can reason about it naturally. Perhaps a more appropriate response to this query would be an article like "Daily taboos during the postpartum period".

Notice that when humans interpret natural language, they anchor on a handful of concepts and have a clear sense of which concepts are more general or more specific. These concepts come from a knowledge system built up over many years of learning; we can reason, analyze, and think on top of existing knowledge, which computers cannot. Traditional corpus-based NLP tries to reconstruct the whole real world from large amounts of text. This approach is easy to engineer and works well in some narrow scenarios, but it exposes the problems described above: the resulting knowledge is tightly bound to its scenario and hard to transfer, it lacks the organization of a knowledge system, and its results are hard to explain.

To solve these problems, the author believes a knowledge graph may be the key breakthrough. Compared with long text, knowledge points (entities and relations) are easier to organize, extend, and update. Natural language is an abstraction of how humans perceive the world; for a computer to understand natural language, it needs a brain full of knowledge.

Dingxiangyuan (DXY) is also building its own medical-brain knowledge graph. Going forward we will continue to share technical notes on knowledge graphs, covering knowledge extraction, knowledge representation, knowledge fusion, knowledge reasoning, and so on, as well as practical applications in search, recommendation, and other scenarios. This article shares our technical notes on knowledge graph construction.

Concepts

As noted in the overview above, a knowledge graph is essentially a semantic network whose nodes represent entities or concepts and whose edges represent the various semantic relations between them. At the level of knowledge organization, there are three specific schemes: ontology, taxonomy, and folksonomy.

These three can be understood as organizing the hierarchy with different degrees of strictness. An ontology is a tree structure with the strictest is-a relation between nodes on different levels (for example, human activity -> sports -> football); its advantage is that it supports knowledge reasoning, but it cannot represent the diversity of conceptual relations. A taxonomy is also a tree structure, but the hierarchy is less strict and nodes are connected by hypernym-hyponym relations, so the conceptual relations are richer, but ambiguity creeps in easily and effective reasoning is harder. A folksonomy is non-hierarchical; besides flexibility, it gives up both semantic precision and reasoning ability.

At present, the taxonomy style of organization is the more popular one in the Internet industry, because it balances hierarchical relations against a tag system to some extent and is the most flexible across applications. This article focuses on taxonomy construction techniques.

Construction

The data sources for building a large-scale knowledge base can be open semi-structured data, unstructured data, and third-party structured databases. Getting data from structured databases is the simplest; most of the work is unifying concepts and aligning entities. Next comes extracting knowledge from semi-structured data, such as Wikipedia:

Encyclopedia entries are a good data source. They provide rich context for entity terms, and each entry has a detailed table of contents and paragraph structure:

We can extract the corresponding structured fields from different paragraphs. Of course, some terms are buried in free text, so NLP algorithms such as named entity recognition, relation extraction, attribute extraction, and coreference resolution are needed to identify and extract them. The related techniques and model optimizations will be covered in detail in later articles, so we will not go into them here.

As mentioned earlier, a taxonomy-style knowledge graph is an is-a tree. Specific relation types are easy to obtain from structured or semi-structured sources, but hypernym-hyponym hierarchy data is relatively scarce there. To expand this kind of data, or to add relation types that the original graph lacks, we need to extract them from unstructured text or semi-structured sources.

For example, "gout is a purine metabolic disorder", "proteoglycan is a extracellular matrix", "collagen is a extracellular matrix", "chondroitin sulfate is a extracellular matrix" can be extracted from the above text.

With the development of knowledge graphs, research in this area has grown in recent years, including construction methods for Chinese. Next we survey taxonomy construction techniques and introduce several concrete methods for Chinese and English.

Taxonomy construction techniques

Although we can imagine many benefits of a well-built taxonomy for downstream NLP applications, we should stay sober: research in this area is still far from mature, and the main difficulties come from three sources. First, text data varies enormously in length, topic, and quality, so hand-designed extraction templates (such as regular expressions) do not transfer across domains and language scenarios. Second, because of the diversity of language expression, the extracted data is often incomplete, which greatly hurts final accuracy. Third, again because of domain differences, cleaning the extracted knowledge is also a headache.

Next we introduce some existing research results from academia and industry. These methods improve accuracy from different angles and cover several subtasks of taxonomy construction: hyponymy acquisition, hypernym prediction, and taxonomy induction.

Current free-text-based taxonomy construction can be summarized in two main steps: i) extract is-a relation pairs from text using pattern-based or distributional methods; ii) induce a complete taxonomy structure from the extracted relation pairs.

Pattern-based methods

The template method, as the name suggests, designs fixed templates to match against raw text; the most intuitive form is a regular expression. Hearst first designed several simple templates for this task, such as "[C] such as [e]" and "[e] and other [C]", so that hypernym-hyponym word pairs can be obtained from sentences that fit these patterns. The templates look simple, but many successful applications are built on them, the most famous being Microsoft's Probase dataset.
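As a rough illustration, the following sketch matches two Hearst-style patterns with regular expressions over plain English text; a real extractor would work over POS-tagged noun phrases, and the pattern list here is only a sample:

```python
# Toy Hearst-pattern extractor using regular expressions.
import re

SUCH_AS = re.compile(r"(\w+(?: \w+)*) such as ((?:\w+(?: \w+)*)(?:, \w+(?: \w+)*)*)")
AND_OTHER = re.compile(r"(\w+(?: \w+)*) and other (\w+(?: \w+)*)")

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found by the two toy patterns."""
    pairs = []
    m = SUCH_AS.search(sentence)
    if m:
        hypernym, hyponyms = m.group(1), m.group(2)
        pairs += [(h.strip(), hypernym) for h in hyponyms.split(",")]
    m = AND_OTHER.search(sentence)
    if m:
        pairs.append((m.group(1), m.group(2)))
    return pairs

print(extract_isa("companies such as Microsoft, IBM"))
# [('Microsoft', 'companies'), ('IBM', 'companies')]
```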

Because of its simplicity, the template method obviously has many shortcomings. The biggest is low recall, for a simple reason: natural language has rich and varied expressions, while the number of templates is limited, so they cannot cover all sentence structures. The flexibility of language also hurts extraction accuracy; common errors include out-of-vocabulary words, irregular expressions, incomplete extractions, ambiguity, and so on. There is a large body of work on improving the recall and precision of template-based extraction.

a) How to improve recall?

First, recall. The first approach is to extend the templates, for example by designing dedicated templates for different entity types, or by allowing flexible substitution of articles and auxiliary words inside a template.

There is also work on automatically expanding templates, such as "Learning syntactic patterns for automatic hypernym discovery" by Rion Snow et al., which uses syntactic dependency paths to automatically acquire new templates.

However, automatic template generation brings new problems: if the corpus is very large, the features of any single template become very sparse. Another idea is to focus on the features of the patterns themselves and make them more general, thereby improving recall. Related work includes the star patterns proposed by Navigli's group: replace low-frequency entity words in sentences with wildcards, then select more general patterns through a clustering algorithm.

Syntactic dependencies as features follow a similar idea. In the PATTY system, POS tags, logical types, or entity words in a dependency path are replaced, and patterns are then selected.

The second approach is iterative extraction. The main assumption is that some wrong relation pairs are extracted repeatedly by overly general patterns because of language ambiguity or semantic drift, so a verification mechanism may be able to weed them out. For example, "Semantic class learning from the web with hyponym pattern linkage graphs" designs "doubly-anchored" patterns and extracts with a bootstrapping loop.

The third approach is hypernym inference. Another factor that hurts recall is that templates match within a single sentence, which requires the hypernym and hyponym of a relation to appear in the same sentence. A natural idea is to exploit transitivity between words: if y is a hypernym of x, and x is very similar to x', then the relation can be transferred to x'. Existing work includes training an HMM to make cross-sentence predictions. Besides nouns, there are also studies that infer from the modifiers of hyponyms, e.g. a "grizzly bear" is also a kind of "bear".

The figure above is from "Revisiting taxonomy induction over Wikipedia", which builds a set of heuristic extraction steps using the head words of phrases as features.
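As a toy illustration of the head-word idea (for English noun phrases, where the head is usually the last token; this is only one heuristic signal, not the paper's full pipeline):

```python
# Head-word heuristic: a multi-word term is often a hyponym of its head word
# ("grizzly bear" is-a "bear"). It fails for non-compositional phrases
# ("guinea pig" is not a "pig"), which is why it is only one signal among many.
def head_word_hypernym(term: str):
    tokens = term.split()
    return (term, tokens[-1]) if len(tokens) > 1 else None

for t in ["grizzly bear", "chronic kidney disease", "bear"]:
    print(head_word_hypernym(t))
# ('grizzly bear', 'bear'), ('chronic kidney disease', 'disease'), None
```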

b) How to improve accuracy?

How do we evaluate the accuracy of graph construction? The most common approaches are statistical. For example, let (x, y) be a candidate is-a pair. The KnowItAll system computes the pointwise mutual information (PMI) of x and y with the help of a search engine; Probase uses a likelihood probability to express how plausible it is that y is the hypernym and takes the maximum-probability candidate as the result; other methods include Bayesian classifier predictions, external data validation, expert evaluation, and so on.
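For the PMI-style check, a minimal sketch from raw counts looks like the following; KnowItAll obtained the counts from search-engine hits, while the numbers below are invented for illustration:

```python
# PMI-based validation of a candidate (x, y) is-a pair from corpus counts.
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw (co-)occurrence counts."""
    if not count_xy:
        return float("-inf")
    p_xy = count_xy / total
    p_x, p_y = count_x / total, count_y / total
    return math.log(p_xy / (p_x * p_y))

# Toy counts: "gout" and "metabolic disorder" co-occur 120 times in 1e6 windows.
print(round(pmi(120, 900, 2500, 1_000_000), 3))  # ~3.98, a strong association
```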

As for improving accuracy beyond the extraction process itself, most research picks a validation metric and then builds a classifier for iterative optimization. However, the accuracy of templates alone is generally low, and most of the classifier work appears in hybrid template + distributional schemes. Next we introduce the idea of distributional extraction.

Distributional methods

In NLP, distributional methods are the product of representation learning: word vectors, sentence vectors, and so on. One advantage of distributed representations is that they turn the discrete data of NLP into continuous, computable data. The same idea can be brought into graph construction: computability means there are relationships between word vectors, and those relationships can include the hypernym-hyponym relation of is-a pairs. Another advantage is that we can predict is-a relations directly rather than only extracting them. The main steps can be summarized as: i) obtain key terms; ii) use unsupervised or supervised models to obtain more candidate is-a pairs.

a) Key term extraction

There are many ways to obtain the seed dataset; the most intuitive is to design strict patterns. The advantage is that the key terms have high precision, which works well when the corpus is large. When the corpus is small, however, too few terms may be extracted and the subsequent model will overfit. Besides pattern extraction, some work uses a sequence labeling model or an NER tool to pre-extract terms and then filters them with a few rules.

Research on vertical domains usually adds domain filtering, mostly by thresholding statistical values such as TF, TF-IDF, or other domain-relevance scores. Some studies also assign weights to sentences and extract key terms only from sentences with high domain weight.
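A minimal sketch of such statistical filtering, keeping only candidate terms whose TF-IDF weight in an in-domain corpus passes a threshold (the corpus, n-gram range, and threshold are placeholders):

```python
# TF-IDF thresholding as a simple domain filter for candidate terms.
from sklearn.feature_extraction.text import TfidfVectorizer

domain_docs = [
    "gout is a purine metabolic disorder",
    "chondroitin sulfate is an extracellular matrix component",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
tfidf = vectorizer.fit_transform(domain_docs)
weights = tfidf.max(axis=0).toarray().ravel()  # best score of each term across docs

THRESHOLD = 0.3  # placeholder; tuned on held-out annotations in practice
vocab = vectorizer.get_feature_names_out()
kept = [term for term, w in zip(vocab, weights) if w >= THRESHOLD]
print(kept[:10])
```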

Once the key terms are obtained, the next step is to expand new relation pairs from these seeds.

b) Unsupervised models

The first direction is clustering. For clustering, the core question is which distance measure to use. Simple measures such as cosine, Jaccard, and Jensen-Shannon divergence are worth trying. There are also slightly more complex ones that compare (x, y) through extracted features and weights, such as the Lin measure:

where F_x and F_y are the extracted feature sets of x and y, and w is the weight of each feature.

In addition, some researchers have noticed an asymmetry: in a Wikipedia entry page, for example, the hyponym appears only in certain contexts describing the hypernym, whereas the hypernym may appear throughout the contexts of the hyponym. Distance measures have been adjusted accordingly, such as WeedsPrec:

This assumption is called the distributional inclusion hypothesis (DIH). Similar asymmetric measures include WeedsRec, balAPinc, ClarkeDE, cosWeeds, invCL, and others.
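For concreteness, here are sketches of two of these measures over weighted feature bags (the feature dictionaries below are toy values; real features would be, e.g., PMI-weighted context counts):

```python
# Two distributional scores: the symmetric Lin measure and the asymmetric
# WeedsPrec measure used under the distributional inclusion hypothesis.

def lin_similarity(fx: dict, fy: dict) -> float:
    """Shared feature mass over total feature mass (symmetric)."""
    shared = sum(fx[f] + fy[f] for f in fx.keys() & fy.keys())
    total = sum(fx.values()) + sum(fy.values())
    return shared / total if total else 0.0

def weeds_prec(fx: dict, fy: dict) -> float:
    """How much of x's feature mass is covered by y's features; under DIH,
    a high value suggests y is a hypernym of x."""
    covered = sum(w for f, w in fx.items() if f in fy)
    total = sum(fx.values())
    return covered / total if total else 0.0

dog = {"barks": 2.0, "pet": 1.5, "mammal": 1.0}
animal = {"pet": 1.0, "mammal": 2.0, "wild": 1.0, "barks": 0.2}
print(lin_similarity(dog, animal), weeds_prec(dog, animal))  # ~0.89, 1.0
```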

Besides the distance measure, the other concern is feature selection, e.g. co-occurrence frequency, pointwise mutual information, LMI, and so on.

c) Supervised models

With key terms and clustering in place, the way to further improve accuracy is to build a supervised model, either a classifier or a ranker.

From the classifier point of view, the most popular scheme is to pre-train a language model such as word2vec, map a candidate pair (x, y) to its two vectors, concatenate them, and train a binary SVM classifier. This method serves as the baseline in many later studies. It is simple and effective, but it also has problems: in practice, such a classifier tends to learn general semantic relatedness rather than the hypernym-hyponym relation we want; in other words, it overfits easily. An alternative is to take the difference of v(x) and v(y), or to combine addition and element-wise product, to form the pair feature.
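A minimal sketch of this supervised baseline, using toy 2-dimensional embeddings in place of real pretrained vectors and the vector-difference feature discussed above:

```python
# Pair classification with embedding differences and an SVM.
import numpy as np
from sklearn.svm import SVC

emb = {  # placeholder vectors; real embeddings are typically 100-300 dimensional
    "gout": np.array([0.9, 0.1]),
    "metabolic disorder": np.array([0.7, 0.6]),
    "apple": np.array([0.2, 0.9]),
    "fruit": np.array([0.3, 0.8]),
    "paris": np.array([0.5, 0.4]),
}

def pair_feature(x, y):
    # difference features tend to generalize better than plain concatenation
    return emb[y] - emb[x]

X_train = [pair_feature("gout", "metabolic disorder"),  # positive is-a pair
           pair_feature("apple", "fruit"),              # positive is-a pair
           pair_feature("paris", "fruit"),              # negative pair
           pair_feature("fruit", "apple")]              # negative (reversed) pair
y_train = [1, 1, 0, 0]

clf = SVC(kernel="rbf").fit(np.vstack(X_train), y_train)
print(clf.predict([pair_feature("gout", "metabolic disorder")]))
```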

Later researchers argued that word-vector training is heavily influenced by the corpus, so the hypernym-hyponym relation is hard to capture in the word embeddings themselves. They therefore build an extra embedding layer on top of the word vectors of x and y to represent the relation. Experiments show that this brings a good improvement for graph construction in specific domains.

Besides classifiers, hypernym generation is another option, and currently the best-performing one. Roughly, a piecewise linear projection model is trained to map v(x) close to the v(y) of its hypernym, and ranking techniques are used as well. We selected related work for Chinese and tried it; see the later sections for details.

Taxonomy Induction

The previous sections introduced techniques for extracting is-a pairs from text. The final step is to combine these relation pairs into a complete graph. Most methods work in an incremental fashion: initialize a seed taxonomy, then keep adding new is-a data to the graph. Research in this direction differs mainly in which criterion is used to decide where to insert new data.

A common approach is to treat construction as a clustering problem and merge similar subtrees. For example, "Unsupervised learning of an is-a taxonomy from a limited domain-specific corpus" uses k-medoids clustering to find the lowest common ancestor.

Graph algorithms are another direction, since a taxonomy is naturally a graph. For example, "A semi-supervised method to learn and construct taxonomies using the web" finds the nodes with in-degree 0, which are most likely the top of the taxonomy, and the nodes with out-degree 0, which are most likely the instances at the bottom, then finds the longest root-to-instance paths in the graph to obtain a more reasonable structure.
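A toy sketch of these degree and path heuristics with networkx, with edges oriented from hypernym to hyponym so that roots have in-degree 0 and instances have out-degree 0:

```python
import networkx as nx

# Toy is-a edges, oriented hypernym -> hyponym.
G = nx.DiGraph([
    ("activity", "sport"), ("sport", "football"),
    ("sport", "basketball"), ("activity", "game"),
])

roots = [n for n, d in G.in_degree() if d == 0]    # likely top of the taxonomy
leaves = [n for n, d in G.out_degree() if d == 0]  # likely bottom-level instances

# The longest root-to-leaf path gives a reasonable backbone for the taxonomy.
longest = max(
    (path for r in roots for leaf in leaves for path in nx.all_simple_paths(G, r, leaf)),
    key=len,
)
print(roots, leaves, longest)
```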

Others attach domain-related weights to the edges of the graph and then use algorithms such as dynamic programming to find an optimal structure, for example the optimal branching algorithm.

The last step of construction is to clean the taxonomy and remove wrong is-a relations. The first key property is that the parent-child relation of a taxonomy must contain no cycles. Probase cleaned up about 74k wrong is-a pairs during construction by removing cycles.
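A minimal sketch of cycle cleaning: repeatedly find a cycle and drop its lowest-confidence edge (the toy is-a edges and weights below are invented; Probase's actual procedure is more involved):

```python
import networkx as nx

G = nx.DiGraph()
# Toy hypernym -> hyponym edges with co-occurrence counts as confidence weights;
# the low-count "arthritis -> disease" edge closes a spurious cycle.
G.add_weighted_edges_from([
    ("disease", "gout", 120), ("gout", "arthritis", 30), ("arthritis", "disease", 2),
])

while True:
    try:
        cycle = nx.find_cycle(G)  # list of (u, v) edges forming one cycle
    except nx.NetworkXNoCycle:
        break
    weakest = min(cycle, key=lambda e: G[e[0]][e[1]]["weight"])
    G.remove_edge(*weakest)

print(list(G.edges()))  # the weakest edge in the cycle has been dropped
```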

Another big problem is entity ambiguity, for which there is still no effective solution. Especially in automated graph-building systems, using the "transitivity" mentioned above to expand data often brings a high risk of dirty data. For example, given two is-a relations:

we cannot obtain the following by transitivity:

Although there is work on learning multiple senses for an entity word, having multiple choices does not mean knowing which one is right. In most cases, disambiguating entity words needs additional evidence, which means you first need rich knowledge and background data; but what we are building is precisely that graph, so this becomes a chicken-and-egg problem. At the academic level, building a fully disambiguated taxonomy still has a long way to go. Fortunately, at the application level we have many other tricks, including collecting users' search and click logs, parsing UGC content, extracting information from them to help disambiguation, and feeding the results back into the knowledge graph.

Above we gave a brief overview of taxonomy construction techniques. Now let's look at complete construction pipelines for English and Chinese.

Construction of Probase

Starting with Microsoft's Probase, graph construction has emphasized the notion of a probabilistic taxonomy: knowledge is not "black or white" but exists with some probability. Preserving this uncertainty reduces the impact of noise in the data and helps subsequent knowledge computation. In Probase, each hypernym-hyponym pair keeps, as its confidence, the co-occurrence frequency observed during construction, as follows:

For example, company and Google co-occurred 7,816 times during graph construction, and this count can be used in later applications to compute credibility.
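A sketch of how such counts can back a plausibility score, estimating P(hypernym | entity) from extraction frequencies (the counts other than the 7,816 figure are invented for illustration):

```python
# Plausibility of a hypernym for an entity from raw extraction counts.
from collections import defaultdict

counts = {("company", "google"): 7816,
          ("search engine", "google"): 2113,
          ("brand", "google"): 845}

by_entity = defaultdict(int)
for (_, entity), c in counts.items():
    by_entity[entity] += c

def plausibility(hypernym, entity):
    """Estimate P(hypernym | entity) from co-occurrence counts."""
    return counts.get((hypernym, entity), 0) / by_entity[entity]

print(round(plausibility("company", "google"), 3))  # ~0.725
```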

The construction of Probase can be understood as two steps. First, Hearst patterns (below) are used to obtain candidate hypernym-hyponym noun-phrase pairs from the raw corpus.

After the candidate pairs are obtained, they are assembled into tree structures according to parent-child relations. The whole process is fairly simple:

Mine candidate entity pairs (such as (company, IBM), (company, Nokia)) from the raw sentences to form small subtrees;

Then merge the subtrees into a complete graph using horizontal and vertical merging rules; the paper gives the complete construction procedure.

Note that subtree merging is not a blind union: the decision is made through a similarity computation sim(child(T1), child(T2)). The similarity itself is simple, just Jaccard similarity.

Jaccard similarity

Suppose there are three subtrees:

A = {Microsoft, IBM, HP} B = {Microsoft, IBM, Intel} C = {Microsoft, IBM, HP, EMC, Intel, Google, Apple}

The results are J(A, B) = 2 / 4 = 0.5 and J(A, C) = 3 / 7 = 0.43. A threshold is set during construction; if it is 0.5, then A and B can be merged horizontally, but A and C cannot.
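The same check in a few lines of code, reproducing the numbers above:

```python
# Horizontal-merge test: merge two candidate subtrees when the Jaccard
# similarity of their child sets passes a threshold.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

A = {"Microsoft", "IBM", "HP"}
B = {"Microsoft", "IBM", "Intel"}
C = {"Microsoft", "IBM", "HP", "EMC", "Intel", "Google", "Apple"}

THRESHOLD = 0.5
for name, other in (("B", B), ("C", C)):
    score = jaccard(A, other)
    print(name, round(score, 2), "merge" if score >= THRESHOLD else "keep separate")
# B 0.5 merge
# C 0.43 keep separate
```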

Problems in constructing a Chinese graph

In short, the Probase method is relatively simple and provides a framework-level idea for construction, but many problems remain in the details, especially for Chinese. These problems are discussed in detail in the paper "Learning semantic hierarchies via word embeddings". The first problem is that Chinese grammar is more flexible than English: although Chinese Hearst-style patterns achieve high precision, their recall is very low compared with other methods, and so is the F1 score, because manually enumerating all sentence structures is laborious and inefficient. The comparison of several methods is shown below:

On the other hand, patterns alone are not powerful enough for semantic hierarchy construction:

Simple construction methods like Probase are prone to missing or wrong relations.

A hybrid framework of projection model and templates

In today's mainstream construction schemes, besides Hearst patterns, the commonly used means are distributional: computing PMI, the standard IR average precision of hypernym-hyponym pairs, or the offset between the two word vectors. Continuing the work of the Harbin Institute of Technology team, the East China Normal University team built a taxonomy construction method that combines word vectors, a projection model, and Hearst patterns. This section mainly discusses the work introduced in "Predicting hypernym-hyponym relations for Chinese taxonomy learning".


Problem definition

First, using background knowledge and data in the vertical domain, we build a basic knowledge graph skeleton, a taxonomy denoted T = (V, R), where V is the set of entities (vertices) and R the set of relations.

Then some is-a relation data is sampled from the graph T and denoted R. We then take the relation set of its transitive closure, denoted R*; the relations in this set can be understood as high-confidence data.


Taking Baidu Baike as an example, the entity set obtained from the external data source is denoted E, each element in the set is x, and the parent category of an element is denoted cat(x). The whole crawled dataset can then be expressed as:

The whole problem can then be defined as: learn, from the sampled relation data, an algorithm F that filters the unlabeled data in U and integrates it into T.

Basic framework

In the article "learning semantic hierarchies via word embeddings", it has been introduced that the characteristics similar to V (King) - V (Queen) = V (man) - V (woman) in the word vector can help predict the hypernym response relationships of entities during the construction process (for example, diabetes mellitus and disorders of glucose metabolism, gastric ulcer and gastrointestinal diseases). This paper continues the idea of using word vector and training mapping model in word vector space. The overall framework is as follows:


The process can be described as follows: first, extract some initialization data from the existing taxonomy, map it into word-vector space, and train a piecewise linear projection model in that space to obtain a prior representation. Then obtain new candidate relation pairs from the data source; through model prediction plus rule filtering, a new batch of relation pairs is extracted to update the training set. The new data is then used to recalibrate the projection model, and the cycle repeats.


Model definition

As described above, the first step is to train reliable word vectors on a large corpus. The authors use the skip-gram model on a corpus of roughly a billion words; we will not repeat the method here.

Once the word vectors are obtained, for a given word x, the conditional probability of seeing a word u in its context can be expressed as:
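In the skip-gram formulation this is the usual softmax over the vocabulary (shown here in its simplified single-vector form):

```latex
p(u \mid x) = \frac{\exp\big(\mathbf{v}(u)^{\top}\mathbf{v}(x)\big)}{\sum_{u' \in V} \exp\big(\mathbf{v}(u')^{\top}\mathbf{v}(x)\big)}
```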

Here, v(x) denotes looking up the vector of a word, and V is the vocabulary obtained from the whole corpus.

The second step is to build the projection model, which is also very simple. For a relation pair (x_i, y_i), the model assumes that v(x_i) can be transformed into v(y_i) by a transformation matrix M and a bias vector b:
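In its single-model form this is the usual linear projection; the piecewise version below simply learns one such pair (M_k, b_k) per cluster:

```latex
\mathbf{v}(y_i) \approx M\,\mathbf{v}(x_i) + \mathbf{b}
```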

In addition, drawing on prior work, the authors found in experiments that a single projection model cannot learn the mapping of the whole space well. For example, on an open-domain dataset, the spatial representations of biological knowledge and of financial knowledge may differ too much to be covered by one model. What to do? Train several models separately. Rather than introducing prior knowledge for the split, the authors use K-means to find the clusters directly:


After clustering, animals and occupations, for example, fall into different clusters, and a projection model is then trained for each cluster.

The optimization objective is easy to understand: after the vector of entity x is transformed, it should be as close as possible to the vector of entity y. The objective function is as follows:
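Consistent with the description, the objective is a per-cluster least-squares loss of roughly the following form (the exact normalization in the paper may differ):

```latex
\min_{\{M_k,\,\mathbf{b}_k\}} \; \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{(x_i,\, y_i) \in C_k} \big\lVert M_k\,\mathbf{v}(x_i) + \mathbf{b}_k - \mathbf{v}(y_i) \big\rVert^2
```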

where k indexes the clusters, C_k is the set of relation pairs in the k-th cluster, and optimization uses stochastic gradient descent.


Training method

The system is trained iteratively. The core idea is to dynamically expand the training set R(t) (t = 1, 2, ..., T) through clustering and the projection model; as the model is repeatedly retrained, its generalization to the target data source gradually improves (see the sketch after the steps below).

First, some notation is fixed in the initialization step:

The loop then proceeds as follows:

Step 1.

Step 2.

After model prediction, template filtering is applied. For example, available Chinese templates include:

After filtering, a high-confidence relation set is finally obtained.

Note that F here is not just the templates. The three template families, "is-a", "such-as", and "co-hyponym", are analyzed separately:

Based on this analysis, an algorithm decides how to move relation data from U(t)- into U(t)+. For each candidate pair in U(t), two scores, positive and negative, are defined according to its credibility. The positive score is defined as:

where a, in the range (0, 1), is an adjustment coefficient and gamma is a smoothing coefficient; the paper uses the empirical values a = 0.5 and gamma = 1.

The negative score is defined as:

A high NS(t) score means x_i and y_i are more likely to be co-hyponyms and less likely to stand in an is-a relation.

The algorithm then has to maximize the positive scores while minimizing the negative ones; formally:

Here m is the size of U(t)+ and theta is a constraint threshold. This problem is a special case of the budgeted maximum coverage problem and is NP-hard, so a greedy algorithm is used to solve it.


Finally, besides producing U(t)+, the new set is merged with the original training data:

Step 3.

Step 4.
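Putting the loop together, here is a high-level sketch. The helper functions pattern_filter and predict_score are hypothetical placeholders standing in for the paper's template filtering and scoring steps, and least squares is used instead of SGD for brevity:

```python
# Sketch of the iterative training loop: cluster pairs, fit per-cluster
# projections, score and filter candidates, merge them into the training set.
import numpy as np
from sklearn.cluster import KMeans

def fit_projections(pairs, emb, n_clusters=2):
    """Cluster training pairs by their offset v(y) - v(x), then fit one linear
    projection (with a bias column) per cluster by least squares."""
    offsets = np.array([emb[y] - emb[x] for x, y in pairs])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(offsets)
    models = {}
    for k in set(labels):
        cluster = [p for p, lab in zip(pairs, labels) if lab == k]
        X = np.array([np.append(emb[x], 1.0) for x, _ in cluster])  # bias column
        Y = np.array([emb[y] for _, y in cluster])
        models[k], *_ = np.linalg.lstsq(X, Y, rcond=None)
    return models

def predict_score(models, x, y, emb):
    """Placeholder scorer: distance from the best projection of v(x) to v(y)."""
    vx, vy = np.append(emb[x], 1.0), emb[y]
    return min(np.linalg.norm(vx @ W - vy) for W in models.values())

def pattern_filter(pair):
    """Placeholder for the Chinese template filtering step (always passes here)."""
    return True

def train_loop(seed_pairs, unlabeled_pairs, emb, rounds=3, threshold=0.5):
    """Iteratively expand the training set R(t) with accepted candidates."""
    train = list(seed_pairs)
    for _ in range(rounds):
        models = fit_projections(train, emb)
        accepted = [p for p in unlabeled_pairs
                    if pattern_filter(p) and predict_score(models, *p, emb) < threshold]
        train += accepted
        unlabeled_pairs = [p for p in unlabeled_pairs if p not in accepted]
    return train
```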

Model prediction

Summary

There is little work on knowledge graph construction for Chinese, and compared with English the pattern design and data sources differ greatly. The second half of the paper also discusses problems found along the way. First, the system uses clustering; during parameter tuning the result is not very sensitive to the number of cluster centers K: with small K the results are similar, and K = 10 is roughly the best, but if K is set too large the final result degrades badly. The paper builds an open-domain knowledge graph, while we build a vertical-domain (medical) one; in practice we try to split the sub-datasets according to prior knowledge and train the projection models separately.

Second, in specific cases we found hypernym-hyponym errors, such as "herbal medicine" being recognized as the parent of "traditional Chinese medicine". Although most traditional Chinese medicines are indeed made of herbs, this is unreasonable from a classification point of view. Such cases may stem from how the data source expresses things in Chinese and are hard to handle without external knowledge. In our own practice, we design targeted patterns for each specific data source.

Manual inspection of the data beforehand helps improve both the recall and the precision of information extraction.

In addition, some word vectors turn out to be poorly trained. For example, in the paper's experiments the vector representations of "plant" and "monocotyledon" are very similar, possibly because some words are too low-frequency in the corpus. This is a real handicap, and we need to find ways to improve pre-training for Chinese.

References

1.《Web Scale Taxonomy Cleansing》

2.《Probase- a probabilistic taxonomy for text understanding》

3.《An Inference Approach to Basic Level of Categorization》

4.《Improving Hypernymy Detection with an Integrated Path-based and Distributional Method》

5.《A Short Survey on Taxonomy Learning from Text Corpora- Issues, Resources and Recent Advances》

6.《Learning Semantic Hierarchies via Word Embeddings》

7.《Chinese Hypernym-Hyponym Extraction from User Generated Categories》

8.《Learning Fine-grained Relations from Chinese User Generated Categories》

9.《Predicting hypernym–hyponym relations for Chinese taxonomy learning》

10.《Unsupervised learning of an IS-A taxonomy from a limited domain-specific corpus》

11.《Supervised distributional hypernym discovery via domain adaptation》

12.《Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs》

13.《What Is This, Anyway-Automatic Hypernym Discovery》

14.《Learning Word-Class Lattices for Definition and Hypernym Extraction》

15.《Taxonomy Construction Using Syntactic Contextual Evidence》

16.《Learning syntactic patterns for automatic hypernym discovery》

17.《A Semi-Supervised Method to Learn and Construct Taxonomies using the Web》

18.《Entity linking with a knowledge base: Issues, techniques, and solutions》

19.《Incorporating trustiness and collective synonym/contrastive evidence into taxonomy construction》

20.《Semantic class learning from the web with hyponym pattern linkage graphs》

21.https://www.shangyexinzhi.com/Article/details/id-23935/

22.https://zhuanlan.zhihu.com/p/30871301

23.https://xiaotiandi.github.io/publicBlog/2018-10-09-436b4d47.html