The concept of pretrained language models in the context of knowledge engineering

Dmitry Ponkin

Abstract


The article examines the concept and technologies of pre-trained language models in the context of knowledge engineering. The author substantiates the relevance of the question of internalized, implicit knowledge extracted from the text corpora used for pre-training or transfer learning and retained within pre-trained language models, and gives a detailed overview of existing approaches to interpreting this concept. The content of the concept of "pre-trained language models" is explained, and examples of their practical implementation are provided, including a discussion of the use of language models as knowledge bases. The essence of unsupervised pre-training of language models on large, unstructured text corpora before further training for a specific task (fine-tuning), known as transfer learning, is also covered. The article reviews recent research on pre-training and transfer learning methods for language models, on augmenting language models with knowledge, and on the use of pre-trained language models to search for and retrieve knowledge, to aid in building knowledge bases, and to serve as independent knowledge bases in their own right. The author also examines the concept of the "knowledge graph", which is now in wide use, both in general and in the context relevant to this article.
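To illustrate the "language models as knowledge bases" idea discussed in the article (cf. Petroni et al., "Language Models as Knowledge Bases?"), the following minimal sketch probes a masked language model with cloze-style queries. It assumes the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; the prompts and the model choice are illustrative, not a reproduction of the experiments surveyed here.

# Minimal sketch of cloze-style knowledge probing, in the spirit of
# "Language Models as Knowledge Bases?" (Petroni et al., 2019).
# Assumes the Hugging Face `transformers` library and the public
# bert-base-uncased checkpoint; prompts and model choice are illustrative.
from transformers import pipeline

# The fill-mask pipeline returns the model's top candidates for the [MASK] slot.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

queries = [
    "The capital of France is [MASK].",
    "Dante was born in [MASK].",
]

for query in queries:
    print(query)
    for prediction in fill_mask(query, top_k=3):
        # token_str is the predicted word, score is its probability.
        print(f"  {prediction['token_str']:>12}  p={prediction['score']:.3f}")

Treating the top-ranked completions as answers to relational queries of this kind is what allows a pre-trained model to be evaluated as an (imperfect) knowledge base without any structured supervision.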


Full Text:

PDF (Russian)

References


Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // arXiv preprint arXiv:1810.04805.

Ruder S., Peters M.E., Swayamdipta S., Wolf T. Transfer learning in natural language processing // The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts. – Minneapolis (Minnesota, USA): Association for Computational Linguistics (ACL), 2019. – P. 15–18.

Raffel C., Shazeer N., Roberts A. et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer // arXiv preprint arXiv:1910.10683.

Kassner N., Schütze H. Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly // arXiv preprint arXiv:1911.03343.

Roberts A., Raffel C., Shazeer N. How Much Knowledge Can You Pack Into the Parameters of a Language Model? // arXiv preprint arXiv:2002.08910.

Guu K., Lee K., Tung Z. et al. REALM: Retrieval-Augmented Language Model Pre-Training // arXiv preprint arXiv:2002.08909.

Peters M.E., Neumann M., Iyyer M. et al. Deep contextualized word representations // arXiv preprint arXiv:1802.05365.

Bouraoui Z., Camacho-Collados J., Schockaert S. Inducing Relational Knowledge from BERT // arXiv preprint arXiv:1911.12753.

Goldberg Y. Assessing BERT’s Syntactic Abilities // arXiv preprint arXiv:1901.05287.

Liu Y., Ott M., Goyal N. et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach // arXiv preprint arXiv:1907.11692.

Radford A., Narasimhan K., Salimans T. et al. Improving Language Understanding by Generative Pre-Training // OpenAI, 2018.

Radford A., Wu J., Child R. et al. Language models are unsupervised multitask learners // OpenAI, 2019.

Yang Z., Dai Z., Yang Y. et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding // arXiv preprint arXiv:1906.08237.

Richardson K., Sabharwal A. What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge // arXiv preprint arXiv:1912.13337.

Ehrlinger L., Wöß W. Towards a Definition of Knowledge Graphs // Joint Proceedings of the Posters and Demos Track of 12th International Conference on Semantic Systems – SEMANTiCS2016 and 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS16). – Leipzig (Germany), 2016. – Vol. 1695.

Yoo S.-Y., Jeong O.-K. Automating the expansion of a knowledge graph // Expert Systems with Applications. – 2020, March. – Vol. 141.

Zhang N., Deng S., Sun Z. et al. Relation Adversarial Network for Low Resource Knowledge Graph Completion // arXiv preprint arXiv:1911.03091.

Weng J., Gao Y., Qiu J. et al. Construction and Application of Teaching System Based on Crowdsourcing Knowledge Graph // Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding: 4th China Conference, CCKS 2019, Hangzhou, China, August 24–27, 2019: Revised Selected Papers. – Singapore: Springer, 2019. – P. 25–37.

Yao L., Mao C., Luo Y. KG-BERT: BERT for Knowledge Graph Completion // arXiv preprint arXiv:1909.03193.

Liu S., d’Aquin M., Motta E. Measuring Accuracy of Triples in Knowledge Graphs // Language, Data, and Knowledge: First International Conference, LDK 2017, Galway, Ireland, June 19–20, 2017: Proceedings. – Cham (Switzerland): Springer, 2017. – P. 343–357.

Ji S., Pan S., Cambria E. et al. A Survey on Knowledge Graphs: Representation, Acquisition and Applications // arXiv preprint arXiv:2002.00388.

Logan R., Liu N.F., Peters M.E. et al. Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. – Florence (Italy): Association for Computational Linguistics, 2019. – P. 5962–5971.

Wang R., Tang D., Duan N. et al. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters // arXiv preprint arXiv:2002.01808.

Yang B., Mitchell T. Leveraging Knowledge Bases in LSTMs for Improving Machine Reading // arXiv preprint arXiv:1902.09091.

Ostendorff M., Bourgonje P., Berger M. et al. Enriching BERT with Knowledge Graph Embeddings for Document Classification // arXiv preprint arXiv:1909.08402.

Hu Y., Lin G., Miao Y. et al. Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test // arXiv preprint arXiv:1909.03415.

He B., Zhou D., Xiao J. et al. Integrating Graph Contextualized Knowledge into Pre-trained Language Models // arXiv preprint arXiv:1912.00147.

Xiong W., Du J., Wang W.Y. et al. Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model // arXiv preprint arXiv:1912.09637.

Wang X., Gao T., Zhu Z. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation // arXiv preprint arXiv:1911.06136.

Bosselut A., Rashkin H., Sap M. et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction // arXiv preprint arXiv:1906.05317.

Davison J., Feldman J., Rush A.M. Commonsense knowledge mining from pretrained models // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). – Hong Kong (China): Association for Computational Linguistics, 2019. – P. 1173–1178.

Wang C., Qiu M., Huang J. et al. KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification // arXiv preprint arXiv:2002.10903.

Zhang J., Zhang Z., Zhang H. et al. Enriching Medical Terminology Knowledge Bases via Pre-trained Language Model and Graph Convolutional Network // arXiv preprint arXiv:1909.00615.

Trisedya B.D., Weikum G., Qi J. Neural Relation Extraction for Knowledge Base Enrichment // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. – Florence (Italy): Association for Computational Linguistics, 2019. – P. 229–240.

Petroni F., Rocktäschel T., Lewis P. Language Models as Knowledge Bases? // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). – Hong Kong (China): Association for Computational Linguistics, 2019. – P. 2463–2473.

Chen D., Fisch A., Weston J. et al. Reading Wikipedia to Answer Open-Domain Questions // arXiv preprint arXiv:1704.00051.

Liu A., Huang Z., Lu H. BB-KBQA: BERT-Based Knowledge Base Question Answering // Chinese Computational Linguistics: 18th China National Conference, CCL 2019, Kunming, China, October 18–20, 2019: Proceedings. – Cham (Switzerland): Springer, 2019. – P. 81–92.

Poerner N., Waltinger U., Schütze H. E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT // arXiv preprint arXiv:1911.03681.

Talmor A., Elazar Y., Goldberg Y. et al. oLMpics – On what Language Model Pre-training Captures // arXiv preprint arXiv:1912.13283.

Sun C., Qiu X., Xu Y. et al. How to Fine-Tune BERT for Text Classification? // Chinese Computational Linguistics: 18th China National Conference, CCL 2019, Kunming, China, October 18–20, 2019: Proceedings. – Cham (Switzerland): Springer, 2019. – P. 194–206.

Lu W., Jiao J., Zhang R. TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval // arXiv preprint arXiv:2002.06275.





ISSN: 2307-8162