Automation of data preparation for mapping using natural language processing systems

DOI: 10.35595/2414-9179-2022-1-28-659-669

About the Authors

Alexey A. Kolesnikov

Siberian State University of Geosystems and Technologies,
Plakhotnogo str., 10, 630108, Novosibirsk, Russia;
E-mail: alexeykw@mail.ru

Egor A. Plitchenko

Foundation for Support of Literary Creativity “Siberian Writer”,
Griboyedova str., 2-11, 630083, Novosibirsk, Russia;
E-mail: str2007@list.ru

Maria K. Kropacheva

Siberian State University of Geosystems and Technologies,
Plakhotnogo str., 10, 630108, Novosibirsk, Russia;
E-mail: kropacheva.m.k@gmail.com

Abstract

Current information technology makes it possible to automate the processing of data types that previously only a specialist could work with. One example is natural language processing, which implements sentiment analysis, machine translation, and question-answering systems. For creating cartographic and geoinformation products, two groups of methods are of greatest interest: named entity extraction, which extracts geographical names from unstructured text, and named entity linking, which creates logical links between the extracted names of spatial objects. Processing these names through a local or network geocoding service database makes it possible to automate the creation of map layers in a geographic information system from text messages. The article describes the most popular approaches to named entity extraction and their software implementations, using the texts of biographies and works of Siberian writers as an example. Rule-based methods, maximum entropy models, and convolutional neural networks are analyzed. To assess the quality of extracting geographical names and objects from text, the authors propose, in addition to the standard F1-score, an evaluation method that takes a larger number of criteria into account and is likewise based on an error matrix. Text block markup formats are also described; they are used for additional training of the neural network model in order to improve recognition quality and expand the range of geographical names recognized as named entities.
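
As an illustration of the standard F1-score mentioned above, the following minimal sketch computes precision, recall, and F1 from error-matrix counts for a set of extracted geographical names. The function name, the set-based matching of names, and the sample place names are assumptions made for illustration only; they are not the authors' evaluation procedure, and the extended multi-criteria method proposed in the article is not reproduced here.

```python
from typing import Dict, Iterable


def f1_for_place_names(predicted: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Compute precision, recall and F1 for extracted place names.

    Hypothetical helper: names are compared as exact strings, which is a
    simplification (no lemmatization or span matching).
    """
    pred_set = set(predicted)
    gold_set = set(gold)

    # Error-matrix counts for the "geographical name" class.
    tp = len(pred_set & gold_set)   # correctly extracted names
    fp = len(pred_set - gold_set)   # extracted, but not in the reference
    fn = len(gold_set - pred_set)   # in the reference, but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data only, not taken from the article's corpus.
scores = f1_for_place_names(
    predicted=["Novosibirsk", "Tomsk", "Siberia"],
    gold=["Novosibirsk", "Tomsk", "Ob"],
)
print(scores)
```

Exact string matching is the simplest possible choice; for inflected Russian text, a comparison on normalized (lemmatized) forms would likely be needed before counting matches.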

Keywords

geographical name, automation, named entity extraction, natural language processing, neural networks, Siberian writers

For citation: Kolesnikov A.A., Plitchenko E.A., Kropacheva M.K. Automation of data preparation for mapping using natural language processing systems. InterCarto. InterGIS. GI support of sustainable development of territories: Proceedings of the International conference. Moscow: MSU, Faculty of Geography, 2022. V. 28. Part 1. P. 659–669. DOI: 10.35595/2414-9179-2022-1-28-659-669 (in Russian)