An incredible quantity of archaeological reports is stored in digital archives. Searching them for information typically has to be done manually, which is a real chore. Now an archaeology search engine can help.
Archaeologist Alex Brandsen, a postdoctoral researcher in digital archaeology at Leiden University, has used deep learning, a form of artificial intelligence, to develop a search engine named AGNES that can search very precisely through data. His Ph.D. defense is on February 15.
Archaeologists have been producing an enormous number of reports since the Treaty of Valletta (1992). This treaty regulates how European archaeological heritage is handled. It means that before starting construction work anywhere, you first need to check whether there is archaeological heritage in the ground. As a result, in the Netherlands alone, thousands of reports are written per year.
Deep Learning in Archaeology Search Engine
Brandsen used deep learning to develop a smart search engine, a kind of Google for archaeologists. He trained a language model to recognize words in archaeological reports. It was important that the model could also recognize synonyms and distinguish between different meanings of a word.
“The word bijl [Dutch for ax] can refer to an artifact that you can chop things with, but it can also be a surname. If you are now looking for the artifact bijl, that is all you will find and no Mr. Bijls anymore,” said Brandsen.
It is also possible to search geographically, which retrieves information about an area specified by the user.
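To make the idea of an area-based search concrete, here is a minimal sketch in Python. It assumes each report carries a latitude and longitude and that the user specifies a rectangular bounding box. The `Report` class, the `search_in_area` function and the report titles are hypothetical and purely illustrative; the article does not describe how AGNES implements geographic search.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical record: a title plus one coordinate pair per report.
    title: str
    lat: float
    lon: float


def search_in_area(reports, south, west, north, east):
    """Return the reports whose coordinates fall inside the bounding box."""
    return [
        r for r in reports
        if south <= r.lat <= north and west <= r.lon <= east
    ]


# Made-up example data, not taken from any real archive.
reports = [
    Report("Excavation report A", 52.16, 4.49),  # roughly Leiden
    Report("Excavation report B", 50.85, 5.69),  # roughly Maastricht
]

# Bounding box drawn around the Leiden area.
hits = search_in_area(reports, south=52.0, west=4.3, north=52.3, east=4.7)
print([r.title for r in hits])
```

A real system would more likely rely on a spatial index or on the geographic queries of a search backend, but the filtering principle is the same.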
Brandsen and a colleague tested the search engine, AGNES.
“My colleague had been given a database of cremations in the early Middle Ages in the Netherlands by the expert on the period. This professor has spent his whole life collecting this data. But with the search engine, we found 30 percent more cremations from the early Middle Ages. So you see that even an expert doesn’t know everything because there is so much data,” said Brandsen.
A rough version of AGNES can now carry out searches with an accuracy of about 80 percent. As a postdoc, Brandsen is going to make the search engine more accurate and expand it to support searches in other languages.
AGNES currently only indexes documents with manually assigned metadata, but in the near future, documents without metadata will be added. To allow for faceted search on these documents as well, the researchers propose to assign metadata automatically. Manual labeling is infeasible because of the number of texts: there are currently an estimated 70,000 documents, and 4,000 to 5,000 are added each year. Because of this volume, using text mining and machine learning techniques becomes a necessity.
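As an illustration of what automatic metadata assignment can look like, the sketch below trains a very small naive Bayes text classifier on labeled example texts and uses it to predict a period label for an unlabeled text. Naive Bayes is swapped in here only because it fits in a few lines; the article does not say which technique the researchers use for this step. The training sentences, the labels and the function names are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data: (report text, period label).
training = [
    ("roman pottery and coins found in settlement", "roman"),
    ("roman villa with pottery fragments", "roman"),
    ("medieval cremation graves and urns", "medieval"),
    ("medieval church foundations and graves", "medieval"),
]
model = train(training)

print(predict(model, "pottery fragments near a villa"))
print(predict(model, "cremation urns"))
```

With tens of thousands of documents, the predicted labels could fill the same facets as the manually assigned metadata, although a production system would need far more training data and a proper evaluation.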