Aplicaciones de técnicas de extracción de información a bibliotecas digitales = Information Extraction to feed Digital Library
- 10 p.
All rights belong to the Biblioteca Digital Miguel de Cervantes
Digital libraries often need to extract information from poorly marked-up documents, either to populate databases or to create new hypertext documents with highly structured markup. In this work, we address the problem of extracting bibliographic information from literary reports in HTML format in order to populate a Digital Library database of Galician publications that is used for Internet searches. An information extraction approach that takes advantage of both the HTML markup and Natural Language Processing (NLP) techniques was used successfully for this purpose.