Semantic Analysis

Semantic Analysis is a branch of Natural Language Processing (NLP) dealing with machine reading comprehension. It has been applied to different NLP tasks, such as news gathering, text categorization, voice activation, content analysis, and data characterization. Semantic analysis requires determining the appropriate syntactic and semantic schemes for the target human language, relating syntactic structures at the level of sentences, paragraphs or simple pieces of text to their language-independent meanings. To this end, Semantic Analysis often takes into account idiomatic expressions, figurative speech and cultural references.


In order to create effective NLP systems it is necessary to lift the information extracted from the words in plain text to a conceptual level by detecting meaningful word senses. One of the principal and most frequently studied problems in NLP is that of measuring semantic similarity and relatedness. Here, the problem is to estimate how similar or related two words are in order to establish different semantic relations. Two main approaches are used to tackle this problem: knowledge-based approaches[1][2] and corpus-based approaches[3]. The former require lexical resources such as WordNet and Roget's Thesaurus to obtain semantic similarities, while the latter use co-occurrences to measure the similarity between words.
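
The corpus-based idea mentioned above can be illustrated with a minimal sketch. It assumes a toy three-sentence corpus, a fixed context window and cosine similarity over raw co-occurrence counts; none of these choices come from SAM, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(sim_cat_dog, sim_cat_car)
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "car", so the first similarity is the higher one. A knowledge-based approach would instead derive the score from the structure of a resource such as WordNet.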

Measuring semantic similarities is related to the problem of Word Sense Disambiguation (WSD), which is a pervasive issue in NLP tasks. In WSD the goal is to determine the senses of the words in a text, where a word may have different senses depending on the context in which it appears. For example, let us consider the word running in different contexts:

  • The software is running
  • The athlete is running
  • The coach put great emphasis on running

We can see that running has different meanings according to the context in which it appears[4]. In the first example, the word running means being in operation, while in the last two examples the meaning is related to the same general concept (Sport).

Relevance to SAM

Novel semantic analysis techniques will be applied to deal with the vast amount of information available in the SAM Platform. These techniques will be applied to all the textual Content, providing meaning and making it possible to interpret user feedback and preferences.

Different tasks in SAM are related to Semantic Analysis. One example is T4.3 Data Characterisation Services, where tools and Interfaces to access the ontologies used to annotate and classify Assets are defined. In this task Semantic Analysis will be used to provide acceptable characterisation of data and Asset suggestions. The results of T4.3 will benefit tasks such as T5.2 Content Gateways and T5.3 Assets Aggregation and Composition, by using data characterisation and semantic exploration functionalities with the goal of aggregating and composing Assets according to their Content.

Different tasks in WP6 Context Analysis & Dynamic Creation of Social Communities will also benefit from Semantic Analysis in order to provide a context-centric approach. With the combination of Semantic Analysis, Sentiment Analysis and Social Network Analysis, the SAM Platform will implement Context analysis mechanisms for the detection of Context changes and the creation of Dynamic Communities. In T6.4 Business Intelligence and Social Mining, Semantic Analysis will provide meaning to the Context in order to discover tendencies, User Preferences, etc.

State of the Art Analysis

The most representative task of this area is WSD. The development of this kind of system involves the following elements:

  • Semantic analysis process to select the correct word senses or meaning of the words.
  • Resources to be taken into account:
      • Context of the word
      • External sources (e.g., lexical resources and ontologies)


There are different semantic systems according to the level of supervision applied: Supervised, Unsupervised, and Semi-supervised or Hybrid.

On the other hand, these systems can be classified as Knowledge-Based, Corpus-Based, and Hybrid (a combination of the previous approaches).

Statistical systems are an important type of WSD system; they consider the frequency of semantic elements (e.g., Most Frequent Sense - MFS) to determine the correct meaning of the target word. Recent graph-based approaches to WSD achieve results quite similar to those of the best supervised approaches. The results of the best systems in each of the aforementioned categories can be found on the SemEval website.
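
As an illustration of the Most Frequent Sense idea, the following minimal sketch counts sense annotations in a tiny sense-tagged sample and always predicts the most frequent one. The training pairs and sense labels are invented for the example and do not come from any SAM resource.

```python
from collections import Counter, defaultdict

def train_mfs(annotated):
    """Build a word -> most frequent sense table from (word, sense) pairs."""
    counts = defaultdict(Counter)
    for word, sense in annotated:
        counts[word][sense] += 1
    return {word: senses.most_common(1)[0][0] for word, senses in counts.items()}

def disambiguate(word, mfs_table, default=None):
    """Return the most frequent sense of `word`, ignoring its context."""
    return mfs_table.get(word, default)

# Hypothetical sense-tagged sample (labels are illustrative only).
training = [
    ("running", "sport"), ("running", "sport"), ("running", "operating"),
    ("bank", "finance"),
]
mfs = train_mfs(training)
print(disambiguate("running", mfs))
```

Because the context is ignored, this baseline would label "The software is running" with the sport sense, which is why context-aware methods are needed on top of it.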

One of the most representative articles for studying Semantic Analysis is "Word sense disambiguation: A survey" by Roberto Navigli. It contains an extensive collection of approaches that make up the State of the Art and are suitable for application in SAM.


Like other NLP tasks, WSD systems use different kinds of resources, corpora and techniques to obtain the correct senses of words, which makes it difficult to evaluate which system is better since they do not use the same corpora or repositories. The need for evaluating different tasks in NLP led to the creation of the Senseval competition. The main goal of this competition was initially to measure the strengths and weaknesses of WSD systems with regard to different words, different aspects of language, and different languages. The first Senseval competition was held in 1998 at Herstmonceux Castle, Sussex (England), and new competitions have taken place every three years since then.

Each NLP task was therefore defined in order to evaluate systems using the same repositories and corpora. At Senseval-1 (in 1998), three languages were the focus of the competition: English, Italian, and French. Senseval-2 (in 2001) included different tasks for English, Italian, Spanish, Swedish, Chinese, Basque, Dutch, Czech, Danish, Estonian, Japanese and Korean. Senseval-3 (in 2004) included 14 different tasks for WSD, as well as the identification of semantic roles, logic forms, subcategorisation acquisition and multilingual annotations. SemEval-2007 included 18 different tasks focusing on the evaluation of systems dealing with the semantic analysis of text. SemEval-2010 included 18 different tasks targeting the evaluation of semantic analysis systems. SemEval-2012 included 8 different tasks addressing the evaluation of computational semantic systems. Interestingly, no WSD tasks were involved in SemEval-2012, which only held tasks dealing with lexical-semantic measures; all WSD-related tasks were moved to SemEval-2013. More recently, SemEval-2014 took place, including 10 tasks in different languages. These tasks evaluate systems that involve semantic features; however, traditional WSD was not included.


WSD has proven to be necessary to improve the results of other tasks such as Machine Translation, Information Extraction, Question Answering, Information Retrieval, Text Classification and Text Summarisation. WSD is therefore considered to be an essential task for all these applications[5], and thus many research groups in the field of Computational Linguistics are working in this area using a wide range of approaches.

Related Projects

There have been a number of projects dealing with the specific problem of Semantic Analysis. In the following links there is a comprehensive description of some of them:

Tools, Frameworks and Services

In order to obtain additional information to solve different NLP problems, a variety of semantic resources have been used. However, one of the main problems of using semantic resources is their decentralization. Despite the fact that WordNet serves as the kernel for developing different resources and applications, there are few tools that integrate them. Some works have focused on the idea of building semantic networks with the same interface. For further reference: Natural Language Processing/Tools, Frameworks and Services.

One of the most interesting resources for dealing with Semantic Analysis is DBpedia. It is a knowledge base that allows sophisticated queries to be run against Wikipedia, and different data sets on the Web to be linked to Wikipedia data. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself. Based on this resource, the Semantic Services component will be able to explore semantic knowledge for addressing Data Characterisation tasks.

SAM Approach

Dealing with natural language in SAM involves the inclusion of linguistic and semantic analysis in the architecture. The Semantic Services component is being developed as a central component to provide Semantic and Natural Language Processing (NLP) facilities to different components of the SAM Platform. The next section describes the different subcomponents of the Semantic Services, the logical connections established between them and the relationships with other components and actors in SAM.

Architecture and Dependencies

This is the component in charge of providing natural language processing functionalities to the platform, offering other components in SAM technologies to perform Data Characterisation and ontology exploitation, along with Sentiment Analysis and Text Summarisation. The Semantic Services component includes five accessible subcomponents and one user interface, and the logical connections established between them are represented in the figure beneath. More specifically:

  • The Semantic Services component will directly interact with different components of the SAM Platform providing semantic functionalities based on linguistic and semantic knowledge.
  • These functionalities consist of semantic data analysis, semantic exploration and discovery, Sentiment Analysis and Text Summarisation.
  • As part of this component the Asset Profiler editor will provide a user interface to allow SAM Content Providers to supervise and edit Assets imported into the SAM Platform, as well as to create new ones.


Implementation and Technologies

After extended analysis and comparison, the most appropriate technologies for the frontend and the backend have been selected.

Frontend Technologies (User Interface)

The Asset Profiler user interface has been implemented using AngularJS. Because this interface is embedded inside the marketplace, it is reasonable to choose this technology to provide a common look and feel[6].

Backend Technologies (Web Services)

The Semantic Services prototype uses JAX-RS technologies, and more specifically the Jersey framework[7], to implement the RESTful Web Services for this component. In addition, the Semantic Services API uses the Swagger framework[8] to provide interactive documentation. This should considerably ease the implementation, deployment and testing of the Semantic Services backend environment.


In the first prototype of this component the tasks addressed are described in the table below. The tasks labelled as [mock-up] identify those functionalities that have not been developed in this prototype, although mock-up interfaces have been created and documented, including the definition of input and output data schemas for interoperability.

Subcomponent | Task
Data Characterisation | The following functionalities have been implemented:
  • When external assets are imported into the SAM platform by the Content Gateways component (see D5.2.1 for more information), the Data Characterisation subcomponent performs a mapping between the structure of the imported data and the Asset structure (ontology) defined in SAM.
  • Given an input text (usually obtained from the Asset’s content), this subcomponent identifies mentions of Wikipedia entities (a process known as entity linking).
  • This subcomponent also identifies mentions of existing Assets in SAM, both in the Asset’s content and in user generated comments [mock-up]
Asset Discovery | The following functionalities have been implemented, supported by the methods developed by internal Semantic Services subcomponents:
  • Receives a query (in the form of a set of keywords) and provides a list of Assets related to that query [mock-up]
  • Asset CRUD (create, read, update and delete) operations
Sentiment Analysis | Performs sentiment analysis on User Generated Content (UGC)
Text Summarisation | Summarises large amounts of UGC, keeping only the most representative comments provided by the users regarding specific Assets [mock-up]

User Interface | Task
Asset Profiler | User interface that provides functionalities to edit, create and delete Assets [mock-up]

Note that the Semantic Resources Explorer is a subcomponent that supports internal functionalities of the Semantic Services regarding access to the semantic data of SAM, i.e. Assets stored in the Cloud Storage and external data extracted from DBpedia. Its progress is therefore reported under the subcomponents that use its functionalities. The Semantic Resources Explorer involves the following functionalities (the last one being a mock-up):

  • Asset CRUD operations: it allows storing and querying Assets useful to other Semantic Services subcomponents
  • Exploration of DBpedia entities: it allows querying DBpedia
  • Exploration of Sentiment Analysis resources: it allows querying Sentiment Analysis lexicons for sentiment evidence and lexical units [mock-up]

Functionality and UI Elements

This section introduces the Asset Profiler interface and presents Asset Discovery (which includes Semantic Resources Explorer) together with Data Characterisation subcomponents.

Asset Profiler

The Semantic Services component includes a web user interface, the Asset Profiler, to facilitate Asset creation and editing. The final prototype will include three tabs for profiling Assets:

  • main tab: It allows setting main characteristics of the Assets, such as title, owner and language


  • semantic tab: It allows automatic characterisation of Assets in order to connect them with other Assets and external sources, such as Wikipedia entries


In this first prototype the main and semantic tabs have been developed as mock-ups. As part of this subcomponent, the Asset Profiler backend is being developed in parallel to provide data views to the user interface. As a result of the editing process, an Asset can be created, updated, or deleted from the SAM platform.

Asset Discovery

This section describes the two main operations of this subcomponent. The first one is to provide Asset suggestions based on the functionalities provided by internal subcomponents. Although this functionality is not available in this first prototype, a mock-up documentation demo page has been developed to provide input and output information of the service.

AssetDiscoverAPI.png

This interface (see figure above) requires as input a JSON object containing a set of keywords and context information related to the Asset requested, together with information on the user profile to customise the search. As a result, a list of Assets encoded as JSON-LD will be retrieved.

The second functionality provided by this subcomponent is Asset exploration and management based on the CRUD operations exposed by internal Semantic Services subcomponents (i.e. the Semantic Resources Explorer). The version of this subcomponent available in the first prototype already includes all these CRUD operations. Each operation involves a RESTful interface and a documentation demo page, where information on how to proceed with the CRUD and Asset attribute exploration functions is given. In the Parameters section of each operation, the body textbox can be filled with a JSON example, and the result checked using the Try it out! button. The Response Messages section provides information on the possible responses of the service when the execution is carried out. The RESTful CRUD interfaces provide the following functionalities: load, delete, create and update:

  • The load method selects Asset instances by using filters and indicating which Asset attributes have to be included in the returned Assets (see figure below).


This interface provides an example of the JSON input and output formats. The result of this method is a collection of Assets restricted to the attributes listed in the outputAttributes field; attributes not specified in the input are disregarded.

    "name": "Casino Royale",
    "URL": ""
  • The operation delete removes an Asset instance by indicating its unique identifier (see figure beneath).


  • The operation create stores a specific Asset in the Cloud Storage (see figure below). This interface provides an example of the JSON input (an Asset in JSON-LD format) and output (a unique string identifier assigned to the newly stored Asset).


  • Finally, the operation update changes the content of an existing Asset by submitting a modified Asset instance (see figure beneath).
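
The filtering and attribute selection performed by the load operation described above can be sketched in memory as follows. This is only an illustration under stated assumptions: the equality-based filter semantics, the function name load_assets and the sample Assets are hypothetical, and the actual service works through the RESTful interface with JSON-LD Assets.

```python
def load_assets(assets, filters, output_attributes):
    """Select assets matching every filter and keep only the requested attributes."""
    selected = []
    for asset in assets:
        # Assumption: a filter matches when the attribute equals the given value.
        if all(asset.get(key) == value for key, value in filters.items()):
            selected.append({k: asset[k] for k in output_attributes if k in asset})
    return selected

# Hypothetical in-memory store standing in for the Cloud Storage.
store = [
    {"id": "a1", "name": "Casino Royale", "type": "Movie", "language": "en"},
    {"id": "a2", "name": "Casino Royale", "type": "Book", "language": "en"},
    {"id": "a3", "name": "Skyfall", "type": "Movie", "language": "en"},
]
result = load_assets(store, {"type": "Movie"}, ["id", "name"])
print(result)
```

Attributes that are not requested, such as language in this example, are left out of the returned Assets.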


Data Characterisation

The Data Characterisation subcomponent provides ontology mapping and semantic characterisation of content. Two interfaces are described for these functionalities. The first one provides a mapping procedure between incoming data structures and the SAM ontology, whereas the second one is dedicated to characterising textual content by identifying Wikipedia entities in text.

Ontology Mapping

This first prototype provides a RESTful interface for suggesting alignments between the SAM ontology and the structure of the contents that will be imported into the platform by the Content Gateways. The documentation demo page for this service is shown in the figure below. In the Parameters section of this demo page, the body textbox can be filled with the names of the labels that the user wants to map to the SAM ontology (such as price or title in the example given in the figure). The input format of the information to be included in the body textbox has been simplified for the sake of demonstration. The Try it out! button runs the service. The Response Messages section provides information on the possible responses of the mapping service when the execution is performed. The response of the service is a JSON object providing a list of alignment suggestions between the label introduced and the concepts of the SAM ontology.

OntologyMappingAPI.png

The mapping of incoming structures (represented by the data inserted into the body textbox) to the SAM ontology is based on semantic structural algorithms [9]. These algorithms use the Levenshtein distance[10] as a measure for finding the most similar terms to align between two ontologies, and then search for the most similar structures by exploring adjacent semantic relations. In this way, the best-matching structures are calculated. As a result, a confidence score for the most similar matching terms of both ontologies is obtained.
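
The lexical part of this mapping can be sketched as follows. The sketch covers only the Levenshtein-based term similarity, not the exploration of adjacent semantic relations; the normalisation into a 0 to 1 score, the function names and the concept list are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    """Assumed normalisation: 1 minus distance divided by the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def suggest_alignments(label, concepts, top=3):
    """Rank ontology concepts by lexical similarity to an incoming label."""
    scored = [(concept, similarity(label, concept)) for concept in concepts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Hypothetical concept names; the real SAM ontology is not reproduced here.
ontology_concepts = ["title", "price", "description", "author"]
suggestions = suggest_alignments("titel", ontology_concepts)
print(suggestions[0])
```

For the misspelled label "titel" the closest concept is "title", at an edit distance of 2 over 5 characters.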

Content Characterisation

In the first prototype, the main contribution to this task is the development of an entity linking method that identifies mentions of Wikipedia entities in an input text.
The figure above shows a screenshot of the web interface. For an input text, the service retrieves a list of suggestions for every Wikipedia entity detected in its content. The list of suggestions is ordered by a confidence score between 0 and 1, indicating the level of trust in the prediction given. In the current approach, this confidence score is based on two key features. The first one is the number of incoming links to the Wikipedia article (more links imply more relevance); the second one is the similarity between the context of the entity found in the text (the list of words adjacent to the target entity) and the description of that entity in Wikipedia. For this purpose, the Lesk disambiguation algorithm has been employed [11].
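
A minimal sketch of how two such features could be combined into a score between 0 and 1 is given below. The source does not state how the features are weighted, so the log-scaled link count, the equal weighting, the simplified Lesk overlap and the candidate data are all assumptions made for illustration.

```python
from math import log

def context_overlap(context_words, description):
    """Simplified Lesk: share of context words that also occur in the description."""
    context = {word.lower() for word in context_words}
    gloss = set(description.lower().split())
    return len(context & gloss) / len(context) if context else 0.0

def confidence(candidate, context_words, max_links, weight=0.5):
    """Blend link popularity and context overlap (assumed equal weighting)."""
    if max_links:
        popularity = log(1 + candidate["incoming_links"]) / log(1 + max_links)
    else:
        popularity = 0.0
    overlap = context_overlap(context_words, candidate["description"])
    return weight * popularity + (1 - weight) * overlap

def rank(candidates, context_words):
    """Order candidates by decreasing confidence."""
    max_links = max(c["incoming_links"] for c in candidates)
    scored = [(c["title"], confidence(c, context_words, max_links)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented candidates and link counts, for illustration only.
candidates = [
    {"title": "Casino Royale (2006 film)", "incoming_links": 900,
     "description": "spy film starring daniel craig as james bond"},
    {"title": "Casino Royale (novel)", "incoming_links": 400,
     "description": "first novel by ian fleming featuring james bond"},
]
ranked = rank(candidates, ["novel", "fleming", "published"])
print(ranked[0][0])
```

In this toy case the film has more incoming links, but the context words overlap with the description of the novel, so the novel is ranked first.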

Another task to be carried out by the Content Characterisation subcomponent is the identification of Asset mentions in text (both the Asset’s content and UGC). In the current prototype, the only available version of these functionalities is a mock-up. The input for this service is a JSON object which includes an Asset object and a list of Asset attributes named path. This value indicates which attributes’ content should be analysed by the characterisation process. As a result of this operation, a list of Assets and Wikipedia entities will be obtained.
Similarly, there is a demo page for the identification of Asset mentions in UGC. In this case, the input for this service is a JSON object including both the ugc to analyse and the context in which the ugc has been generated (a list of keywords that identify the Assets being used in the same context). As a result of this operation, a list of Assets and Wikipedia entities will be obtained. As already mentioned, in the Parameters section the body textbox can be filled with a JSON example, and the result checked using the Try it out! button. The Response Messages section provides information on the possible responses when the process is carried out.

Latest Developments

During the last period of the SAM life cycle some new developments have been carried out. They have mostly been directed toward improving the existing RESTful interfaces based on an enriched SAM Asset structure and evaluating the research technologies. This progress also affected the User Interface (Asset Profiler), which deals with the editing and semantic enrichment of the SAM Assets. The section below describes the new appearance of the Asset Profiler interface.

Asset Profiler V3

As previously mentioned, the Semantic Services component includes a web user interface, the Asset Profiler, to facilitate Asset creation and editing. The intermediate prototype proposed three tabs for profiling Assets: main tab, semantic tab and social tab (see Asset_Profiler). However, this User Interface has been reevaluated based on the great potential offered by the Asset_Description, resulting in the development of an enriched and extended User Interface. The latest version of the Asset Profiler provides five fully operative tabs (three of them were already introduced: main tab, semantic tab and social tab):

  • social tab: In this final prototype the social tab provides a fully developed interface, as can be seen in the following screenshot.

AP SocialTab.png

  • product tab: includes specific information about the type of Asset (i.e. Movie, Game, Music Album, Music Recording, Book).

  • generic tab: allows the inclusion of the Asset fields imported from the original source

AP Generic.png

Research progress

To test the research technologies provided in SAM, different evaluations of the technologies involved have been carried out.

Data characterisation evaluation

Data characterisation evaluations were split into two parts: Entity Linking based on Wikipedia and Entity Linking based on SAM Assets. The goal of the Data Characterisation system in the proposed experiments is to identify mentions of Person, Fictional Character, Written work or Book, Videogame or Software, Organization, Album, Single or Musical Work in plain text (i.e. removing any link or clue about entity mentions) and link them to their corresponding Wikipedia page. For both parts a corpus was developed, which is described next.

Corpus development

To build the evaluation corpus, a list of the IMDB top 500 films voted by users was retrieved. A crawler processed this list to retrieve the description of each film from the English Wikipedia (the set of paragraphs occurring before the table of contents). These text fragments were parsed to find hyperlinks (easily identified by “<a>” tags in the source code) to other Wikipedia pages. The landing pages of these hyperlinks were analysed, and only those belonging to certain DBpedia ontology classes (rdf:type) were retained, namely: Person or Fictional Character, Written work or Book, Videogame or Software, Organization, Album, Single or Musical Work. In this way, the list of entities of these types occurring in each film synopsis was obtained. For instance, “The Godfather” synopsis contained Person mentions of “Francis Ford Coppola”, “Albert S. Ruddy”, “Mario Puzo”, and “Al Pacino” among others.

The resulting corpus contains 4506 documents, with Person the most common type and Videogame or Software the least frequent one. Each document contains 27.21 words and 2.22 entities on average. The number of entities of each type mentioned in documents of a given type was also considered: e.g. among the 261 book synopses, there are mentions of 77 books, 125 films, 6 games, 28 organizations, 315 persons, 3 songs and 28 albums.

Entity Linking (Wikipedia)

This process consists of two basic stages, Entity Detection (for detecting entities in texts) and Linking (for linking the entities detected to specific entries of a repository, Wikipedia in this case). The linking approach is based on the disambiguation process described in the research paper [12]. Before running the experiments, we tested how ambiguous the candidates within our corpus are. For each noun phrase, we found an average of 1.914 candidates. It should be noted that the number of unambiguous candidates (8038) is greater than the number of ambiguous ones (1174). Even so, a disambiguation process is necessary to link the ambiguous entries properly. The results obtained can be seen in the next result table.
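
As a quick check of the proportions involved, the share of ambiguous cases can be computed directly from the two counts reported above. The function name is hypothetical; only the two counts come from the text.

```python
def ambiguity_share(unambiguous, ambiguous):
    """Fraction of cases that have more than one candidate."""
    total = unambiguous + ambiguous
    return ambiguous / total if total else 0.0

# Counts reported in the text above.
share = ambiguity_share(unambiguous=8038, ambiguous=1174)
print(f"{share:.1%}")
```

Roughly one case in eight is ambiguous, which is the portion of the corpus where the disambiguation step actually has to choose between candidates.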

Entity Linking (SAM Assets)

This process consists of two basic stages, Entity Detection (for detecting entities in texts) and Linking (for linking the entities detected to specific entries of a repository, the SAM database in this case). The linking approach is based on the disambiguation process described in the previously mentioned research paper [12]. Before running our experiments, we tested how ambiguous the candidates within our corpus are. For each noun phrase in the synopsis, a query to Lucene was performed based on certain Asset fields (i.e. keywords, title) as well as a threshold. When both keywords and title are taken into account, the average number of candidates drops drastically as the threshold increases. When title and keywords are considered separately, the reduction is smaller. In our experiments we used a threshold of 0.4. The results obtained can be seen in the next result table.

Ontology Mapping

This task aims to align the structure of imported data with the SAM Ontology. To that end, the Data Characterisation component has to provide a list of suggestions for this mapping. Given these suggestions, the Content Providers will approve them or manually decide the correct mapping. The approach consists of reusing the Data Characterisation technologies previously described, but applying only lexical similarity with synonym support between the origin (SAM classes and attributes) and the target (concepts to align from the imported data). To carry out the evaluation, 141 concepts were selected (i.e. attributes from Book 34, Movie 40, Music 26, Game 20, MusicRecording 9, Person 12), collected from the BDS data, which had previously been aligned manually. The results obtained can be seen in the next result table.

Discover Assets

This technology forms part of a recommender system developed for SAM in which user profile information and user data consumption are considered in order to suggest Assets. It is based on a combination of the previously mentioned Asset data characterisation functionalities and ontology queries. The goal of this service is to obtain Assets lexically and semantically related to the user profile information (keywords and IDs of Assets consumed, language, country, etc.) provided as input to the service. To evaluate this approach, the logs of the first SAM trials were collected, from which 376 Assets suggested by this functionality were gathered. The evaluation consisted of computing how many of the Assets suggested to the users were consumed by them during the experience. The results obtained can be seen in the next result table.
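
The evaluation measure described above can be sketched as the share of suggested Assets that were later consumed. The Asset identifiers below are invented toy data, and the function name is hypothetical; the sketch does not reproduce the SAM trial logs.

```python
def consumption_rate(suggested_ids, consumed_ids):
    """Share of distinct suggested Assets that also appear among the consumed ones."""
    suggested = set(suggested_ids)
    if not suggested:
        return 0.0
    return len(suggested & set(consumed_ids)) / len(suggested)

# Toy log data, invented for illustration.
suggested = ["a1", "a2", "a3", "a4"]
consumed = ["a2", "a4", "a9"]
rate = consumption_rate(suggested, consumed)
print(f"{rate:.2%}")
```

Restricting both lists to active users before calling the function would give the second variant of the measure, which considers only active users.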

Research results

The following scientific publications constitute the research results of the SAM technologies regarding Semantic Analysis: [13], [14], [15], [16], [17], [18]

Technology | Corpus | Precision | Recall
Entity Linking (SAM Assets) | 4506 documents | 90% | 89%
Entity Linking (Wikipedia) | 4506 documents | 87% | 89%
Mapping of Structures | 141 concepts from DBS[1] data | 76% | 76%
Discover Assets | 376 Assets suggested from SAM trials | 21.55% Assets viewed | 44.64% viewed considering only active users


  1. Wu, Z. & Palmer, M. (1994) Verbs semantics and lexical selection. Proceedings of the 32nd annual meeting on Association for Computational Linguistics. Las Cruces, New Mexico, Association for Computational Linguistics.
  2. Leacock, C. & Chodorow, M. (1998) Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics.
  3. Turney, P. D. (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning. Springer-Verlag.
  4. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
  5. Ide, N. and J. Véronis (1998). "Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art." Computational Linguistics. MIT Press 24(1): 2--40.
  9. Shvaiko, P., & Euzenat, J. Ontology matching: state of the art and future challenges. Knowledge and Data Engineering, IEEE Transactions on, vol 25 (1), pp 158-176, 2013.
  10. Levenshtein, V. I. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. In Soviet Physics Doklady vol. 10, pp. 707, 1966.
  11. Lesk, M. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th annual international conference on Systems documentation, pp 24-26, New York, NY, USA. ACM, 1986.
  12. Tomás,D. Gutiérrez, Y. Agulló, F. Entity Linking in Media Content and User Comments: Connecting Data to Wikipedia and other Knowledge Bases. Proceedings of eChallenges 2015 e-2015. pp 1-10, 2015.
  13. Gutiérrez, Y; Vázquez, S & Montoyo, A. A semantic framework for textual data enrichment. Expert Systems with Applications. 2016
  14. Lloret, E.; Gutiérrez, Y. & Gómez, J. M. Developing an Ontology to Capture Documents' Semantics. KEOD 2015 - Proceedings of the International Conference on Knowledge Engineering and Ontology Development, part of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management “IC3K” 2015. Vol 2, pp 155-162, 2015.
  15. Tomás, D; Gutiérrez, Y. Human Language Technologies in Media Consumption: The Case of SAM. Impact of Social Media on TV Content Consumption - New Market Strategies, Scenarios and Trends. IEEE Computer Society Special Technical Community on Social Networking E-Letter. Vol 3(2), pp 1-7, 2015.
  16. Tomás D., Gutiérrez, Y; Moreno, I.; Agulló, F.; Tiemann, M.; Vidagany, J.V.; Menychtas, A. Socialising Around Media (SAM): Dynamic Social and Media Content Syndication for Second Screen. Procesamiento del Lenguaje Natural. Vol 55, pp 181-184, 2015.
  17. Psychas,A.; Menychtas,A.; Santzaridou, C.; Varvarigou, T.; Gutiérrez, Y.; Moreno, I.; Tomás, D. Media Content Linking, Semantic Annotation and Syndication in Social Enabled, Multiscreen Environments. Proceedings of eChallenges 2015 e-2015. pp 1-10, 2015
  18. Tomás,D. Gutiérrez, Y. Agulló, F. Entity Linking in Media Content and User Comments: Connecting Data to Wikipedia and other Knowledge Bases. Proceedings of eChallenges 2015 e-2015. pp 1-10, 2015.