-
Efthimiadis, E.N.: User choices : a new yardstick for the evaluation of ranking algorithms for interactive query expansion (1995)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 6697) [ClassicSimilarity], result of:
      0.06594159 = score(doc=6697,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 6697, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=6697)
  0.25 = coord(1/4)
- Abstract
- The performance of 8 ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. The study focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e., the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the 8 ranking algorithms. This methodology introduces a user-oriented approach to evaluating ranking algorithms for query expansion, in contrast to the standard, system-oriented approaches. Similarities in the performance of the 8 algorithms and the ways these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, enim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full-text) environments is needed before these results can be generalized beyond the context of the present study.
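The score breakdown above follows Lucene's ClassicSimilarity explain format, and the same pattern repeats for every hit below. As a minimal sketch, assuming Lucene's classic tf and idf formulas, the listed figures for doc 6697 can be reproduced in a few lines of Python:

```python
import math

# constants copied from the explain tree for doc 6697
freq = 2.0                     # termFreq of "however"
doc_freq, max_docs = 1897, 44421
field_norm = 0.0390625         # fieldNorm(doc=6697)
query_norm = 0.06921162
coord = 1 / 4                  # 1 of 4 query clauses matched

idf = 1 + math.log(max_docs / (doc_freq + 1))  # 4.1529117
tf = math.sqrt(freq)                           # 1.4142135
query_weight = idf * query_norm                # 0.28742972
field_weight = tf * idf * field_norm           # 0.22941813

score = query_weight * field_weight * coord
print(round(score, 9))                         # ~0.016485397, the listed score
```

The breakdown is identical across entries because each document matches "however" exactly twice and carries the same encoded field norm.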
-
Tudhope, D.; Blocks, D.; Cunliffe, D.; Binding, C.: Query expansion via conceptual distance in thesaurus indexed collections (2006)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3215) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3215,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3215, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3215)
  0.25 = coord(1/4)
- Abstract
- Purpose - The purpose of this paper is to explore query expansion via conceptual distance in thesaurus-indexed collections. Design/methodology/approach - An extract of the National Museum of Science and Industry's collections database, indexed with the Getty Art and Architecture Thesaurus (AAT), was the dataset for the research. The system architecture and algorithms for semantic closeness and the matching function are outlined. Standalone and web interfaces are described and formative qualitative user studies are discussed. One user session is discussed in detail, together with a scenario based on a related public inquiry. Findings are set in the context of the literature on thesaurus-based query expansion. The paper discusses the potential of query expansion techniques using the semantic relationships in a faceted thesaurus. Findings - Thesaurus-assisted retrieval systems have potential for multi-concept descriptors, permitting very precise queries and indexing. However, indexer and searcher may differ in terminology judgments and there may not be any exactly matching results. The integration of semantic closeness in the matching function permits ranked results for multi-concept queries in thesaurus-indexed applications. An in-memory representation of the thesaurus semantic network allows a combination of automatic and interactive control of expansion, including control of expansion on individual query terms. Originality/value - The application of semantic expansion to browsing may be useful in interface options where thesaurus structure is hidden.
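As an illustration of the paper's central idea, here is a minimal sketch that scores conceptual closeness between thesaurus terms by shortest relationship path. The toy graph and the 1/(1+d) mapping are assumptions for illustration, not the AAT data or the system's actual matching function:

```python
from collections import deque

# thesaurus as adjacency list: term -> related terms (BT/NT/RT collapsed)
thesaurus = {
    "drills": ["hand tools", "rock drills"],
    "hand tools": ["drills", "hammers", "tools"],
    "hammers": ["hand tools"],
    "rock drills": ["drills", "mining equipment"],
    "mining equipment": ["rock drills", "tools"],
    "tools": ["hand tools", "mining equipment"],
}

def conceptual_distance(src: str, dst: str) -> int:
    """Length of the shortest relationship path between two thesaurus terms."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        term, dist = queue.popleft()
        if term == dst:
            return dist
        for nxt in thesaurus.get(term, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1  # not connected

def closeness(src: str, dst: str) -> float:
    """Map distance to a closeness score in (0, 1]; 1 means identical concepts."""
    d = conceptual_distance(src, dst)
    return 0.0 if d < 0 else 1.0 / (1.0 + d)

print(closeness("hammers", "rock drills"))  # 3 hops -> 0.25
```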
-
Ng, K.B.: Toward a theoretical framework for understanding the relationship between situated action and planned action models of behavior in information retrieval contexts : contributions from phenomenology (2002)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3588) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3588,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3588, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3588)
  0.25 = coord(1/4)
- Abstract
- In human-computer interaction (HCI), a successful interaction sequence can take on its own momentum and drift away from what the user originally planned. However, this does not mean that planned actions play no important role in the overall performance. In this paper, the author constructs a line of argument to demonstrate that it is impossible to consider an action without an a priori plan, even according to the phenomenological position taken for granted by situated action theory. Based on the phenomenological analysis of problematic situations and typification, the author argues that, just like "situated-ness", the "planned-ness" of an action should also be understood in the context of the situation. Successful plans can be developed and executed for familiar contexts. The first part of the paper treats information seeking behavior as a special type of social action and applies Alfred Schutz's phenomenological sociology to understand the importance and necessity of plans. The second part reports results of a quasi-experiment focusing on plan deviation within an information seeking context. It was found that when the searcher's situation changed from problematic to non-problematic, the degree of plan deviation decreased significantly. These results support the argument proposed in the first part of the paper.
-
Lehtokangas, R.; Järvelin, K.: Consistency of textual expression in newspaper articles : an argument for semantically based query expansion (2001)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 5485) [ClassicSimilarity], result of:
      0.06594159 = score(doc=5485,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 5485, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=5485)
  0.25 = coord(1/4)
- Abstract
- This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared with regard to their central concepts and the words representing those concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was sixty). Inconsistency in words and concepts was found between news articles from different newspapers. The mean value of consistency calculated on the basis of words was 65 per cent; this, however, depended on article length. For short news wires, consistency was 83 per cent, while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found with regard to length but not with regard to the categories of topic. We argue that this inconsistency of expression is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free-text sources.
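A minimal sketch of how a word-level consistency figure between two articles on the same event might be computed as vocabulary overlap; the formula and the example texts are assumptions, not the study's exact measure:

```python
def consistency(text_a: str, text_b: str) -> float:
    """Shared vocabulary as a percentage of the smaller article's vocabulary."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    return 100.0 * len(a & b) / min(len(a), len(b))

wire_a = "parliament approved the budget after a long debate"
wire_b = "parliament approved the new budget following debate"
print(round(consistency(wire_a, wire_b), 1))  # 71.4
```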
-
Kelly, D.: Measuring online information seeking context : Part 1: background and method (2006)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 331) [ClassicSimilarity], result of:
      0.06594159 = score(doc=331,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 331, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=331)
  0.25 = coord(1/4)
- Abstract
- Context is one of the most important concepts in information seeking and retrieval research. However, the challenges of studying context are great; thus, it is more common for researchers to use context as a post hoc explanatory factor, rather than as a concept that drives inquiry. The purposes of this study were to develop a method for collecting data about information seeking context in natural online environments, and to identify which aspects of context should be considered when studying online information seeking. The study is reported in two parts. In this, the first part, the background and method are presented. Results and implications of this research are presented in Part 2 (Kelly, in press). Part 1 discusses previous literature on information seeking context and behavior and situates the current work within this literature. This part further describes the naturalistic, longitudinal research design that was used to examine and measure the online information seeking contexts of users during a 14-week period. In this design, information seeking context was characterized by a user's self-identified tasks and topics, and several attributes of these, such as the length of time the user expected to work on a task and the user's familiarity with a topic. At weekly intervals, users evaluated the usefulness of the documents that they viewed, and classified these documents according to their tasks and topics. At the end of the study, users provided feedback about the study method.
-
Kelly, D.: Measuring online information seeking context : Part 2: Findings and discussion (2006)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 340) [ClassicSimilarity], result of:
      0.06594159 = score(doc=340,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 340, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=340)
  0.25 = coord(1/4)
- Abstract
- Context is one of the most important concepts in information seeking and retrieval research. However, the challenges of studying context are great; thus, it is more common for researchers to use context as a post hoc explanatory factor, rather than as a concept that drives inquiry. The purpose of this study was to develop a method for collecting data about information seeking context in natural online environments, and to identify which aspects of context should be considered when studying online information seeking. The study is reported in two parts. In this, the second part, results and implications of this research are presented. Part 1 (Kelly, 2006) discussed previous literature on information seeking context and behavior, situated the current study within this literature, and described the naturalistic, longitudinal research design that was used to examine and measure the online information seeking context of seven users during a 14-week period. Results provide support for the value of the method in studying online information seeking context, the relative importance of various measures of context, how these measures change over time, and, finally, the relationship between these measures. In particular, results demonstrate significant differences in distributions of usefulness ratings according to task and topic.
-
Niemi, T.; Jämsen, J.: ¬A query language for discovering semantic associations, part I : approach and formal definition of query primitives (2007)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 1591) [ClassicSimilarity], result of:
      0.06594159 = score(doc=1591,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 1591, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=1591)
  0.25 = coord(1/4)
- Abstract
- In contemporary query languages, the user is responsible for navigation among semantically related data. Because of the huge amount of data and the complex structural relationships among data in modern applications, it is unrealistic to suppose that the user could fully know the content and structure of the available information. There are several query languages whose purpose is to facilitate navigation in unknown database structures. However, the background assumption of these languages is that the user knows how data are related to each other semantically in the structure at hand. So far, little attention has been paid to how unknown semantic associations among available data can be discovered. We address this problem in this article. A semantic association between two entities can be constructed if a sequence of relationships expressed explicitly in a database can be found that connects these entities to each other. This sequence may contain several other entities through which the original entities are connected to each other indirectly. We introduce an expressive and declarative query language for discovering semantic associations. Our query language is able, for example, to discover semantic associations between entities for which only some of the characteristics are known. Further, it integrates the manipulation of semantic associations with the manipulation of documents that may contain information on entities in semantic associations.
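A minimal sketch of the underlying idea: a semantic association is a chain of explicitly stored relationships connecting two entities. The facts and relation names are hypothetical, and the paper's declarative query language is far more expressive than this plain breadth-first search:

```python
from collections import deque

# (subject, relation, object) facts as they might appear in a database
facts = [
    ("Alice", "works_for", "AcmeCorp"),
    ("AcmeCorp", "located_in", "Tampere"),
    ("Bob", "born_in", "Tampere"),
]

# build an undirected labeled graph so paths may traverse relations both ways
graph = {}
for s, r, o in facts:
    graph.setdefault(s, []).append((r, o))
    graph.setdefault(o, []).append((f"inverse({r})", s))

def association(src, dst):
    """Shortest chain of explicit relationships connecting src to dst."""
    seen, queue = {src}, deque([(src, [])])
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

print(association("Alice", "Bob"))
# [('Alice', 'works_for', 'AcmeCorp'), ('AcmeCorp', 'located_in', 'Tampere'),
#  ('Tampere', 'inverse(born_in)', 'Bob')]
```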
-
Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 308) [ClassicSimilarity], result of:
      0.06594159 = score(doc=308,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 308, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=308)
  0.25 = coord(1/4)
- Abstract
- The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI only the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable: even when the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, retrieval performance is not degraded significantly. Experiments performed on standard test collections show that our methods are promising.
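A minimal numpy sketch contrasting literal matching (BVSM) with LSI's truncated-SVD similarity, the two ends that the article's unified kernel generalizes. The tiny term-document matrix is hypothetical:

```python
import numpy as np

# rows = terms, columns = documents
A = np.array([
    [1, 1, 0],   # "car"
    [0, 1, 1],   # "automobile"
    [1, 0, 0],   # "engine"
], dtype=float)

q = np.array([1, 0, 0], dtype=float)   # query: "car"

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# BVSM: cosine between query and document columns (literal term matching)
bvsm = [cos(q, A[:, j]) for j in range(A.shape[1])]

# LSI: project query and documents into a rank-k latent space via truncated SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk = U[:, :k]
docs_k = np.diag(s[:k]) @ Vt[:k, :]    # document coordinates in latent space
q_k = Uk.T @ q                         # query folded into latent space

lsi = [cos(q_k, docs_k[:, j]) for j in range(A.shape[1])]
print("BVSM:", np.round(bvsm, 3))      # doc 3 scores 0: no shared literal term
print("LSI :", np.round(lsi, 3))       # doc 3 gains a positive score via latent structure
```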
-
Schaefer, A.; Jordan, M.; Klas, C.-P.; Fuhr, N.: Active support for query formulation in virtual digital libraries : a case study with DAFFODIL (2005)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 296) [ClassicSimilarity], result of:
      0.06594159 = score(doc=296,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 296, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=296)
  0.25 = coord(1/4)
- Abstract
- Daffodil is a front-end to federated, heterogeneous digital libraries aiming at strategic support of users during the information-seeking process. This is done by offering a variety of functions for searching, exploring and managing digital library objects. However, the distributed search increases response time, and the conceptual model of the underlying search processes is inherently weaker. This makes query formulation harder, and the resulting waiting times can be frustrating. In this paper, we investigate the concept of proactive support during the user's query formulation. To improve user efficiency and satisfaction, we implemented annotations, proactive support and error markers on the query form itself. These functions decrease the probability of syntactic or semantic errors in queries. Furthermore, the user is able to make better tactical decisions and feels more confident that the system handles the query properly. Evaluations with 30 subjects showed that user satisfaction is improved, whereas no conclusive results were obtained for efficiency.
-
Kim, H.H.: Toward video semantic search based on a structured folksonomy (2011)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 350) [ClassicSimilarity], result of:
      0.06594159 = score(doc=350,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 350, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=350)
  0.25 = coord(1/4)
- Abstract
- This study investigated the effectiveness of query expansion using synonymous and co-occurrence tags in users' video searches, as well as the effect of visual storyboard surrogates on users' relevance judgments when browsing videos. To do so, we designed a structured folksonomy-based system in which tag queries can be expanded via synonyms or co-occurrence words, based on the use of WordNet 2.1 synonyms and Flickr's related tags. To evaluate the structured folksonomy-based system, we conducted an experiment, the results of which suggest that the mean recall rate in the structured folksonomy-based system is statistically higher than that in a tag-based system without query expansion; however, the mean precision rate in the structured folksonomy-based system is not statistically higher than that in the tag-based system. Next, we compared the precision rates of the proposed system with storyboards (SB), in which SB and text metadata are shown to users when they browse video search results, with those of the proposed system without SB, in which only text metadata are shown. Our results showed that browsing only text surrogates, including tags, without multimedia surrogates is not sufficient for users' relevance judgments.
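A minimal sketch of the synonym side of the expansion, using NLTK's WordNet interface (an assumption; the study used WordNet 2.1 together with Flickr's related tags for co-occurrence expansion):

```python
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def expand_tag(tag: str, limit: int = 5) -> list[str]:
    """Collect distinct synonym lemmas of a tag across its WordNet synsets."""
    synonyms = []
    for synset in wn.synsets(tag):
        for lemma in synset.lemma_names():
            term = lemma.replace("_", " ").lower()
            if term != tag and term not in synonyms:
                synonyms.append(term)
    return synonyms[:limit]

print(expand_tag("car"))  # e.g. ['auto', 'automobile', 'machine', 'motorcar', ...]
```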
-
Xamena, E.; Brignole, N.B.; Maguitman, A.G.: ¬A study of relevance propagation in large topic ontologies (2013)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 2105) [ClassicSimilarity], result of:
      0.06594159 = score(doc=2105,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 2105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=2105)
  0.25 = coord(1/4)
- Abstract
- Topic ontologies or web directories consist of large collections of links to websites, arranged by topic in different categories. The structure of these ontologies is typically not flat, because there are hierarchical and nonhierarchical relationships among topics. As a consequence, websites classified under a certain topic may be relevant to other topics. Although some of these relevance relations are explicit, most of them must be discovered by an analysis of the structure of the ontologies. This article proposes a family of models of relevance propagation in topic ontologies. An efficient computational framework is described and used to compute nine different models for a portion of the Open Directory Project graph consisting of more than half a million nodes and approximately 1.5 million edges of different types. After a quantitative analysis was performed, a user study was carried out to compare the most promising models. It was found that some general difficulties rule out the possibility of defining flawless models of relevance propagation that take into account only structural aspects of an ontology. However, there is a clear indication that including transitive relations induced by the nonhierarchical components of the ontology results in relevance propagation models that are superior to more basic approaches.
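A minimal sketch of structural relevance propagation: scores decay along edges, with a different decay factor per edge type, loosely in the spirit of the models compared in the article. The topic graph and decay values are hypothetical:

```python
DECAY = {"narrower": 0.8, "related": 0.5, "symbolic": 0.3}

edges = {  # topic -> list of (edge_type, neighbor)
    "Science": [("narrower", "Physics"), ("related", "Technology")],
    "Physics": [("narrower", "Optics")],
    "Technology": [("symbolic", "Optics")],
}

def propagate(source: str, hops: int = 3) -> dict[str, float]:
    """Propagate relevance 1.0 from source, keeping the best score per topic."""
    scores = {source: 1.0}
    frontier = [source]
    for _ in range(hops):
        nxt = []
        for topic in frontier:
            for etype, nbr in edges.get(topic, []):
                s = scores[topic] * DECAY[etype]
                if s > scores.get(nbr, 0.0):
                    scores[nbr] = s
                    nxt.append(nbr)
        frontier = nxt
    return scores

print(propagate("Science"))
# {'Science': 1.0, 'Physics': 0.8, 'Technology': 0.5, 'Optics': 0.64}
```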
-
Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 2338) [ClassicSimilarity], result of:
      0.06594159 = score(doc=2338,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 2338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=2338)
  0.25 = coord(1/4)
- Abstract
- A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, which are inferred when two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
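A minimal sketch of the two association types on a toy corpus: syntagmatic associations link terms that co-occur, paradigmatic associations link terms that share contexts (and are thus substitutable). The corpus and measures are illustrative assumptions, not the article's formal model:

```python
from collections import defaultdict

corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the doctor prescribed medicine",
]

cooccur = defaultdict(set)     # term -> terms seen in the same sentence
contexts = defaultdict(set)    # term -> neighboring words across the corpus
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        cooccur[w].update(x for x in words if x != w)
        if i > 0:
            contexts[w].add(words[i - 1])
        if i + 1 < len(words):
            contexts[w].add(words[i + 1])

def syntagmatic(a: str, b: str) -> bool:
    return b in cooccur[a]

def paradigmatic(a: str, b: str) -> float:
    """Jaccard overlap of the two terms' neighboring contexts."""
    return len(contexts[a] & contexts[b]) / len(contexts[a] | contexts[b])

print(syntagmatic("doctor", "patient"))   # True: they co-occur in a sentence
print(paradigmatic("doctor", "nurse"))    # > 0: share contexts, never co-occur
```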
-
Pal, D.; Mitra, M.; Datta, K.: Improving query expansion using WordNet (2014)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 2545) [ClassicSimilarity], result of:
      0.06594159 = score(doc=2545,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 2545, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=2545)
  0.25 = coord(1/4)
- Abstract
- This study proposes a new way of using WordNet for query expansion (QE). We choose candidate expansion terms from a set of pseudo-relevant documents; however, the usefulness of these terms is measured based on their definitions provided in a hand-crafted lexical resource such as WordNet. Experiments with a number of standard TREC collections show that this method outperforms existing WordNet-based methods. It also compares favorably with established QE methods such as KLD and RM3. Leveraging earlier work in which a combination of QE methods was found to outperform each individual method (as well as other well-known QE methods), we next propose a combination-based QE method that takes into account three different aspects of a candidate expansion term's usefulness: (a) its distribution in the pseudo-relevant documents and in the target corpus, (b) its statistical association with query terms, and (c) its semantic relation with the query, as determined by the overlap between the WordNet definitions of the term and query terms. This combination of diverse sources of information appears to work well on a number of test collections, viz., the TREC123, TREC5, TREC678, TREC robust (new), and TREC910 collections, and yields significant improvements over competing methods on most of these collections.
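A minimal sketch of aspect (c) alone: scoring a candidate expansion term by the overlap between WordNet definitions of the term and of the query terms. It uses NLTK's WordNet interface (an assumption) and omits aspects (a) and (b):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def gloss_words(term: str) -> set[str]:
    """Union of longer words from all WordNet definitions of a term."""
    words = set()
    for synset in wn.synsets(term):
        words.update(w.lower() for w in synset.definition().split() if len(w) > 3)
    return words

def semantic_score(candidate: str, query_terms: list[str]) -> float:
    """Fraction of the candidate's gloss vocabulary shared with query glosses."""
    cand = gloss_words(candidate)
    if not cand:
        return 0.0
    query = set().union(*(gloss_words(t) for t in query_terms))
    return len(cand & query) / len(cand)

print(semantic_score("automobile", ["car", "vehicle"]))
```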
-
Bando, L.L.; Scholer, F.; Turpin, A.: Query-biased summary generation assisted by query expansion : temporality (2015)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 2820) [ClassicSimilarity], result of:
      0.06594159 = score(doc=2820,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 2820, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=2820)
  0.25 = coord(1/4)
- Abstract
- Query-biased summaries help users to identify which items returned by a search system should be read in full. In this article, we study the generation of query-biased summaries as a sentence ranking approach, and methods to evaluate their effectiveness. Using sentence-level relevance assessments from the TREC Novelty track, we gauge the benefits of query expansion in minimizing the vocabulary mismatch problem between informational requests and sentence ranking methods. Our results from an intrinsic evaluation show that query expansion significantly improves the selection of short relevant sentences (5-13 words) by between 7% and 11%. However, query expansion does not lead to improvements for sentences of medium (14-20 words) and long (21-29 words) lengths. In a separate crowdsourcing study, we analyze whether a summary composed of sentences ranked using query expansion was preferred over summaries not assisted by query expansion, rather than assessing sentences individually. We found that participants chose summaries aided by query expansion around 60% of the time over summaries using an unexpanded query. We conclude that query expansion techniques can benefit the selection of sentences for the construction of query-biased summaries at the summary level rather than at the sentence ranking level.
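A minimal sketch of query-biased sentence ranking with an expanded query; the expansion terms are supplied by hand here, whereas the study derived them automatically, and the weighting is an assumption:

```python
def rank_sentences(sentences: list[str], query: set[str], expansion: set[str],
                   beta: float = 0.5) -> list[tuple[float, str]]:
    """Score = matches against original terms + beta * matches against expansion."""
    scored = []
    for s in sentences:
        words = set(s.lower().split())
        score = len(words & query) + beta * len(words & expansion)
        scored.append((score, s))
    return sorted(scored, reverse=True)

sentences = [
    "The vaccine trial enrolled two thousand volunteers.",
    "Participants reported mild side effects after immunization.",
    "The stadium reopened to fans last weekend.",
]
query = {"vaccine", "trial"}
expansion = {"immunization", "volunteers", "participants"}
for score, s in rank_sentences(sentences, query, expansion):
    print(score, s)  # the second sentence only scores via expansion terms
```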
-
Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia (2015)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3682) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3682,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3682, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3682)
  0.25 = coord(1/4)
- Abstract
- Semantic similarity assessment between concepts is an important task in many language-related applications. In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies. However, existing measures have limitations, such as their reliance on predefined ontologies and their restriction to non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing the semantic similarity of concepts, with more coverage than usual ontologies. In this paper, we propose some novel feature-based similarity assessment methods that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced above. To implement feature-based similarity assessment using Wikipedia, we first present a formal representation of Wikipedia concepts. We then give a framework for feature-based similarity built on this formal representation. Lastly, we investigate several feature-based approaches to semantic similarity measures resulting from instantiations of the framework. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, confirms the intuitions with respect to human judgements. Overall, several methods proposed in this paper have good human correlation and constitute effective ways of determining similarity between Wikipedia concepts.
-
Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3696) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3696,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3696, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3696)
  0.25 = coord(1/4)
- Abstract
- Search engines have become an integral part of our day-to-day life, and our reliance on them increases with every passing day. With the amount of data available on the Internet increasing exponentially, it becomes important to develop new methods and tools that help to return results relevant to the queries and reduce the time spent on searching. The results should be diverse but at the same time should remain focused on the queries asked. Relation-Based PageRank [4] algorithms are considered to be the next frontier in the improvement of Semantic Web search. The probability, as posited by the user when entering the query, of finding relevance in the search results is used to measure relevance. However, its application is limited by the complexity of determining relations between terms and assigning explicit meaning to each term. TrustRank is one of the most widely used ranking algorithms for semantic web search. A few other ranking algorithms, such as HITS and PageRank, are also used for Semantic Web searching. In this paper, we provide a comparison of these ranking approaches.
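For reference, a minimal power-iteration sketch of PageRank, one of the algorithms compared in the paper; the toy link graph is hypothetical:

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power iteration with damping factor d over a page -> outlinks graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:                       # dangling page: spread evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print({p: round(r, 3) for p, r in pagerank(links).items()})
# C outranks A and B because both link to it, directly or transitively
```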
-
Liu, X.; Zheng, W.; Fang, H.: ¬An exploration of ranking models and feedback method for related entity finding (2013)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3714) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3714,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3714, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3714)
  0.25 = coord(1/4)
- Abstract
- Most existing search engines focus on document retrieval. However, information needs are certainly not limited to finding relevant documents. Instead, a user may want to find relevant entities such as persons and organizations. In this paper, we study the problem of related entity finding. Our goal is to rank entities based on their relevance to a structured query, which specifies an input entity, the type of related entities and the relation between the input and related entities. We first discuss a general probabilistic framework, derive six possible retrieval models to rank the related entities, and then compare these models both analytically and empirically. To further improve performance, we study the problem of feedback in the context of related entity finding. Specifically, we propose a mixture model based feedback method that can utilize the pseudo feedback entities to estimate an enriched model for the relation between the input and related entities. Experimental results over two standard TREC collections show that the derived relation generation model combined with a relation feedback method performs better than other models.
-
Jiang, Y.; Bai, W.; Zhang, X.; Hu, J.: Wikipedia-based information content and semantic similarity computation (2017)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3877) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3877,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3877, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3877)
  0.25 = coord(1/4)
- Abstract
- The Information Content (IC) of a concept is a fundamental dimension in computational linguistics; it enables a better understanding of a concept's semantics. In the past, several approaches to computing the IC of a concept have been proposed. However, existing methods have limitations, such as their reliance on corpus availability, manual tagging, or predefined ontologies, and their restriction to non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing the IC of concepts with more coverage than usual ontologies. In this paper, we propose some novel methods of IC computation for a concept that address the shortcomings of existing approaches. The presented methods focus on the IC computation of a concept (i.e., a Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, confirms the intuitions with respect to human judgments. Overall, some methods proposed in this paper have a good human correlation and constitute effective ways of determining IC values for concepts and semantic similarity between concepts.
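A minimal sketch of intrinsic IC computed from a category hierarchy alone, in the style of Seco et al.'s corpus-free formulation rather than the paper's own Wikipedia-category methods; the toy hierarchy is hypothetical:

```python
import math

children = {  # category -> subcategories
    "entity": ["artifact", "organism"],
    "artifact": ["vehicle"],
    "vehicle": ["car"],
    "organism": ["animal", "plant"],
    "animal": [], "plant": [], "car": [],
}

def descendants(node: str) -> int:
    """Number of nodes in the subtree rooted at node (including itself)."""
    return 1 + sum(descendants(c) for c in children.get(node, []))

TOTAL = descendants("entity")

def ic(node: str) -> float:
    """IC is 0 for the root and maximal (1.0) for leaf categories."""
    return 1.0 - math.log(descendants(node)) / math.log(TOTAL)

for c in ("entity", "vehicle", "car"):
    print(c, round(ic(c), 3))  # entity 0.0, vehicle 0.644, car 1.0
```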
-
Xu, B.; Lin, H.; Lin, Y.: Assessment of learning to rank methods for query expansion (2016)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 3929) [ClassicSimilarity], result of:
      0.06594159 = score(doc=3929,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 3929, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=3929)
  0.25 = coord(1/4)
- Abstract
- Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
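A minimal pointwise sketch of reranking candidate expansion terms with a learned model; the features, labels, and training values are hypothetical stand-ins for the term features and labeling strategies studied in the article:

```python
from sklearn.linear_model import LogisticRegression

# features per candidate term: [freq in feedback docs, co-occurrence with
# query, inverse document frequency]; label: 1 if the term helped retrieval
X_train = [[0.9, 0.8, 0.6], [0.7, 0.6, 0.5], [0.2, 0.1, 0.9], [0.1, 0.2, 0.3]]
y_train = [1, 1, 0, 0]

model = LogisticRegression().fit(X_train, y_train)

candidates = {"immunization": [0.8, 0.7, 0.6], "stadium": [0.1, 0.1, 0.8]}
scores = {t: model.predict_proba([f])[0, 1] for t, f in candidates.items()}
reranked = sorted(scores, key=scores.get, reverse=True)
print(reranked)  # terms most likely to improve the query come first
```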
-
Athukorala, K.; Glowacka, D.; Jacucci, G.; Oulasvirta, A.; Vreeken, J.: Is exploratory search different? : a comparison of information search behavior for exploratory and lookup tasks (2016)
0.02
0.016485397 = product of:
  0.06594159 = sum of:
    0.06594159 = weight(_text_:however in 4150) [ClassicSimilarity], result of:
      0.06594159 = score(doc=4150,freq=2.0), product of:
        0.28742972 = queryWeight, product of:
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.06921162 = queryNorm
        0.22941813 = fieldWeight in 4150, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1529117 = idf(docFreq=1897, maxDocs=44421)
          0.0390625 = fieldNorm(doc=4150)
  0.25 = coord(1/4)
- Abstract
- Exploratory search is an increasingly important activity yet challenging for users. Although there is ample research into understanding exploration, most major information retrieval (IR) systems do not provide tailored and adaptive support for such tasks. One reason is the lack of empirical knowledge on how to distinguish exploratory and lookup search behaviors in IR systems. The goal of this article is to investigate how to separate the 2 types of tasks in an IR system using easily measurable behaviors. In this article, we first review characteristics of exploratory search behavior. We then report on a controlled study of 6 search tasks: 3 exploratory tasks (comparison, knowledge acquisition, planning) and 3 lookup tasks (fact-finding, navigational, question answering). The results are encouraging, showing that IR systems can distinguish the 2 search categories in the course of a search session. The most distinctive indicators that characterize exploratory search behaviors are query length, maximum scroll depth, and task completion time. However, 2 tasks are borderline and exhibit mixed characteristics. We assess the applicability of this finding by reporting on several classification experiments. Our results have valuable implications for designing tailored and adaptive IR systems.
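A minimal sketch of separating the two task types with the three indicators the study found most distinctive; the training values are illustrative assumptions, not the study's data:

```python
from sklearn.tree import DecisionTreeClassifier

# features: [mean query length (words), max scroll depth, completion time (s)]
X = [[2, 1, 45], [3, 2, 60], [2, 1, 30],      # lookup sessions
     [7, 9, 600], [6, 8, 540], [8, 10, 720]]  # exploratory sessions
y = ["lookup"] * 3 + ["exploratory"] * 3

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[5, 7, 400]]))  # likely classified as exploratory
```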