Search (91 results, page 2 of 5)

  • theme_ss:"Retrievalalgorithmen"
  1. Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.02
    0.01842779 = product of:
      0.07371116 = sum of:
        0.07371116 = weight(_text_:und in 2734) [ClassicSimilarity], result of:
          0.07371116 = score(doc=2734,freq=12.0), product of:
            0.15350439 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06921162 = queryNorm
            0.48018923 = fieldWeight in 2734, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=2734)
      0.25 = coord(1/4)
    
    Abstract
    The amount of data on the Internet continues to grow rapidly. With it grows the need for high-quality information retrieval services for orientation and problem-oriented searching. Deciding whether to use or procure information retrieval software requires meaningful evaluation results. This contribution presents recent developments in the evaluation of information retrieval systems and shows the trend towards specialisation and diversification of evaluation studies, which increases the realism of the results. The focus is on the retrieval of specialist texts, web pages and multimedia objects.
    Source
    Information - Wissenschaft und Praxis. 54(2003) H.4, S.203-210
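    The indented figures under each entry are Lucene's explain() output for its TF-IDF scoring (ClassicSimilarity). As a sanity check, the 0.02 shown for entry 1 can be recomputed from the listed factors. The sketch below redoes that arithmetic in plain Python; the variable names are mine, and it assumes the classic Lucene formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)).

      import math

      # Factors copied from the explain() tree of entry 1 (field "und", doc 2734).
      freq       = 12.0          # termFreq of "und" in the field
      doc_freq   = 13141
      max_docs   = 44421
      query_norm = 0.06921162    # queryNorm from the listing
      field_norm = 0.0625        # fieldNorm(doc=2734)
      coord      = 0.25          # coord(1/4): 1 of 4 query clauses matched

      idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # ~2.2179, as listed
      tf  = math.sqrt(freq)                              # ~3.4641, as listed

      query_weight = idf * query_norm       # ~0.1535 (queryWeight above)
      field_weight = tf * idf * field_norm  # ~0.4802 (fieldWeight above)
      score = query_weight * field_weight * coord

      print(round(score, 8))  # ~0.01843, the 0.01842779 shown for entry 1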
  2. Nagelschmidt, M.: Verfahren zur Anfragemodifikation im Information Retrieval (2008) 0.02
    0.017842628 = product of:
      0.07137051 = sum of:
        0.07137051 = weight(_text_:und in 3774) [ClassicSimilarity], result of:
          0.07137051 = score(doc=3774,freq=20.0), product of:
            0.15350439 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06921162 = queryNorm
            0.4649412 = fieldWeight in 3774, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.046875 = fieldNorm(doc=3774)
      0.25 = coord(1/4)
    
    Abstract
    Information retrieval offers a wide range of options for modifying search queries. After an introductory account of the interplay between information need and query, a conceptual and typological approach to query modification techniques is given. Following a brief characterisation of fact retrieval and information retrieval, as well as of the vector space model and the probabilistic model, intellectual, automatic and interactive modification techniques are presented. Besides classical intellectual techniques such as the building-block strategy and the "citation pearl growing" strategy, the account of automatic and interactive techniques covers modification options at the levels of the morphology, syntax and semantics of search terms. In addition, relevance feedback, the use of informetric analyses and the idea of associative retrieval based on clustering and terminological techniques as well as citation-analytical methods are pursued. Finally, five application examples are intended to convey an impression of the practical design options for the techniques discussed.
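    Among the automatic techniques surveyed above, relevance feedback lends itself to a compact illustration. The sketch below is a minimal Rocchio-style update of a sparse query vector; the term weights and the alpha/beta/gamma values are invented for the example and are not taken from the source.

      from collections import defaultdict

      # Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(non-relevant).
      def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
          new_q = defaultdict(float)
          for term, w in query.items():
              new_q[term] += alpha * w
          for doc in relevant:
              for term, w in doc.items():
                  new_q[term] += beta * w / len(relevant)
          for doc in nonrelevant:
              for term, w in doc.items():
                  new_q[term] -= gamma * w / len(nonrelevant)
          # Keep only positively weighted terms as the expanded query.
          return {t: round(w, 3) for t, w in new_q.items() if w > 0}

      query  = {"retrieval": 1.0, "ranking": 0.5}
      rel    = [{"retrieval": 0.8, "feedback": 0.6}, {"ranking": 0.7, "expansion": 0.9}]
      nonrel = [{"hardware": 0.9, "ranking": 0.2}]
      print(rocchio(query, rel, nonrel))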
  3. Fuhr, N.: Zur Überwindung der Diskrepanz zwischen Retrievalforschung und -praxis (1990) 0.02
    0.016822193 = product of:
      0.06728877 = sum of:
        0.06728877 = weight(_text_:und in 6624) [ClassicSimilarity], result of:
          0.06728877 = score(doc=6624,freq=10.0), product of:
            0.15350439 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06921162 = queryNorm
            0.4383508 = fieldWeight in 6624, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=6624)
      0.25 = coord(1/4)
    
    Abstract
    This contribution presents some research results from information retrieval that can be applied directly to improve retrieval quality for existing databases: linguistic algorithms for reduction to base and stem forms support searching for inflected and derived forms of search terms. Ranking algorithms that weight query and document terms lead to significantly better retrieval results than Boolean retrieval. Relevance feedback can further increase retrieval quality and, in addition, support users in successively modifying their query formulation. A user-friendly interface for a system based on these concepts is presented.
  4. Tober, M.; Hennig, L.; Furch, D.: SEO Ranking-Faktoren und Rang-Korrelationen 2014 : Google Deutschland (2014) 0.02
    0.016822193 = product of:
      0.06728877 = sum of:
        0.06728877 = weight(_text_:und in 2484) [ClassicSimilarity], result of:
          0.06728877 = score(doc=2484,freq=10.0), product of:
            0.15350439 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06921162 = queryNorm
            0.4383508 = fieldWeight in 2484, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=2484)
      0.25 = coord(1/4)
    
    Abstract
    This whitepaper deals with the definition and assessment of factors that show a high rank correlation coefficient with organic search results, and serves the purpose of a deeper analysis of search engine algorithms. The data collection and its evaluation refer to ranking factors for Google Germany in 2014. In addition, the correlations and factors were interpreted with regard to their relevance for top search result positions, using, among other things, averages and median values as well as trends relative to previous years.
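    The central quantity of the whitepaper is a rank correlation coefficient between a candidate ranking factor and the organic result position. Below is a minimal sketch of Spearman's rank correlation, the usual choice for such studies; the data points are invented.

      # Spearman rank correlation = Pearson correlation of rank-transformed values.
      def ranks(values):
          order = sorted(range(len(values)), key=lambda i: values[i])
          r = [0.0] * len(values)
          i = 0
          while i < len(order):
              j = i
              while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                  j += 1
              avg_rank = (i + j) / 2.0 + 1.0   # average rank for ties
              for k in range(i, j + 1):
                  r[order[k]] = avg_rank
              i = j + 1
          return r

      def spearman(x, y):
          rx, ry = ranks(x), ranks(y)
          n = len(x)
          mx, my = sum(rx) / n, sum(ry) / n
          cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
          sx = sum((a - mx) ** 2 for a in rx) ** 0.5
          sy = sum((b - my) ** 2 for b in ry) ** 0.5
          return cov / (sx * sy)

      factor   = [0.9, 0.7, 0.8, 0.2, 0.4]   # hypothetical on-page factor values
      position = [1, 3, 2, 5, 4]             # SERP positions of the same pages
      # Negate positions so that +1 means "higher factor value, better position".
      print(round(spearman(factor, [-p for p in position]), 3))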
  5. Behnert, C.; Borst, T.: Neue Formen der Relevanz-Sortierung in bibliothekarischen Informationssystemen : das DFG-Projekt LibRank (2015) 0.02
    0.016822193 = product of:
      0.06728877 = sum of:
        0.06728877 = weight(_text_:und in 392) [ClassicSimilarity], result of:
          0.06728877 = score(doc=392,freq=10.0), product of:
            0.15350439 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06921162 = queryNorm
            0.4383508 = fieldWeight in 392, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=392)
      0.25 = coord(1/4)
    
    Abstract
    The DFG-funded project LibRank investigates new ranking methods for library information systems which, building on findings from the field of web search, take into account quality-inducing factors such as the recency, popularity and availability of individual items. The methods devised are developed in the context of a search portal widely used in economics (EconBiz) and evaluated systematically in a test system. Ranking factors of particular interest for the library domain are presented, and problems and challenges are illustrated with examples.
    Source
    Bibliothek: Forschung und Praxis. 39(2015) H.3, S.384-393
  6. Keen, E.M.: Designing and testing an interactive ranked retrieval system for professional searchers (1994) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 1134) [ClassicSimilarity], result of:
          0.06594159 = score(doc=1134,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 1134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1134)
      0.25 = coord(1/4)
    
    Abstract
    Reports 3 explorations of ranked system design. 2 tests used a 'cystic fibrosis' test collection with 100 queries. Experiment 1 compared a Boolean with a ranked interactive system using a subject-qualified trained searcher, and reports recall and precision results. Experiment 2 compared 15 different ranked match algorithms in a batch mode using 2 test collections, and included some new proximate-pair and term weighting approaches. Experiment 3 is a design plan for an interactive ranked prototype offering mid-search algorithm choices plus other manual search devices (such as obligatory and unwanted terms), as influenced by thinking-aloud comments from experiment 1. Concludes that, in Boolean versus ranked retrieval using inverse collection frequency, the searcher inspected more records on the ranked system than on the Boolean one and so achieved higher recall but lower precision; however, the presentation order of the relevant records was, on average, very similar in both systems. Concludes also that: query reformulation was quite strongly practised in ranked searching but does not appear to have been effective; the term-pair proximity weighting methods in experiment 2 enhanced precision on both test collections when used with inverse collection frequency weighting (ICF); and the design plan for an interactive prototype adds to a selection of match algorithms other devices, such as obligatory and unwanted term marking, evidence for this being found in the think-aloud comments.
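    Keen's experiments report recall and precision over the records the searcher inspected. As a reminder of the two measures, a minimal sketch for a ranked result list cut off at k (the ranking and relevance judgments are invented):

      # Precision and recall for the top-k of a ranked result list.
      def precision_recall_at_k(ranked_ids, relevant_ids, k):
          retrieved = ranked_ids[:k]
          hits = sum(1 for doc in retrieved if doc in relevant_ids)
          return hits / k, hits / len(relevant_ids)

      ranking  = ["d7", "d2", "d9", "d4", "d1"]   # hypothetical ranked output
      relevant = {"d2", "d4", "d8"}               # hypothetical relevance judgments
      for k in (1, 3, 5):
          p, r = precision_recall_at_k(ranking, relevant, k)
          print(f"P@{k}={p:.2f}  R@{k}={r:.2f}")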
  7. Efthimiadis, E.N.: User choices : a new yardstick for the evaluation of ranking algorithms for interactive query expansion (1995) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 6697) [ClassicSimilarity], result of:
          0.06594159 = score(doc=6697,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 6697, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=6697)
      0.25 = coord(1/4)
    
    Abstract
    The performance of 8 ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. Focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the 8 ranking algorithms. This methodology introduces a user-oriented approach to evaluating ranking algorithms for query expansion, in contrast to the standard, system-oriented approaches. Similarities in the performance of the 8 algorithms and the ways these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full text) environments is needed before these results can be generalized beyond the context of the present study.
  8. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 2427) [ClassicSimilarity], result of:
          0.06594159 = score(doc=2427,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 2427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2427)
      0.25 = coord(1/4)
    
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document scores, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks, based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in the mentioned techniques, and to take a higher-level and systematic approach to using hyperlinks for retrieval.
  9. Kang, I.-H.; Kim, G.C.: Integration of multiple evidences based on a query type for web search (2004) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 3568) [ClassicSimilarity], result of:
          0.06594159 = score(doc=3568,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 3568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=3568)
      0.25 = coord(1/4)
    
    Abstract
    The massive and heterogeneous Web exacerbates IR problems, and short user queries make them worse. The contents of web pages are not enough to find answer pages. PageRank compensates for the insufficiencies of content information, and the content information and PageRank are combined to get better results. However, a static combination of multiple evidences may lower the retrieval performance; we have to use different strategies to meet the needs of a user. User queries can be classified into three categories according to users' intent: the topic relevance task, the homepage finding task, and the service finding task. In this paper, we present a user query classification method. The difference of distribution, mutual information, the usage rate as anchor texts, and the POS information are used for the classification. After classifying a user query, we apply different algorithms and information for better results. For the topic relevance task, we emphasize the content information; for the homepage finding task, we emphasize the link information and the URL information. The best performance was obtained when our proposed classification method was used with the OKAPI scoring algorithm.
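    The core move of Kang and Kim is to weight the different evidences according to the detected query type. The sketch below shows such a type-dependent linear combination; the weight values and score inputs are illustrative placeholders, not figures from the paper.

      # Combine content, link (PageRank) and URL evidence with query-type-dependent weights.
      WEIGHTS = {
          "topic_relevance":  {"content": 0.8, "link": 0.1, "url": 0.1},
          "homepage_finding": {"content": 0.3, "link": 0.4, "url": 0.3},
          "service_finding":  {"content": 0.5, "link": 0.3, "url": 0.2},
      }

      def combined_score(query_type, content, link, url):
          w = WEIGHTS[query_type]
          return w["content"] * content + w["link"] * link + w["url"] * url

      # The same document evidence ranks differently depending on the query class.
      print(combined_score("topic_relevance",  content=0.7, link=0.2, url=0.1))
      print(combined_score("homepage_finding", content=0.7, link=0.2, url=0.1))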
  10. Ruthven, I.; Lalmas, M.; Rijsbergen, K.van: Incorporating user search behavior into relevance feedback (2003) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 169) [ClassicSimilarity], result of:
          0.06594159 = score(doc=169,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=169)
      0.25 = coord(1/4)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen rank and select terms for query expansion using information gathered on searcher evaluation behavior. Using the TREC Financial Times and Los Angeles Times collections and search topics from TREC-6 placed in simulated work situations, six student subjects each performed three searches on an experimental system and three on a control system, with instructions to search by natural language expression in any way they found comfortable. Searching was analyzed for behavior differences between experimental and control situations, and for effectiveness and perceptions. In three experiments paired t-tests were the analysis tool, with the controls being a no-relevance-feedback system, a standard ranking for automatic expansion system, and a standard ranking for interactive expansion, while the experimental systems based ranking upon user information on temporal relevance and partial relevance. Two further experiments compare using user behavior (number assessed relevant and similarity of relevant documents) to choose a query expansion technique against a non-selective technique, and finally the effect of providing the user with knowledge of the process. When partial relevance data and time of assessment data are incorporated in term ranking, more relevant documents were recovered in fewer iterations; however, retrieval effectiveness overall was not improved. The subjects nonetheless rated the suggested terms as more useful and used them more heavily. Explanations of what the feedback techniques were doing led to higher use of the techniques.
  11. Lempel, R.; Moran, S.: SALSA: the stochastic approach for link-structure analysis (2001) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 1010) [ClassicSimilarity], result of:
          0.06594159 = score(doc=1010,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 1010, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1010)
      0.25 = coord(1/4)
    
    Abstract
    Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents match the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link-structure of WWW subgraphs, making it computationally more efficient than the Mutual Reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC effect which, in certain cases, prevents the Mutual Reinforcement approach from identifying meaningful authorities.
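    The equivalence result can be seen on a small example: assuming the analysed subgraph forms a single connected component, SALSA's authority weights come out proportional to the pages' in-degrees. A toy sketch of that reading (the link graph is invented):

      from collections import Counter

      # Toy directed link graph, edges as (source, target) pairs.
      edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1"), ("h3", "a3")]

      in_degree = Counter(dst for _, dst in edges)
      total_links = sum(in_degree.values())
      authority = {page: d / total_links for page, d in in_degree.items()}

      for page, score in sorted(authority.items(), key=lambda kv: -kv[1]):
          print(page, round(score, 3))   # a1, pointed at by three hubs, scores highest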
  12. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 1819) [ClassicSimilarity], result of:
          0.06594159 = score(doc=1819,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 1819, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1819)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier. Design/methodology/approach - Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing such as speedup to examine the computing results and also the costs of reorganising indexes while servicing transactions. Findings - Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context. Practical implications - There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift in requirements of inverted file maintenance is needed from the past. Originality/value - The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
  13. Cheng, C.-S.; Chung, C.-P.; Shann, J.J.-J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems (2006) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 1979) [ClassicSimilarity], result of:
          0.06594159 = score(doc=1979,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 1979, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1979)
      0.25 = coord(1/4)
    
    Abstract
    Compressing an inverted file can greatly improve query performance of an information retrieval system (IRS) by reducing disk I/Os. We observe that a good document identifier assignment (DIA) can make the document identifiers in the posting lists more clustered, and result in better compression as well as shorter query processing time. In this paper, we tackle the NP-complete problem of finding an optimal DIA to minimize the average query processing time in an IRS when the probability distribution of query terms is given. We indicate that the greedy nearest neighbor (Greedy-NN) algorithm can provide excellent performance for this problem. However, the Greedy-NN algorithm is inappropriate if used in large-scale IRSs, due to its high complexity O(N² × n), where N denotes the number of documents and n denotes the number of distinct terms. In real-world IRSs, the distribution of query terms is skewed. Based on this fact, we propose a fast O(N × n) heuristic, called partition-based document identifier assignment (PBDIA) algorithm, which can efficiently assign consecutive document identifiers to those documents containing frequently used query terms, and improve compression efficiency of the posting lists for those terms. This can result in reduced query processing time. The experimental results show that the PBDIA algorithm can yield a competitive performance versus the Greedy-NN for the DIA problem, and that this optimization problem has significant advantages for both long queries and parallel information retrieval (IR).
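    The reason a clustered document identifier assignment compresses better is that posting lists are stored as d-gaps (differences between consecutive docIDs) and encoded with variable-length codes, so small gaps cost fewer bytes. A minimal sketch using variable-byte coding as the example code (the docID lists are invented):

      # Number of bytes needed to variable-byte encode one d-gap.
      def vbyte_len(gap):
          n = 1
          while gap >= 128:
              gap >>= 7
              n += 1
          return n

      def posting_list_bytes(doc_ids):
          doc_ids = sorted(doc_ids)
          gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
          return sum(vbyte_len(g) for g in gaps)

      scattered = [5, 4000, 90000, 250000, 800000]   # poor assignment: large gaps
      clustered = [5, 6, 9, 12, 17]                  # good assignment: small gaps
      print(posting_list_bytes(scattered), posting_list_bytes(clustered))   # 12 vs. 5 bytes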
  14. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 3810) [ClassicSimilarity], result of:
          0.06594159 = score(doc=3810,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 3810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=3810)
      0.25 = coord(1/4)
    
    Abstract
    Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to the limited text in bibliographic data. The main assumption of the talk is that we are in a situation where the appropriate ranking factors for OPACs should be defined, while the implementation is no major problem. We must define what we want, and not focus so much on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas on how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in groups such as popularity, freshness, personalisation, etc. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the best-suited results at the top of the results list. How can this goal be achieved with the library catalogue, and also with regard to the library's different collections and databases? The assumption is that ranking of such materials is a complex problem and is nowhere near solved yet. Libraries should focus on ranking to improve the user experience.
  15. Wei, F.; Li, W.; Lu, Q.; He, Y.: Applying two-level reinforcement ranking in query-oriented multidocument summarization (2009) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 107) [ClassicSimilarity], result of:
          0.06594159 = score(doc=107,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=107)
      0.25 = coord(1/4)
    
    Abstract
    Sentence ranking is the issue of most concern in document summarization today. While traditional feature-based approaches evaluate sentence significance and rank the sentences relying on the features that are particularly designed to characterize the different aspects of the individual sentences, the newly emerging graph-based ranking algorithms (such as the PageRank-like algorithms) recursively compute sentence significance using the global information in a text graph that links sentences together. In general, the existing PageRank-like algorithms can model well the phenomena that a sentence is important if it is linked by many other important sentences. Or they are capable of modeling the mutual reinforcement among the sentences in the text graph. However, when dealing with multidocument summarization these algorithms often assemble a set of documents into one large file. The document dimension is totally ignored. In this article we present a framework to model the two-level mutual reinforcement among sentences as well as documents. Under this framework we design and develop a novel ranking algorithm such that the document reinforcement is taken into account in the process of sentence ranking. The convergence issue is examined. We also explore an interesting and important property of the proposed algorithm. When evaluated on the DUC 2005 and 2006 query-oriented multidocument summarization datasets, significant results are achieved.
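    The PageRank-like baseline the authors build on can be sketched as a power iteration over a sentence-similarity graph. The similarity values below are invented, and the paper's document-level reinforcement is deliberately not modelled; this only illustrates the single-level sentence ranking.

      # Power iteration on a row-normalised sentence-similarity graph.
      def rank_sentences(sim, damping=0.85, iters=50):
          n = len(sim)
          trans = []
          for row in sim:
              s = sum(row)
              trans.append([v / s if s else 1.0 / n for v in row])
          scores = [1.0 / n] * n
          for _ in range(iters):
              scores = [(1 - damping) / n
                        + damping * sum(scores[j] * trans[j][i] for j in range(n))
                        for i in range(n)]
          return [round(s, 3) for s in scores]

      sim = [[0.0, 0.6, 0.1],
             [0.6, 0.0, 0.4],
             [0.1, 0.4, 0.0]]
      print(rank_sentences(sim))   # the middle sentence, most strongly connected, ranks first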
  16. Lavrenko, V.: ¬A generative theory of relevance (2009) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 293) [ClassicSimilarity], result of:
          0.06594159 = score(doc=293,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 293, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=293)
      0.25 = coord(1/4)
    
    Abstract
    A modern information retrieval system must have the capability to find, organize and present very different manifestations of information - such as text, pictures, videos or database records - any of which may be of relevance to the user. However, the concept of relevance, while seemingly intuitive, is actually hard to define, and it's even harder to model in a formal way. Lavrenko does not attempt to bring forth a new definition of relevance, nor provide arguments as to why any particular definition might be theoretically superior or more complete. Instead, he takes a widely accepted, albeit somewhat conservative definition, makes several assumptions, and from them develops a new probabilistic model that explicitly captures that notion of relevance. With this book, he makes two major contributions to the field of information retrieval: first, a new way to look at topical relevance, complementing the two dominant models, i.e., the classical probabilistic model and the language modeling approach, and which explicitly combines documents, queries, and relevance in a single formalism; second, a new method for modeling exchangeable sequences of discrete random variables which does not make any structural assumptions about the data and which can also handle rare events. Thus his book is of major interest to researchers and graduate students in information retrieval who specialize in relevance modeling, ranking algorithms, and language modeling.
  17. Schaefer, A.; Jordan, M.; Klas, C.-P.; Fuhr, N.: Active support for query formulation in virtual digital libraries : a case study with DAFFODIL (2005) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 296) [ClassicSimilarity], result of:
          0.06594159 = score(doc=296,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=296)
      0.25 = coord(1/4)
    
    Abstract
    Daffodil is a front-end to federated, heterogeneous digital libraries aiming at strategic support of users during the information-seeking process. This is done by offering a variety of functions for searching, exploring and managing digital library objects. However, the distributed search increases response time, and the conceptual model of the underlying search processes is inherently weaker. This makes query formulation harder, and the resulting waiting times can be frustrating. In this paper, we investigate the concept of proactive support during the user's query formulation. For improving user efficiency and satisfaction, we implemented annotations, proactive support and error markers on the query form itself. These functions decrease the probability of syntactic or semantic errors in queries. Furthermore, the user is able to make better tactical decisions and feels more confident that the system handles the query properly. Evaluations with 30 subjects showed that user satisfaction is improved, whereas no conclusive results were obtained for efficiency.
  18. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 947) [ClassicSimilarity], result of:
          0.06594159 = score(doc=947,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 947, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=947)
      0.25 = coord(1/4)
    
    Abstract
    Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the huge amount of texts retrieved for the query, ranks relevant texts higher for further processing. However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlating concepts and prefer those texts that completely talk about the concepts. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers by term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity to measure the contextual completeness of query terms appearing in nearby areas in the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) those rankers that consider TF in ranking, without the need to change the algorithms and development processes of the rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and when compared with several state-of-the-art techniques that enhance rankers by term-proximity information, PRE may more stably and significantly enhance the rankers.
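    The idea behind PRE, counting an occurrence of a query term for more when other query terms appear nearby, can be sketched roughly as follows; the window size and the boost formula are invented for illustration and are not PRE's actual proximity measures.

      # Proximity-adjusted term frequency: occurrences of a query term are boosted
      # when other query terms occur within a small window around them.
      def proximity_tf(tokens, term, query_terms, window=5, boost=0.5):
          others = set(query_terms) - {term}
          tf = 0.0
          for i, tok in enumerate(tokens):
              if tok != term:
                  continue
              nearby = tokens[max(0, i - window): i + window + 1]
              hits = sum(1 for o in others if o in nearby)
              tf += 1.0 + boost * hits
          return tf

      text = "gene expression in cystic fibrosis patients gene therapy trials".split()
      print(proximity_tf(text, "gene", {"gene", "therapy", "fibrosis"}))   # 3.5 instead of plain tf=2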
  19. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 2283) [ClassicSimilarity], result of:
          0.06594159 = score(doc=2283,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 2283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2283)
      0.25 = coord(1/4)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
  20. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.02
    0.016485397 = product of:
      0.06594159 = sum of:
        0.06594159 = weight(_text_:however in 2338) [ClassicSimilarity], result of:
          0.06594159 = score(doc=2338,freq=2.0), product of:
            0.28742972 = queryWeight, product of:
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.06921162 = queryNorm
            0.22941813 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1529117 = idf(docFreq=1897, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2338)
      0.25 = coord(1/4)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.

Languages

  • e 54
  • d 36
  • m 1

Types

  • a 75
  • x 7
  • el 4
  • m 4
  • r 2
  • p 1
  • s 1