-
Bizer, C.; Heath, T.: Linked Data : evolving the web into a global data space (2011)
0.06
0.05997234 = product of:
0.23988935 = sum of:
0.23988935 = weight(_text_:hyperlink in 725) [ClassicSimilarity], result of:
0.23988935 = score(doc=725,freq=4.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.48810294 = fieldWeight in 725, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.03125 = fieldNorm(doc=725)
0.25 = coord(1/4)
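The score explanation above follows Lucene's classic TF-IDF similarity: the term weight is the product of a query-side weight (idf x queryNorm) and a field-side weight (tf x idf x fieldNorm), scaled by a coordination factor for the fraction of query terms that matched. As a minimal sketch (plain Python, not Lucene API code; the constants are copied from the explanation above), the lines below reproduce the final score of 0.05997234. The other entries below plug different term statistics (freq, idf, fieldNorm) into the same formula.

    import math

    freq = 4.0            # occurrences of "hyperlink" in doc 725
    tf = math.sqrt(freq)  # 2.0 = tf(freq=4.0)
    idf = 7.809647        # idf(docFreq=48, maxDocs=44421), i.e. 1 + ln(maxDocs / (docFreq + 1))
    query_norm = 0.06293151
    field_norm = 0.03125
    coord = 0.25          # coord(1/4): one of four query terms matched

    query_weight = idf * query_norm           # ~0.49147287
    field_weight = tf * idf * field_norm      # ~0.48810294
    term_score = query_weight * field_weight  # ~0.23988935
    print(round(term_score * coord, 8))       # ~0.05997234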
- RSWK
- Semantic Web / Forschungsergebnis / Forschung / Daten / Hyperlink
- Subject
- Semantic Web / Forschungsergebnis / Forschung / Daten / Hyperlink
-
Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 1947) [ClassicSimilarity], result of:
0.21203424 = score(doc=1947,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 1947, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=1947)
0.25 = coord(1/4)
- Abstract
- In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.
-
Calado, P.; Cristo, M.; Gonçalves, M.A.; Moura, E.S. de; Ribeiro-Neto, B.; Ziviani, N.: Link-based similarity measures for the classification of Web documents (2006)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 5921) [ClassicSimilarity], result of:
0.21203424 = score(doc=5921,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 5921, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=5921)
0.25 = coord(1/4)
- Abstract
- Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
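The abstract does not spell out the five similarity measures, so the following is only a hedged illustration of the general idea of link-based similarity: two pages that share many in-links (a co-citation-style signal) are likely to belong to the same class. A Jaccard coefficient over in-link sets is one straightforward way to compute such a measure; the link data below is made up for the example.

    def jaccard(a, b):
        """Jaccard similarity of two sets; 0.0 when both are empty."""
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Hypothetical in-link sets: which pages point to p1 and p2.
    inlinks = {
        "p1": {"hub1", "hub2", "dir/science"},
        "p2": {"hub1", "dir/science", "blogA"},
    }
    print(jaccard(inlinks["p1"], inlinks["p2"]))  # 0.5: candidates for the same class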
-
Zuccala, A.: Author cocitation analysis is to intellectual structure as Web colink analysis is to ... ? (2006)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 8) [ClassicSimilarity], result of:
0.21203424 = score(doc=8,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 8, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=8)
0.25 = coord(1/4)
- Abstract
- Author Cocitation Analysis (ACA) and Web Colink Analysis (WCA) are examined as sister techniques in the related fields of bibliometrics and webometrics. Comparisons are made between the two techniques based on their data retrieval, mapping, and interpretation procedures, using mathematics as the subject in focus. An ACA is carried out and interpreted for a group of participants (authors) involved in an Isaac Newton Institute (2000) workshop-Singularity Theory and Its Applications to Wave Propagation Theory and Dynamical Systems-and compared/contrasted with a WCA for a list of international mathematics research institute home pages on the Web. Although the practice of ACA may be used to inform a WCA, the two techniques do not share many elements in common. The most important departure between ACA and WCA exists at the interpretive stage when ACA maps become meaningful in light of citation theory, and WCA maps require interpretation based on hyperlink theory. Much of the research concerning link theory and motivations for linking is still new; therefore further studies based on colinking are needed, mainly map-based studies, to understand what makes a Web colink structure meaningful.
-
Chau, M.; Shiu, B.; Chan, M.; Chen, H.: Redips: backlink search and analysis on the Web for business intelligence analysis (2007)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 1142) [ClassicSimilarity], result of:
0.21203424 = score(doc=1142,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 1142, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=1142)
0.25 = coord(1/4)
- Abstract
- The World Wide Web presents significant opportunities for business intelligence analysis as it can provide information about a company's external environment and its stakeholders. Traditional business intelligence analysis on the Web has focused on simple keyword searching. Recently, it has been suggested that the incoming links, or backlinks, of a company's Web site (i.e., other Web pages that have a hyperlink pointing to the company of interest) can provide important insights about the company's "online communities." Although analysis of these communities can provide useful signals for a company and information about its stakeholder groups, the manual analysis process can be very time-consuming for business analysts and consultants. In this article, we present a tool called Redips that automatically integrates backlink meta-searching and text-mining techniques to facilitate users in performing such business intelligence analysis on the Web. The architectural design and implementation of the tool are presented in the article. To evaluate the effectiveness, efficiency, and user satisfaction of Redips, an experiment was conducted to compare the tool with two popular business intelligence analysis methods: using backlink search engines and manual browsing. The experiment results showed that Redips was statistically more effective than both benchmark methods (in terms of recall and F-measure) but required more time in search tasks. In terms of user satisfaction, Redips scored statistically higher than backlink search engines in all five measures used, and also statistically higher than manual browsing in three measures.
-
Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 1607) [ClassicSimilarity], result of:
0.21203424 = score(doc=1607,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 1607, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=1607)
0.25 = coord(1/4)
- Abstract
- Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
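The abstract names neither the page features nor the learning algorithm, so the sketch below only illustrates the general shape of query-independent data cleansing under assumed features (document length, in-link count, URL depth) and made-up training labels: fit a simple classifier and keep only pages predicted to be plausible retrieval targets.

    from sklearn.linear_model import LogisticRegression

    # Hypothetical query-independent features per page:
    # [document length in KB, in-link count, URL depth]
    X_train = [[1.2, 0, 5], [15.0, 120, 1], [0.4, 1, 6], [8.5, 40, 2]]
    y_train = [0, 1, 0, 1]  # 1 = retrieval target, 0 = low-quality page

    clf = LogisticRegression().fit(X_train, y_train)

    pages = {"a.html": [0.6, 2, 7], "b.html": [12.0, 80, 1]}
    kept = [url for url, feats in pages.items() if clf.predict([feats])[0] == 1]
    print(kept)  # pages retained after cleansing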
-
Perugini, S.: Symbolic links in the Open Directory Project (2008)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 3070) [ClassicSimilarity], result of:
0.21203424 = score(doc=3070,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 3070, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=3070)
0.25 = coord(1/4)
- Abstract
- We present a study to develop an improved understanding of symbolic links in web directories. A symbolic link is a hyperlink which makes a directed connection from a webpage along one path through a directory to a page along another path. While symbolic links are ubiquitous in web directories such as Yahoo!, they are under-studied and, as a result, their uses are poorly understood. A cursory analysis of symbolic links reveals multiple uses: to provide navigational shortcuts deeper into a directory, backlinks to more general categories, and multiclassification. We investigated these uses in the Open Directory Project (ODP), the largest, most comprehensive, and most widely distributed human-compiled taxonomy of links to websites, which makes extensive use of symbolic links. The results reveal that while symbolic links in ODP are used primarily for multiclassification, only a few multiclassification links actually span top- and second-level categories. This indicates that most symbolic links in ODP are used to create multiclassification between topics which are nested more than two levels deep and suggests that there may be multiple uses of multiclassification links. We also situate symbolic links vis-à-vis other semantic and structural link types from hypermedia. We anticipate that the results and relationships identified and discussed in this paper will provide a foundation for (1) users for understanding the usages of symbolic links in a directory, (2) designers to employ symbolic links more effectively when building and maintaining directories and for crafting user interfaces to them, and (3) information retrieval researchers for further study of symbolic links in web directories.
-
Thelwall, M.: A comparison of link and URL citation counting (2011)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 533) [ClassicSimilarity], result of:
0.21203424 = score(doc=533,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 533, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=533)
0.25 = coord(1/4)
- Abstract
- Purpose - Link analysis is an established topic within webometrics. It normally uses counts of links between sets of web sites or to sets of web sites. These link counts are derived from web crawlers or commercial search engines with the latter being the only alternative for some investigations. This paper compares link counts with URL citation counts in order to assess whether the latter could be a replacement for the former if the major search engines withdraw their advanced hyperlink search facilities. Design/methodology/approach - URL citation counts are compared with link counts for a variety of data sets used in previous webometric studies. Findings - The results show a high degree of correlation between the two but with URL citations being much less numerous, at least outside academia and business. Research limitations/implications - The results cover a small selection of 15 case studies and so the findings are only indicative. Significant differences between results indicate that the difference between link counts and URL citation counts will vary between webometric studies. Practical implications - Should link searches be withdrawn, then link analyses of less well linked non-academic, non-commercial sites would be seriously weakened, although citations based on e-mail addresses could help to make citations more numerous than links for some business and academic contexts. Originality/value - This is the first systematic study of the difference between link counts and URL citation counts in a variety of contexts and it shows that there are significant differences between the two.
-
Vaughan, L.; Yang, R.: Web data as academic and business quality estimates : a comparison of three data sources (2012)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 1452) [ClassicSimilarity], result of:
0.21203424 = score(doc=1452,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 1452, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=1452)
0.25 = coord(1/4)
- Abstract
- Earlier studies found that web hyperlink data contain various types of information, ranging from academic to political, that can be used to analyze a variety of social phenomena. Specifically, the numbers of inlinks to academic websites are associated with academic performance, while the counts of inlinks to company websites correlate with business variables. However, the scarcity of sources from which to collect inlink data in recent years has required us to seek new data sources. The recent demise of the inlink search function of Yahoo! made this need more pressing. Different alternative variables or data sources have been proposed. This study compared three types of web data to determine which are better as academic and business quality estimates, and what are the relationships among the three data sources. The study found that Alexa inlink and Google URL citation data can replace Yahoo! inlink data and that the former is better than the latter. Alexa is even better than Yahoo!, which has been the main data source in recent years. The unique nature of Alexa data could explain its relative advantages over other data sources.
-
Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: ¬A link-bridged topic model for cross-domain document classification (2013)
0.05
0.05300856 = product of:
0.21203424 = sum of:
0.21203424 = weight(_text_:hyperlink in 3706) [ClassicSimilarity], result of:
0.21203424 = score(doc=3706,freq=2.0), product of:
0.49147287 = queryWeight, product of:
7.809647 = idf(docFreq=48, maxDocs=44421)
0.06293151 = queryNorm
0.43142614 = fieldWeight in 3706, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.809647 = idf(docFreq=48, maxDocs=44421)
0.0390625 = fieldNorm(doc=3706)
0.25 = coord(1/4)
- Abstract
- Transfer learning utilizes labeled data available from some related domain (source domain) for achieving effective knowledge transformation to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationship existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationship among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationship is leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on an assumption that the documents of source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains data into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as the shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as content and link statistics. Then the shared topics act as the bridge to facilitate knowledge transfer from the source to the target domains. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
-
Gibson, P.: Professionals' perfect Web world in sight : users want more information on the Web, and vendors attempt to provide (1998)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 2656) [ClassicSimilarity], result of:
0.20720273 = score(doc=2656,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 2656, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=2656)
0.25 = coord(1/4)
- Abstract
- Many information professionals feel that the time is still far off when the WWW can offer the combined functionality and content of traditional online and CD-ROM databases, but there have been a number of recent Web developments to reflect on. Describes the testing and launch by Ovid of its Java client which, in effect, allows access to its databases on the Web with full search functionality, and the initiative of Euromonitor in providing Web access to its whole collection of consumer research reports and its entire database of business sources. Also reviews the service of a newcomer to the information scene, Information Quest (IQ), founded by Dawson Holdings, which has made an agreement with Infonautics to offer access to its Electric Library database, thus adding over 1,000 reference, consumer and business publications to its Web-based journal service.
-
Nieuwenhuysen, P.; Vanouplines, P.: Document plus program hybrids on the Internet and their impact on information transfer (1998)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 2893) [ClassicSimilarity], result of:
0.20720273 = score(doc=2893,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 2893, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=2893)
0.25 = coord(1/4)
- Abstract
- Examines some of the advanced tools, techniques, methods and standards related to the Internet and WWW which consist of hybrids of documents and software, called 'document program hybrids'. Early Internet systems were based on having documents on one side and software on the other, neatly separated, apart from one another and without much interaction, so that the static document can also exist without computers and networks. Document program hybrids blur this classical distinction and all components are integrated, interwoven and exist in synergy with each other. Illustrates the techniques with particular reference to practical examples, including: data collections and dedicated software; advanced HTML features on the WWW; multimedia viewer and plug-in software for Internet and WWW browsers; VRML; interaction through a Web server with other servers and with instruments; adaptive hypertext provided by the server; 'webbots' or 'knowbots' or 'searchbots' or 'metasearch engines' or intelligent software agents; Sun's Java; Microsoft's ActiveX; program scripts for HTML and Web browsers; cookies; and Internet push technology with Webcasting channels.
-
Mills, T.; Moody, K.; Rodden, K.: Providing world wide access to historical sources (1997)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 3697) [ClassicSimilarity], result of:
0.20720273 = score(doc=3697,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 3697, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3697)
0.25 = coord(1/4)
- Abstract
- A unique collection of historical material covering the lives and events of an English village between 1400 and 1750 has been made available via a WWW-enabled information retrieval system. Since the expected readership of the documents ranges from school children to experienced researchers, providing this information in an easily accessible form has offered many challenges requiring tools to aid searching and browsing. The file structure of the document collection was replaced by a database, enabling query results to be presented on the fly. A Java interface displays each user's context in a form that allows for easy and intuitive relevance feedback.
-
Maarek, Y.S.: WebCutter : a system for dynamic and tailorable site mapping (1997)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 3739) [ClassicSimilarity], result of:
0.20720273 = score(doc=3739,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 3739, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3739)
0.25 = coord(1/4)
- Abstract
- Presents an approach that integrates searching and browsing in a manner that improves both paradigms. When browsing is the primary task, it enables semantic content-based tailoring of Web maps in both the generation and the visualization phases. When search is the primary task, it enables contextualization of the results by augmenting them with the documents' neighbourhoods. This approach is embodied in WebCutter, a client-server system fully integrated with Web software. WebCutter consists of a map generator running off a standard Web server and a map visualization client implemented as a Java applet runnable from any standard Web browser and requiring no installation or external plug-in application. WebCutter is in beta stage and is in the process of being integrated into the Lotus Domino application product line.
-
Pan, B.; Gay, G.; Saylor, J.; Hembrooke, H.: One digital library, two undergraduate classes, and four learning modules : uses of a digital library in classrooms (2006)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 907) [ClassicSimilarity], result of:
0.20720273 = score(doc=907,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 907, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=907)
0.25 = coord(1/4)
- Abstract
- The KMODDL (kinematic models for design digital library) is a digital library based on a historical collection of kinematic models made of steel and bronze. The digital library contains four types of learning modules including textual materials, QuickTime virtual reality movies, Java simulations, and stereolithographic files of the physical models. The authors report an evaluation study on the uses of the KMODDL in two undergraduate classes. This research reveals that the users in different classes encountered different usability problems, and reported quantitatively different subjective experiences. Further, the results indicate that depending on the subject area, the two user groups preferred different types of learning modules, resulting in different uses of the available materials and different learning outcomes. These findings are discussed in terms of their implications for future digital library design.
-
Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 2167) [ClassicSimilarity], result of:
0.20720273 = score(doc=2167,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 2167, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=2167)
0.25 = coord(1/4)
- Abstract
- The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory (IU IP Lab). The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug-and-play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
-
Song, R.; Luo, Z.; Nie, J.-Y.; Yu, Y.; Hon, H.-W.: Identification of ambiguous queries in web search (2009)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 3441) [ClassicSimilarity], result of:
0.20720273 = score(doc=3441,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 3441, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3441)
0.25 = coord(1/4)
- Abstract
- It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
-
Croft, W.B.; Metzler, D.; Strohman, T.: Search engines : information retrieval in practice (2010)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 3605) [ClassicSimilarity], result of:
0.20720273 = score(doc=3605,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 3605, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=3605)
0.25 = coord(1/4)
- Abstract
- For introductory information retrieval courses at the undergraduate and graduate level in computer science, information science and computer engineering departments. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book's numerous programming exercises make extensive use of Galago, a Java-based open source search engine. Supplements: extensive lecture slides (in PDF and PPT format); solutions to selected end-of-chapter problems (instructors only); test collections for exercises; and the Galago search engine.
-
Tang, X.-B.; Wei Wei, G.-C.L.; Zhu, J.: An inference model of medical insurance fraud detection : based on ontology and SWRL (2017)
0.05
0.051800683 = product of:
0.20720273 = sum of:
0.20720273 = weight(_text_:java in 4615) [ClassicSimilarity], result of:
0.20720273 = score(doc=4615,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.46718815 = fieldWeight in 4615, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=4615)
0.25 = coord(1/4)
- Abstract
- Medical insurance fraud is common in many countries' medical insurance systems and represents a serious threat to the insurance funds and the benefits of patients. In this paper, we present an inference model of medical insurance fraud detection, based on a medical detection domain ontology that incorporates the knowledge base provided by the Medical Terminology, NKIMed, and Chinese Library Classification systems. Through analyzing the behaviors of irregular and fraudulent medical services, we defined the scope of the medical domain ontology relevant to the task and built the ontology about medical sciences and medical service behaviors. The ontology then utilizes Semantic Web Rule Language (SWRL) and Java Expert System Shell (JESS) to detect medical irregularities and mine implicit knowledge. The system can be used to improve the management of medical insurance risks.
-
Chen, H.; Chung, Y.-M.; Ramsey, M.; Yang, C.C.: A smart itsy bitsy spider for the Web (1998)
0.04
0.04316724 = product of:
0.17266896 = sum of:
0.17266896 = weight(_text_:java in 1871) [ClassicSimilarity], result of:
0.17266896 = score(doc=1871,freq=2.0), product of:
0.44351026 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06293151 = queryNorm
0.38932347 = fieldWeight in 1871, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=1871)
0.25 = coord(1/4)
- Abstract
- As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed two Web personal spiders based on best-first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best-first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained a significantly higher recall value than that of the best-first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithms allows users to find other potentially relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for the design of a truly interactive and dynamic Web agent.