-
Chen, H.; Chung, Y.-M.; Ramsey, M.; Yang, C.C.: ¬A smart itsy bitsy spider for the Web (1998)
0.05
0.04514617 = product of:
0.18058468 = sum of:
0.18058468 = weight(_text_:java in 1871) [ClassicSimilarity], result of:
0.18058468 = score(doc=1871,freq=2.0), product of:
0.46384227 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0658165 = queryNorm
0.38932347 = fieldWeight in 1871, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=1871)
0.25 = coord(1/4)
- Abstract
- As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed 2 Web personal spiders based on best first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages on the Web, based on the links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained a significantly higher recall value than that of the best first search spider. However, their precision values were not statistically different. The mutation process introduced in genetic algorithms allows users to find other potentially relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component for the design of a truly interactive and dynamic Web agent.
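A minimal sketch of the best-first-search crawling strategy described in the abstract, assuming an in-memory link graph and a hypothetical keyword-overlap score in place of live Web fetching and real indexing; it is an illustration of the general technique, not the authors' spider.

```java
import java.util.*;

// Best-first search over a toy link graph: the frontier is a priority queue ordered
// by a relevance score, so the most promising page is always expanded next.
// The graph, page texts, and scoring function are hypothetical.
public class BestFirstSpider {

    // Hypothetical relevance score: fraction of query terms appearing in the page text.
    static double score(String pageText, Set<String> queryTerms) {
        long hits = queryTerms.stream().filter(pageText.toLowerCase()::contains).count();
        return (double) hits / queryTerms.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "start", List.of("a", "b"),
                "a", List.of("c"),
                "b", List.of("c", "d"),
                "c", List.of(),
                "d", List.of());
        Map<String, String> text = Map.of(
                "start", "digital library agents",
                "a", "genetic algorithm search agents",
                "b", "cooking recipes",
                "c", "intelligent web spider agents",
                "d", "sports news");
        Set<String> query = Set.of("agents", "spider");

        PriorityQueue<String> frontier = new PriorityQueue<>(
                Comparator.comparingDouble((String p) -> score(text.get(p), query)).reversed());
        Set<String> visited = new HashSet<>();
        frontier.add("start");
        while (!frontier.isEmpty()) {
            String page = frontier.poll();
            if (!visited.add(page)) continue;          // skip pages already expanded
            System.out.printf("visited %s (score %.2f)%n", page, score(text.get(page), query));
            for (String next : links.get(page)) {
                if (!visited.contains(next)) frontier.add(next);
            }
        }
    }
}
```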
-
Chen, C.: CiteSpace II : detecting and visualizing emerging trends and transient patterns in scientific literature (2006)
0.05
0.04514617 = product of:
0.18058468 = sum of:
0.18058468 = weight(_text_:java in 272) [ClassicSimilarity], result of:
0.18058468 = score(doc=272,freq=2.0), product of:
0.46384227 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0658165 = queryNorm
0.38932347 = fieldWeight in 272, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=272)
0.25 = coord(1/4)
- Abstract
- This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature - an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
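The co-citation footprint that defines an intellectual base can be illustrated with a small sketch. Assuming a toy set of citing papers (hypothetical data, not drawn from the article), the fragment below counts how often each pair of references is cited together; CiteSpace's clustering and burst detection operate on counts of this kind.

```java
import java.util.*;

// Minimal co-citation counting: two references are co-cited whenever the same
// citing paper lists both of them. The reference lists are hypothetical.
public class CoCitation {
    public static void main(String[] args) {
        List<List<String>> referenceLists = List.of(
                List.of("R1", "R2", "R3"),
                List.of("R1", "R2"),
                List.of("R2", "R3", "R4"));

        Map<String, Integer> coCitations = new TreeMap<>();
        for (List<String> refs : referenceLists) {
            for (int i = 0; i < refs.size(); i++) {
                for (int j = i + 1; j < refs.size(); j++) {
                    coCitations.merge(refs.get(i) + "-" + refs.get(j), 1, Integer::sum);
                }
            }
        }
        // e.g. {R1-R2=2, R1-R3=1, R2-R3=2, R2-R4=1, R3-R4=1}
        System.out.println(coCitations);
    }
}
```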
-
Eddings, J.: How the Internet works (1994)
0.05
0.04514617 = product of:
0.18058468 = sum of:
0.18058468 = weight(_text_:java in 2514) [ClassicSimilarity], result of:
0.18058468 = score(doc=2514,freq=2.0), product of:
0.46384227 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0658165 = queryNorm
0.38932347 = fieldWeight in 2514, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=2514)
0.25 = coord(1/4)
- Abstract
- How the Internet Works promises "an exciting visual journey down the highways and byways of the Internet," and it delivers. The book's high-quality graphics and simple, succinct text make it the ideal book for beginners; however, it still has much to offer for Net vets. This book is jam-packed with cool ways to visualize how the Net works. The first section visually explores how TCP/IP, Winsock, and other Net connectivity mysteries work. This section also helps you understand how e-mail addresses and domains work, what file types mean, and how information travels across the Net. Part 2 unravels the Net's underlying architecture, including good information on how routers work and what is meant by client/server architecture. The third section covers your own connection to the Net through an Internet Service Provider (ISP), and how ISDN, cable modems, and Web TV work. Part 4 discusses e-mail, spam, newsgroups, Internet Relay Chat (IRC), and Net phone calls. In part 5, you'll find out how other Net tools, such as gopher, telnet, WAIS, and FTP, can enhance your Net experience. The sixth section takes on the World Wide Web, including everything from how HTML works to image maps and forms. Part 7 looks at other Web features such as push technology, Java, ActiveX, and CGI scripting, while part 8 deals with multimedia on the Net. Part 9 shows you what intranets are and covers groupware, and shopping and searching the Net. The book wraps up with part 10, a chapter on Net security that covers firewalls, viruses, cookies, and other Web tracking devices, plus cryptography and parental controls.
-
Wu, D.; Shi, J.: Classical music recording ontology used in a library catalog (2016)
0.05
0.04514617 = product of:
0.18058468 = sum of:
0.18058468 = weight(_text_:java in 4179) [ClassicSimilarity], result of:
0.18058468 = score(doc=4179,freq=2.0), product of:
0.46384227 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0658165 = queryNorm
0.38932347 = fieldWeight in 4179, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.0390625 = fieldNorm(doc=4179)
0.25 = coord(1/4)
- Abstract
- In order to improve the organization of classical music information resources, we constructed a classical music recording ontology, on top of which we then designed an online classical music catalog. Our construction of the classical music recording ontology consisted of three steps: identifying the purpose, analyzing the ontology, and encoding the ontology. We identified the main classes and properties of the domain by investigating classical music recording resources and users' information needs. We implemented the ontology in the Web Ontology Language (OWL) using five steps: transforming the properties, encoding the transformed properties, defining ranges of the properties, constructing individuals, and standardizing the ontology. In constructing the online catalog, we first designed the structure and functions of the catalog based on investigations into users' information needs and information-seeking behaviors. Then we extracted classes and properties of the ontology using the Apache Jena application programming interface (API), and constructed a catalog in the Java environment. The catalog provides a hierarchical main page (built using the Functional Requirements for Bibliographic Records (FRBR) model), a classical music information network and integrated information service; this combination of features greatly eases the task of finding classical music recordings and more information about classical music.
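As an illustration of the Jena-based extraction step described above, the hedged sketch below loads an OWL file and lists its classes and properties. It assumes Apache Jena is on the classpath; the file name classical-music.owl and the flat listing are assumptions for the example, not the authors' actual code.

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntProperty;
import org.apache.jena.rdf.model.ModelFactory;

// Read an OWL ontology with the Apache Jena API and enumerate its classes and
// properties, roughly the extraction step a catalog front end would perform.
public class OntologyCatalog {
    public static void main(String[] args) {
        OntModel model = ModelFactory.createOntologyModel();
        model.read("classical-music.owl");   // hypothetical local ontology file

        // Classes (e.g. Work, Recording, Performer in a hypothetical model)
        model.listClasses().forEachRemaining((OntClass c) ->
                System.out.println("class: " + c.getLocalName()));

        // Properties linking those classes (e.g. composedBy, performedBy)
        model.listOntProperties().forEachRemaining((OntProperty p) ->
                System.out.println("property: " + p.getLocalName()));
    }
}
```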
-
Barker, P.: ¬An examination of the use of the OSI Directory for accessing bibliographic information : project ABDUX (1993)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 7309) [ClassicSimilarity], result of:
0.1716406 = score(doc=7309,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 7309, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=7309)
0.25 = coord(1/4)
- Abstract
- Describes the work of the ABDUX project, containing a brief description of the rationale for using X.500 for access to bibliographic information. Outlines the project's design work and a demonstration system. Reviews the standards applicable to bibliographic data and library OPACs. Highlights difficulties found when handling bibliographic data in library systems. Discusses the service requirements of OPACs for accessing bibliographic information, discussing how X.500 Directory services may be used. Suggests the DIT structures that could be used for storing in the directory both bibliographic information and descriptions of information resources in general. Describes the way in which the model of bibliographic data is presented. Outlines the syntax of ASN.1 and how records and fields may be described in terms of X.500 object classes and attribute types. Details the mapping of MARC format into an X.500 compatible form. Provides the schema information for representing research notes and archives, not covered by MARC definitions. Examines the success in implementing the designs and looks ahead to future possibilities.
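The MARC-to-directory mapping described above can be pictured with a small hedged sketch. The tag selection and attribute names below are hypothetical illustrations only, not the project's published schema.

```java
import java.util.Map;

// Hypothetical mapping of MARC tags to directory-style attribute types, in the
// spirit of the ABDUX design; tags and attribute names are illustrative only.
public class MarcToDirectory {
    static final Map<String, String> TAG_TO_ATTRIBUTE = Map.of(
            "100", "author",
            "245", "title",
            "260", "publicationDetails",
            "020", "isbn");

    public static void main(String[] args) {
        Map<String, String> marcRecord = Map.of(
                "100", "Barker, P.",
                "245", "An examination of the use of the OSI Directory");

        marcRecord.forEach((tag, value) -> {
            String attribute = TAG_TO_ATTRIBUTE.getOrDefault(tag, "unmappedField" + tag);
            System.out.println(attribute + ": " + value);
        });
    }
}
```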
-
Fisher, S.; Rowley, J.: Management information and library management systems : an overview (1994)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 7442) [ClassicSimilarity], result of:
0.1716406 = score(doc=7442,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 7442, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=7442)
0.25 = coord(1/4)
- Abstract
- Management information facilities transform the library management system into a much more effective management tool. Three levels of management can be identified - operational, tactical and strategic - and each of these has its own unique management information needs. Earlier work on the use of management information in libraries and the development of management information systems demonstrates that progress in these areas has been slow. Management information systems comprise three components: facilities for handling ad hoc enquiries; facilities for standard report generation; and management information modules, or report generators, that support the production of user-defined reports. A list of standard reports covering acquisitions, cataloguing, circulation control, serials and inter-library loans is provided. The functions of report generators are explored and the nature of enquiry facilities reviewed. Management information tools available in library management systems form a valuable aid in decision making. These should be further exploited and further developed.
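As a rough illustration of the report-generation component described above, the sketch below produces a simple "overdue loans" report over hypothetical circulation records; the record fields and the report definition are assumptions for the example, not features of any particular library system.

```java
import java.time.LocalDate;
import java.util.List;

// A user-defined report over circulation data: filter the loan file down to the
// records a manager asked for. The data and report criterion are hypothetical.
public class CirculationReport {
    record Loan(String title, String borrower, LocalDate dueDate) {}

    public static void main(String[] args) {
        List<Loan> loans = List.of(
                new Loan("Cataloguing rules", "Smith", LocalDate.of(2024, 1, 10)),
                new Loan("Library systems", "Jones", LocalDate.of(2030, 1, 10)));

        LocalDate today = LocalDate.now();
        // "Overdue loans" report: every loan whose due date has passed.
        loans.stream()
                .filter(l -> l.dueDate().isBefore(today))
                .forEach(l -> System.out.println("OVERDUE: " + l.title() + " / " + l.borrower()));
    }
}
```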
-
Beynon-Davies, P.: ¬A semantic database approach to knowledge-based hypermedia systems (1994)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 830) [ClassicSimilarity], result of:
0.1716406 = score(doc=830,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 830, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=830)
0.25 = coord(1/4)
- Abstract
- Discusses an architecture for knowledge-based hypermedia systems based on work from semantic databases. Its power derives from its use of a single, uniform data structure which can be used to store both the intensional and extensional information needed to generate hypermedia systems. The architecture is also sufficiently powerful to accommodate the representation of a reasonable amount of knowledge within a hypermedia system. Work has been conducted in building a number of prototypes on a small information base of digital image data. The prototypes serve as demonstrators of systems for managing the large amount of information held by museums about their artifacts. The aim of this work is to demonstrate the flexibility of the architecture in serving the needs of a number of distinct user groups. The first prototype has demonstrated that the virtual architecture is capable of supporting some of the main hypermedia access methods. The current demonstrator is being used to investigate the potential of the approach for handling multiple classifications of hypermedia material. The research is particularly directed at the incorporation of evolving temporal and spatial knowledge.
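A single uniform data structure of the kind described can be sketched as a list of triples holding both schema-level (intensional) and instance-level (extensional) statements; the class, property, and node names below are hypothetical.

```java
import java.util.List;

// One uniform store of (subject, relation, object) statements holding both the
// intensional schema and the extensional hypermedia nodes. Example facts are
// hypothetical, not taken from the prototypes described in the article.
public class UniformStore {
    record Statement(String subject, String relation, String object) {}

    public static void main(String[] args) {
        List<Statement> store = List.of(
                // intensional: what kinds of things and links exist
                new Statement("Painting", "isA", "Artifact"),
                new Statement("Artifact", "hasLink", "depicts"),
                // extensional: concrete hypermedia nodes and links
                new Statement("mona-lisa.jpg", "instanceOf", "Painting"),
                new Statement("mona-lisa.jpg", "depicts", "Lisa del Giocondo"));

        // Generate the outgoing hypermedia links for one node.
        store.stream()
                .filter(s -> s.subject().equals("mona-lisa.jpg"))
                .forEach(s -> System.out.println(s.subject() + " --" + s.relation() + "--> " + s.object()));
    }
}
```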
-
Ohsuga, S.: ¬A way of designing knowledge based systems (1995)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3278) [ClassicSimilarity], result of:
0.1716406 = score(doc=3278,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3278, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3278)
0.25 = coord(1/4)
- Abstract
- Discusses the design of intelligent knowledge based systems. Discusses the kinds of systems that are capable of handling diverse problems arising in the real world, and solving them autonomously. A new approach is necessary for designing such systems. Analyzes human activities and describes a way of representing each activity as a compound of basic intelligent functions. Some functions are represented as the compounds of other functions. Thus, a hierarchy of the functions is constructed to form the software architecture of an intelligent system, where the human interface appears on top of this structure. Intelligent systems need to be provided with considerable knowledge. However, it is very wasteful to let every person collect and structure large amounts of knowledge. It is desirable that there should be large knowledge bases which can supply each intelligent knowledge system as necessary. Discusses a network system consisting of many intelligent systems and one or more large, commonly accessible knowledge bases.
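The hierarchy of functions described above resembles a composite structure, sketched below under the assumption that each compound function simply invokes its parts in order; the concrete function names are hypothetical.

```java
import java.util.List;

// An activity represented as a compound of basic intelligent functions: compound
// functions delegate to their parts, forming a hierarchy with the human interface
// conceptually on top. The example functions are hypothetical.
public class FunctionHierarchy {
    interface IntelligentFunction { void perform(); }

    record BasicFunction(String name) implements IntelligentFunction {
        public void perform() { System.out.println("basic: " + name); }
    }

    record CompoundFunction(String name, List<IntelligentFunction> parts) implements IntelligentFunction {
        public void perform() {
            System.out.println("compound: " + name);
            parts.forEach(IntelligentFunction::perform);
        }
    }

    public static void main(String[] args) {
        IntelligentFunction diagnose = new CompoundFunction("diagnose",
                List.of(new BasicFunction("observe"),
                        new CompoundFunction("reason",
                                List.of(new BasicFunction("match rules"),
                                        new BasicFunction("rank hypotheses")))));
        diagnose.perform();
    }
}
```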
-
Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 5191) [ClassicSimilarity], result of:
0.1716406 = score(doc=5191,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 5191, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=5191)
0.25 = coord(1/4)
- Abstract
- Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching; fuzzy searching and matching; relevance ranking; proximity searching; and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slower random seek times than hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
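A minimal sketch of the indexing strategy behind such engines, assuming a tiny in-memory corpus: an inverted index maps each term to the documents containing it, and a Boolean AND query intersects the posting sets. This is a generic illustration, not any of the reviewed packages.

```java
import java.util.*;

// Build an inverted index over a toy corpus and answer a Boolean AND query by
// intersecting posting sets. Corpus and query are hypothetical.
public class TinySearchEngine {
    public static void main(String[] args) {
        Map<Integer, String> docs = Map.of(
                1, "text search and retrieval software",
                2, "cd-rom storage and retrieval",
                3, "fuzzy search strategies");

        // term -> posting set of document ids
        Map<String, Set<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : text.split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        });

        // Boolean AND: intersect the posting sets of all query terms.
        List<String> query = List.of("search", "retrieval");
        Set<Integer> result = new TreeSet<>(index.getOrDefault(query.get(0), Set.of()));
        for (String term : query.subList(1, query.size())) {
            result.retainAll(index.getOrDefault(term, Set.of()));
        }
        System.out.println("documents matching " + query + ": " + result); // [1]
    }
}
```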
-
Fattahi, R.: ¬A uniform approach to the indexing of cataloguing data in online library systems (1997)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 1131) [ClassicSimilarity], result of:
0.1716406 = score(doc=1131,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 1131, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=1131)
0.25 = coord(1/4)
- Abstract
- Argues that in library cataloguing, and for optimal functionality of bibliographic records, the indexing of fields and subfields should follow a uniform approach. This would maintain effectiveness in searching, retrieval and display of bibliographic information both within systems and between systems. However, a review of different postings to the AUTOCAT and USMARC discussion lists indicates that the indexing and tagging of cataloguing data do not, at present, follow a consistent approach in online library systems. If the rationale of cataloguing principles is to bring uniformity in bibliographic description and effectiveness in access, they should also address the question of uniform approaches to the indexing of cataloguing data. In this context, and in terms of the identification and handling of data elements, cataloguing standards (codes, MARC formats and the Z39.50 standard) should be brought closer together, in that they should provide guidelines for the designation of data elements for machine-readable records.
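The uniform indexing of fields and subfields argued for above can be sketched as a single keying convention applied to every data element; the tag$subfield key format and sample record below are assumptions for illustration.

```java
import java.util.*;

// Index cataloguing data uniformly: every data element is keyed the same way
// ("tag$subfield"), whichever field it comes from. Key convention and record
// are illustrative assumptions.
public class UniformIndex {
    public static void main(String[] args) {
        // One bibliographic record as (tag, subfield, value) entries.
        String[][] record = {
                {"245", "a", "A uniform approach to the indexing of cataloguing data"},
                {"100", "a", "Fattahi, R."},
                {"260", "c", "1997"}};

        Map<String, List<String>> index = new TreeMap<>();
        for (String[] element : record) {
            String key = element[0] + "$" + element[1];   // same keying rule for every element
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(element[2]);
        }
        index.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```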
-
Oppenheim, C.: Managers' use and handling of information (1997)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 1357) [ClassicSimilarity], result of:
0.1716406 = score(doc=1357,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 1357, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=1357)
0.25 = coord(1/4)
-
Taylor, M.J.; Mortimer, A.M.; Addison, M.A.; Turner, M.C.R.: 'NESS-plants' : an interactive multi-media information system for botanic gardens (1994)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3733) [ClassicSimilarity], result of:
0.1716406 = score(doc=3733,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3733, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3733)
0.25 = coord(1/4)
- Abstract
- Multimedia techniques facilitate simplified access to information databases containing diverse types of data, ranging from simple text to audio and pictorial information. The design of such systems focuses on the user interface, with particular emphasis on navigation through the information base. The paper describes the enhancement of an interactive multimedia information system developed at the University of Liverpool for use by the general public in the University's Botanic Gardens, at Ness. The original system consists of a plant record management system for handling textual and graphical information, and a library of pictures; a task-oriented user interface providing flexible interrogation of the held information; and independent access facilities for casual visitors (i.e. the general public) and professional curators (i.e. the garden staff). A novel feature of the general public's interaction is the ability to compose complex queries visually, using multimedia techniques. These locate individual plant records in the context of an on-screen map which represents the geographic layout of the garden.
-
Cousins, S.A.: Duplicate detection and record consolidation in large bibliographic databases : the COPAC database experience (1998)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3833) [ClassicSimilarity], result of:
0.1716406 = score(doc=3833,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3833, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3833)
0.25 = coord(1/4)
- Abstract
- COPAC (CURL OPAC) is a union catalogue, based on records supplied by members of the Consortium of University Libraries (CURL), giving access to the online catalogue records of some of the largest academic research libraries in the UK and Ireland. Like all union catalogues, COPAC is supplied with multiple copies of records representing the same document in the contributing library catalogues. To reduce the level of duplication visible to the COPAC user, duplicate detection and record consolidation procedures have been developed. These result in the production of a single record for each document, representing the holdings of several libraries. Discusses the ways in which both the duplicate detection and record consolidation procedures are carried out, and problem areas encountered. Describes the general structure of these procedures, providing a model of the duplicate record handling mechanisms used in COPAC
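A hedged sketch of the general idea behind such duplicate detection, assuming a simple normalised match key built from title and year; COPAC's actual matching and consolidation rules are more elaborate than this.

```java
import java.util.*;

// Duplicate detection via a normalised match key: records that normalise to the
// same key are consolidated into one entry listing all holding libraries.
// The key recipe (title + year, lower-cased, punctuation stripped) is an assumption.
public class DuplicateDetector {
    record Bib(String title, String year, String library) {}

    static String matchKey(Bib b) {
        return (b.title() + "|" + b.year()).toLowerCase().replaceAll("[^a-z0-9|]", "");
    }

    public static void main(String[] args) {
        List<Bib> supplied = List.of(
                new Bib("How the Internet works", "1994", "Library A"),
                new Bib("How the Internet Works.", "1994", "Library B"),
                new Bib("CiteSpace II", "2006", "Library C"));

        Map<String, List<String>> consolidated = new LinkedHashMap<>();
        for (Bib b : supplied) {
            consolidated.computeIfAbsent(matchKey(b), k -> new ArrayList<>()).add(b.library());
        }
        // One consolidated record per document, listing the holding libraries.
        consolidated.forEach((key, libs) -> System.out.println(key + " held by " + libs));
    }
}
```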
-
Wu, X.: Rule induction with extension matrices (1998)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3912) [ClassicSimilarity], result of:
0.1716406 = score(doc=3912,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3912, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3912)
0.25 = coord(1/4)
- Abstract
- Presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly-developed extension matrix approach. Gives a simple example of attribute-based induction to show the difference between the rules in variable-valued logic produced by HCV, the decision tree generated by C4.5, and the decision tree's decompiled rules produced by C4.5rules. Outlines the extension matrix approach for data mining. Describes the HCV algorithm in detail. Outlines techniques developed and implemented in the HCV program for noise handling and discretization of continuous domains respectively. Follows these with a performance comparison of HCV with well-known ID3-like algorithms, including C4.5 and C4.5rules, on a collection of standard databases including the famous MONK's problems.
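One of the techniques mentioned, discretization of continuous domains, can be sketched simply; the equal-width binning below is a generic illustration, not necessarily the scheme HCV itself implements.

```java
import java.util.Arrays;

// Discretize a continuous attribute by equal-width binning: split the observed
// range into a fixed number of intervals and replace each value by its bin index.
public class Discretizer {
    public static void main(String[] args) {
        double[] values = {0.2, 1.7, 2.9, 3.4, 4.8};
        int bins = 3;

        double min = Arrays.stream(values).min().orElseThrow();
        double max = Arrays.stream(values).max().orElseThrow();
        double width = (max - min) / bins;

        for (double v : values) {
            // Map the value to a bin index in [0, bins - 1].
            int bin = Math.min(bins - 1, (int) ((v - min) / width));
            System.out.printf("%.1f -> bin %d%n", v, bin);
        }
    }
}
```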
-
Schupbach, W.: ¬The Iconographic Collections Videodisc at the Wellcome Institute for the History of Medicine, London (1994)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 6363) [ClassicSimilarity], result of:
0.1716406 = score(doc=6363,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 6363, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=6363)
0.25 = coord(1/4)
- Abstract
- Many libraries, museums, art galleries, and heritage centres are thinking of using electronic media to help them to achieve their aims and some have started to put these plans into action. One such project is the Iconographic Collections Videodisc, produced by the Wellcome Institute for the History of Medicine Library, London, UK, and available free of charge to the public at the Wellcome Institute building since Jun 93. The Iconographic Collections consist of large collections of prints, drawings, paintings and photographs, covering a range of subjects, including the history of medicine as the central theme. The aims of the project are: to preserve the collections from the damage caused by avoidable exposure to light and handling; and to make available to users all the items in the library. The videodisc performs the function of a massive, illustrated catalogue. Includes examples of the images stored on the videodisc and the catalogue records held on the system.
-
Ioannides, D.: XML schema languages : beyond DTD (2000)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 1720) [ClassicSimilarity], result of:
0.1716406 = score(doc=1720,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 1720, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=1720)
0.25 = coord(1/4)
- Abstract
- The flexibility and extensibility of XML have largely contributed to its wide acceptance beyond the traditional realm of SGML. Yet, there is still one more obstacle to be overcome before XML is able to become the evangelized universal data/document format. The obstacle is posed by the limitations of the legacy standard for constraining the contents of an XML document. The traditionally used DTD (document type definition) format does not lend itself to use in the wide variety of applications XML is capable of handling. The World Wide Web Consortium (W3C) has charged the XML Schema working group with the task of developing a schema language to replace DTD. This XML schema language is evolving based on early drafts of XML schema languages. Each one of these early efforts adopted a slightly different approach, but all of them were moving in the same direction.
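As a concrete illustration of schema-based constraint checking (the role a DTD replacement has to fill), the hedged sketch below validates an XML document against a W3C XML Schema using the standard Java validation API; the file names record.xml and record.xsd are assumptions.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Validate an XML document against a W3C XML Schema with the standard Java API;
// validate() throws an exception if the document does not conform.
public class SchemaValidation {
    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("record.xsd"));

        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("record.xml")));
        System.out.println("record.xml conforms to record.xsd");
    }
}
```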
-
L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 2855) [ClassicSimilarity], result of:
0.1716406 = score(doc=2855,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 2855, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=2855)
0.25 = coord(1/4)
- Abstract
- Part-of-Speech (POS) taggers are used in an increasing number of terminology applications. However, terminologists do not know exactly how they perform on specialized texts, since most POS taggers have been trained on "general" corpora, that is, corpora containing all sorts of undifferentiated texts. In this article, we evaluate the performance of two POS taggers on French and English medical texts. The taggers are TnT (a statistical tagger developed at Saarland University (Brants 2000)) and WinBrill (the Windows version of the tagger initially developed by Eric Brill (1992)). Ten extracts from medical texts were submitted to the taggers and the outputs scanned manually. Results pertain to the accuracy of tagging in terms of correctly and incorrectly tagged words. We also study the handling of unknown words from different viewpoints.
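The accuracy measure used in such benchmarks can be sketched as a straightforward token-level comparison against a manually checked reference; the tag sets and token counts below are hypothetical.

```java
import java.util.List;

// Token-level tagging accuracy: compare predicted tags against manually verified
// gold tags, position by position. Example tags are hypothetical.
public class TaggerAccuracy {
    public static void main(String[] args) {
        List<String> gold      = List.of("DET", "NOUN", "VERB", "ADJ", "NOUN");
        List<String> predicted = List.of("DET", "NOUN", "VERB", "NOUN", "NOUN");

        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) correct++;
        }
        double accuracy = (double) correct / gold.size();
        System.out.printf("accuracy = %.2f%n", accuracy); // 0.80
    }
}
```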
-
Setting the record straight : understanding the MARC format (1993)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3327) [ClassicSimilarity], result of:
0.1716406 = score(doc=3327,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3327, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3327)
0.25 = coord(1/4)
- Abstract
- MARC is an acronym for Machine Readable Catalogue or Cataloguing. This general description, however, is rather misleading as MARC is neither a kind of catalogue nor a method of cataloguing. In fact, MARC is a standard format for representing bibliographic information for handling by computer. While the MARC format was primarily designed to serve the needs of libraries, the concept has since been embraced by the wider information community as a convenient way of storing and exchanging bibliographic data. The original MARC format was developed at the Library of Congress in 1965-6, leading to a pilot project, known as MARC I, which had the aim of investigating the feasibility of producing machine-readable catalogue data. Similar work was in progress in the United Kingdom, where the Council of the British National Bibliography had set up the BNB MARC Project with the remit of examining the use of machine-readable data in producing the printed British National Bibliography (BNB). These parallel developments led to Anglo-American co-operation on the MARC II project, which was initiated in 1968. MARC II was to prove instrumental in defining the concept of MARC as a communications format.
-
Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 3563) [ClassicSimilarity], result of:
0.1716406 = score(doc=3563,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 3563, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=3563)
0.25 = coord(1/4)
- Abstract
- Topic discovery is an important means for marketing, e-Business and social science studies. As well, it can be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without using manual examination. Given a query, ATD returns with topics associated with the query and top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
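The iterative principal eigenvector computation at the heart of such link-based methods can be sketched with power iteration on a small adjacency matrix; the toy graph below is hypothetical and stands in for the hyperlink structure the ATD method analyses.

```java
// Power iteration: repeatedly multiply a score vector by the adjacency matrix and
// normalise; the vector converges towards the principal eigenvector. The matrix
// is a hypothetical 3-node link graph.
public class PowerIteration {
    public static void main(String[] args) {
        double[][] adjacency = {
                {0, 1, 1},
                {1, 0, 1},
                {0, 1, 0}};
        double[] scores = {1, 1, 1};

        for (int iter = 0; iter < 50; iter++) {
            double[] next = new double[scores.length];
            for (int i = 0; i < adjacency.length; i++) {
                for (int j = 0; j < adjacency[i].length; j++) {
                    next[i] += adjacency[i][j] * scores[j];
                }
            }
            // Normalise so the vector does not grow without bound.
            double norm = 0;
            for (double v : next) norm += v * v;
            norm = Math.sqrt(norm);
            for (int i = 0; i < next.length; i++) scores[i] = next[i] / norm;
        }
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```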
-
McCallum, S.H.: Preservation metadata standards for digital resources : what we have and what we need (2005)
0.04
0.04291015 = product of:
0.1716406 = sum of:
0.1716406 = weight(_text_:handling in 5353) [ClassicSimilarity], result of:
0.1716406 = score(doc=5353,freq=2.0), product of:
0.4128091 = queryWeight, product of:
6.272122 = idf(docFreq=227, maxDocs=44421)
0.0658165 = queryNorm
0.41578686 = fieldWeight in 5353, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.272122 = idf(docFreq=227, maxDocs=44421)
0.046875 = fieldNorm(doc=5353)
0.25 = coord(1/4)
- Abstract
- A key component for the successful preservation of digital resources is going to be the metadata that enables automated preservation processes to take place. The number of digital items will preclude human handling, and the fact that these resources are electronic makes them a logical target for computer-driven preservation activities. Over the last decade there have been a number of digital repository experiments that took different approaches, developed and used different data models, and generally moved our understanding forward. This paper reports on a recent initiative, PREMIS, that builds upon concepts and experience to date. It merits careful testing to see if the metadata identified can be used generally and become a foundation for more detailed metadata. How much more will be needed for preservation activities? Initiatives for additional technical metadata and document format registries are also discussed.