Document (#31978)

Author
Khare, R.
Cutting, D.
Sitaker, K.
Rifkin, A.
Title
Nutch: a flexible and scalable open-source Web search engine
Source
http://wiki.commerce.net/images/0/06/CN-TR-04-04.pdf
Year
2004
Series
CommerceNet Labs Technical Report 04-04
Abstract
Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest; one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; and on a personal scale to index a user's files, email, and web-surfing history. We also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
Content
See also: www.nutch.org
Theme
Search engines
Object
Nutch

Similar documents (content)

  1. Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998) 0.09
    0.09349658 = sum of:
      0.09349658 = product of:
        0.4674829 = sum of:
          0.015783804 = weight(abstract_txt:also in 1947) [ClassicSimilarity], result of:
            0.015783804 = score(doc=1947,freq=1.0), product of:
              0.07438621 = queryWeight, product of:
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.021910548 = queryNorm
              0.21218722 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0625 = fieldNorm(doc=1947)
          0.066417545 = weight(abstract_txt:comparable in 1947) [ClassicSimilarity], result of:
            0.066417545 = score(doc=1947,freq=1.0), product of:
              0.1538851 = queryWeight, product of:
                1.0170377 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.021910548 = queryNorm
              0.4316048 = fieldWeight in 1947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0625 = fieldNorm(doc=1947)
          0.13196309 = weight(abstract_txt:scale in 1947) [ClassicSimilarity], result of:
            0.13196309 = score(doc=1947,freq=4.0), product of:
              0.19303364 = queryWeight, product of:
                1.6109062 = boost
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.021910548 = queryNorm
              0.6836274 = fieldWeight in 1947, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.0625 = fieldNorm(doc=1947)
          0.13518739 = weight(abstract_txt:engine in 1947) [ClassicSimilarity], result of:
            0.13518739 = score(doc=1947,freq=4.0), product of:
              0.19616526 = queryWeight, product of:
                1.6239208 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.021910548 = queryNorm
              0.68915045 = fieldWeight in 1947, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.0625 = fieldNorm(doc=1947)
          0.118131064 = weight(abstract_txt:search in 1947) [ClassicSimilarity], result of:
            0.118131064 = score(doc=1947,freq=9.0), product of:
              0.17239444 = queryWeight, product of:
                2.1529324 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021910548 = queryNorm
              0.6852371 = fieldWeight in 1947, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=1947)
        0.2 = coord(5/25)
    
  2. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.09
    0.09138259 = sum of:
      0.09138259 = product of:
        0.38076082 = sum of:
          0.015783804 = weight(abstract_txt:also in 41) [ClassicSimilarity], result of:
            0.015783804 = score(doc=41,freq=1.0), product of:
              0.07438621 = queryWeight, product of:
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.021910548 = queryNorm
              0.21218722 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.06548355 = weight(abstract_txt:rankings in 41) [ClassicSimilarity], result of:
            0.06548355 = score(doc=41,freq=1.0), product of:
              0.15243901 = queryWeight, product of:
                1.0122478 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.021910548 = queryNorm
              0.4295721 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.024850173 = weight(abstract_txt:other in 41) [ClassicSimilarity], result of:
            0.024850173 = score(doc=41,freq=2.0), product of:
              0.07990261 = queryWeight, product of:
                1.0364164 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.021910548 = queryNorm
              0.31100577 = fieldWeight in 41, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.053385783 = weight(abstract_txt:source in 41) [ClassicSimilarity], result of:
            0.053385783 = score(doc=41,freq=1.0), product of:
              0.1676107 = queryWeight, product of:
                1.5010829 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.021910548 = queryNorm
              0.3185106 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.117075704 = weight(abstract_txt:engine in 41) [ClassicSimilarity], result of:
            0.117075704 = score(doc=41,freq=3.0), product of:
              0.19616526 = queryWeight, product of:
                1.6239208 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.021910548 = queryNorm
              0.5968218 = fieldWeight in 41, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.104181804 = weight(abstract_txt:search in 41) [ClassicSimilarity], result of:
            0.104181804 = score(doc=41,freq=7.0), product of:
              0.17239444 = queryWeight, product of:
                2.1529324 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021910548 = queryNorm
              0.6043223 = fieldWeight in 41, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
        0.24 = coord(6/25)
    
  3. Wakeling, S.; Clough, P.; Connaway, L.S.; Sen, B.; Tomás, D.: Users and uses of a global union catalog : a mixed-methods study of WorldCat.org (2017) 0.09
    0.08681317 = sum of:
      0.08681317 = product of:
        0.36172155 = sum of:
          0.015783804 = weight(abstract_txt:also in 4794) [ClassicSimilarity], result of:
            0.015783804 = score(doc=4794,freq=1.0), product of:
              0.07438621 = queryWeight, product of:
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.021910548 = queryNorm
              0.21218722 = fieldWeight in 4794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
          0.017571727 = weight(abstract_txt:other in 4794) [ClassicSimilarity], result of:
            0.017571727 = score(doc=4794,freq=1.0), product of:
              0.07990261 = queryWeight, product of:
                1.0364164 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.021910548 = queryNorm
              0.2199143 = fieldWeight in 4794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
          0.065981545 = weight(abstract_txt:scale in 4794) [ClassicSimilarity], result of:
            0.065981545 = score(doc=4794,freq=1.0), product of:
              0.19303364 = queryWeight, product of:
                1.6109062 = boost
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.021910548 = queryNorm
              0.3418137 = fieldWeight in 4794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
          0.09559191 = weight(abstract_txt:engine in 4794) [ClassicSimilarity], result of:
            0.09559191 = score(doc=4794,freq=2.0), product of:
              0.19616526 = queryWeight, product of:
                1.6239208 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.021910548 = queryNorm
              0.48730296 = fieldWeight in 4794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
          0.09858958 = weight(abstract_txt:global in 4794) [ClassicSimilarity], result of:
            0.09858958 = score(doc=4794,freq=2.0), product of:
              0.20024516 = queryWeight, product of:
                1.6407212 = boost
                5.570241 = idf(docFreq=459, maxDocs=44421)
                0.021910548 = queryNorm
              0.49234438 = fieldWeight in 4794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.570241 = idf(docFreq=459, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
          0.068203 = weight(abstract_txt:search in 4794) [ClassicSimilarity], result of:
            0.068203 = score(doc=4794,freq=3.0), product of:
              0.17239444 = queryWeight, product of:
                2.1529324 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021910548 = queryNorm
              0.39562184 = fieldWeight in 4794, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=4794)
        0.24 = coord(6/25)
    
  4. Petrelli, D.; Lanfranchi, V.; Ciravegna, F.; Begdev, R.; Chapman, S.: Highly focused document retrieval in aerospace engineering : user interaction design and evaluation (2011) 0.08
    0.07977523 = sum of:
      0.07977523 = product of:
        0.3323968 = sum of:
          0.013810827 = weight(abstract_txt:also in 535) [ClassicSimilarity], result of:
            0.013810827 = score(doc=535,freq=1.0), product of:
              0.07438621 = queryWeight, product of:
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.021910548 = queryNorm
              0.18566382 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
          0.021743901 = weight(abstract_txt:other in 535) [ClassicSimilarity], result of:
            0.021743901 = score(doc=535,freq=2.0), product of:
              0.07990261 = queryWeight, product of:
                1.0364164 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.021910548 = queryNorm
              0.27213004 = fieldWeight in 535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
          0.072072364 = weight(abstract_txt:personal in 535) [ClassicSimilarity], result of:
            0.072072364 = score(doc=535,freq=2.0), product of:
              0.17762953 = queryWeight, product of:
                1.5452949 = boost
                5.246269 = idf(docFreq=635, maxDocs=44421)
                0.021910548 = queryNorm
              0.40574542 = fieldWeight in 535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.246269 = idf(docFreq=635, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
          0.059144482 = weight(abstract_txt:engine in 535) [ClassicSimilarity], result of:
            0.059144482 = score(doc=535,freq=1.0), product of:
              0.19616526 = queryWeight, product of:
                1.6239208 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.021910548 = queryNorm
              0.30150333 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
          0.08858172 = weight(abstract_txt:flexible in 535) [ClassicSimilarity], result of:
            0.08858172 = score(doc=535,freq=1.0), product of:
              0.25678837 = queryWeight, product of:
                1.8579818 = boost
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.021910548 = queryNorm
              0.34496 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
          0.07704349 = weight(abstract_txt:search in 535) [ClassicSimilarity], result of:
            0.07704349 = score(doc=535,freq=5.0), product of:
              0.17239444 = queryWeight, product of:
                2.1529324 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021910548 = queryNorm
              0.4469024 = fieldWeight in 535, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0546875 = fieldNorm(doc=535)
        0.24 = coord(6/25)
    
  5. Zhitomirsky-Geffet, M.; Bar-Ilan, J.; Levene, M.: Analysis of change in users' assessment of search results over time (2017) 0.08
    0.07935347 = sum of:
      0.07935347 = product of:
        0.39676732 = sum of:
          0.06548355 = weight(abstract_txt:rankings in 4593) [ClassicSimilarity], result of:
            0.06548355 = score(doc=4593,freq=1.0), product of:
              0.15243901 = queryWeight, product of:
                1.0122478 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.021910548 = queryNorm
              0.4295721 = fieldWeight in 4593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=4593)
          0.081203684 = weight(abstract_txt:local in 4593) [ClassicSimilarity], result of:
            0.081203684 = score(doc=4593,freq=2.0), product of:
              0.17595103 = queryWeight, product of:
                1.5379765 = boost
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.021910548 = queryNorm
              0.46151295 = fieldWeight in 4593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.0625 = fieldNorm(doc=4593)
          0.11428338 = weight(abstract_txt:scale in 4593) [ClassicSimilarity], result of:
            0.11428338 = score(doc=4593,freq=3.0), product of:
              0.19303364 = queryWeight, product of:
                1.6109062 = boost
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.021910548 = queryNorm
              0.5920387 = fieldWeight in 4593, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4690194 = idf(docFreq=508, maxDocs=44421)
                0.0625 = fieldNorm(doc=4593)
          0.067593694 = weight(abstract_txt:engine in 4593) [ClassicSimilarity], result of:
            0.067593694 = score(doc=4593,freq=1.0), product of:
              0.19616526 = queryWeight, product of:
                1.6239208 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.021910548 = queryNorm
              0.34457523 = fieldWeight in 4593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.0625 = fieldNorm(doc=4593)
          0.068203 = weight(abstract_txt:search in 4593) [ClassicSimilarity], result of:
            0.068203 = score(doc=4593,freq=3.0), product of:
              0.17239444 = queryWeight, product of:
                2.1529324 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.021910548 = queryNorm
              0.39562184 = fieldWeight in 4593, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=4593)
        0.2 = coord(5/25)
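
The score trees above all follow the same pattern: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor (matched terms / query terms). The sketch below recomputes the score of entry 1 (doc 1947) from the values listed in its tree. It is an illustration only, not the database's own code: the function names are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed because it reproduces the idf values shown. Small differences in the last digits are expected, since the listed values appear to be single-precision.

```python
import math

# Constants read from the explain output for entry 1 (doc 1947).
MAX_DOCS = 44421
QUERY_NORM = 0.021910548
FIELD_NORM = 0.0625


def idf(doc_freq, max_docs):
    # Assumed formula; it matches the idf values listed in the trees.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # The trees show tf = 2.0 for freq = 4.0 and 3.0 for freq = 9.0,
    # i.e. the square root of the term frequency.
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    i = idf(doc_freq, MAX_DOCS)
    query_weight = boost * i * QUERY_NORM
    field_weight = tf(freq) * i * FIELD_NORM
    return query_weight * field_weight


# (term, freq, docFreq, boost) as listed for doc 1947;
# "also" has no boost line in the tree, so 1.0 is assumed.
terms = [
    ("also", 1.0, 4049, 1.0),
    ("comparable", 1.0, 120, 1.0170377),
    ("scale", 4.0, 508, 1.6109062),
    ("engine", 4.0, 486, 1.6239208),
    ("search", 9.0, 3123, 2.1529324),
]

total = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
coord = 5 / 25  # 5 of the 25 query terms matched
score = total * coord

print(f"sum of term weights: {total:.6f}")
print(f"final score:         {score:.6f}")
```

The same calculation applies to entries 2 to 5; only the term list, the fieldNorm (0.0546875 for doc 535), and the coord factor change.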