Nutch 1.3 and Solr 3.4 are integrated in Eclipse. This post records the log output of one crawl run.
The program arguments used when running in Eclipse:
crawl urls -solr http://localhost:8080/l-nutch-solr -depth 3 -topN 10
Log output during the run:
crawl started in: crawl-20111107123624
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr/
topN = 10
Injector: starting at 2011-11-07 12:36:25
Injector: crawlDb: crawl-20111107123624/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-11-07 12:36:30, elapsed: 00:00:05
Generator: starting at 2011-11-07 12:36:30
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123633
Generator: finished at 2011-11-07 12:36:35, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:35
Fetcher: segment: crawl-20111107123624/segments/20111107123633
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://www.amazon.cn/
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04
ParseSegment: starting at 2011-11-07 12:36:39
ParseSegment: segment: crawl-20111107123624/segments/20111107123633
ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02
CrawlDb update: starting at 2011-11-07 12:36:42
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123633]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:36:44, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:36:44
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123646
Generator: finished at 2011-11-07 12:36:48, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:48
Fetcher: segment: crawl-20111107123624/segments/20111107123646
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E4%B8%89%E6%98%9FS5838-3G%E6%89%8B%E6%9C%BA/dp/B005KP4AFG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005OPL41A?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/b?ie=UTF8&node=79553071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/%E5%B0%8F%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=814224051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-IdeaPad-Y470N-%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91/dp/B005LT2VIE?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
fetching http://www.amazon.cn/ThinkPad-E40-0579-A22-14-0%E8%8B%B1%E5%AF%B8%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91-%E9%80%81%E5%8E%9F%E8%A3%85%E5%8C%85/dp/B005LFRMVY?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640639907
0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640640909
0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640641910
0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640642911
0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640643912
0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640644496
now = 1320640644913
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640650546
now = 1320640645914
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640650546
now = 1320640646915
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640650546
now = 1320640647916
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640650546
now = 1320640648918
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640650546
now = 1320640649919
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640655698
now = 1320640650919
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640655698
now = 1320640651921
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640655698
now = 1320640652923
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640655698
now = 1320640653924
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640655698
now = 1320640654925
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640660855
now = 1320640655926
0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640660855
now = 1320640656927
0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640660855
now = 1320640657928
0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640660855
now = 1320640658929
0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640660855
now = 1320640659930
0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55
ParseSegment: starting at 2011-11-07 12:37:43
ParseSegment: segment: crawl-20111107123624/segments/20111107123646
ParseSegment: finished at 2011-11-07 12:37:45, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:37:45
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123646]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:37:47, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:37:47
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123749
Generator: finished at 2011-11-07 12:37:51, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:37:51
Fetcher: segment: crawl-20111107123624/segments/20111107123749
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-THINK%E9%BB%91/dp/B005GZ0I5G?_encoding=UTF8&s=electronics
fetching http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-%E7%84%89%E7%B2%89/dp/B005GZ0IC4?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/gp/yourstore/home
fetching http://www.amazon.cn/gp/css/homepage.html
fetching http://www.amazon.cn/%E6%89%8B%E8%A1%A8-%E6%97%B6%E9%92%9F/b?ie=UTF8&node=1953164051
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640683363
now = 1320640684037
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640685037
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640686039
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640687043
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640688044
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640689045
0. http://www.amazon.cn/gp/registry/wishlist
1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
3. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/registry/wishlist
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640689186
now = 1320640690047
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640695079
now = 1320640691048
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640695079
now = 1320640692049
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640695079
now = 1320640693049
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640695079
now = 1320640694051
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640695079
now = 1320640695053
0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
2. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640700231
now = 1320640696053
0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640700231
now = 1320640697054
0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640700231
now = 1320640698056
0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640700231
now = 1320640699057
0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640700231
now = 1320640700058
0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
1. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640705384
now = 1320640701058
0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640705384
now = 1320640702060
0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640705384
now = 1320640703060
0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640705384
now = 1320640704061
0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1320640705384
now = 1320640705063
0. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/help/customer/display.html
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35
ParseSegment: starting at 2011-11-07 12:38:26
ParseSegment: segment: crawl-20111107123624/segments/20111107123749
Error parsing: http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif: failed(2,0): Can't retrieve Tika parser for mime-type image/gif
ParseSegment: finished at 2011-11-07 12:38:28, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:38:28
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123749]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:38:30, elapsed: 00:00:01
LinkDb: starting at 2011-11-07 12:38:30
LinkDb: linkdb: crawl-20111107123624/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123633
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123646
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123749
LinkDb: finished at 2011-11-07 12:38:32, elapsed: 00:00:01
SolrIndexer: starting at 2011-11-07 12:38:32
SolrIndexer: finished at 2011-11-07 12:38:37, elapsed: 00:00:05
SolrDeleteDuplicates: starting at 2011-11-07 12:38:37
SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
SolrDeleteDuplicates: finished at 2011-11-07 12:38:39, elapsed: 00:00:01
crawl finished: crawl-20111107123624
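A log like the one above is easier to read once the per-phase timings are pulled out. The following is a minimal sketch, not part of Nutch: the function names `elapsed_by_phase` and `politeness_wait_ms` are hypothetical, and the regular expression only assumes the "finished at ..., elapsed: HH:MM:SS" line format seen in this log. The second helper shows the arithmetic behind the queue dumps: with `maxThreads = 1` and `crawlDelay = 5000` on the single `http://www.amazon.cn` queue, threads spin-wait until `nextFetchTime`, which likely explains why the second fetch round took 55 seconds for 10 URLs.

```python
import re
from collections import defaultdict
from datetime import timedelta

# Matches lines such as:
#   "Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55"
FINISHED = re.compile(
    r"^(?P<phase>[A-Za-z ]+): finished at .*, elapsed: "
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)$"
)


def elapsed_by_phase(log_lines):
    """Sum the reported 'elapsed' time of every phase over all crawl rounds."""
    totals = defaultdict(timedelta)
    for line in log_lines:
        match = FINISHED.match(line.strip())
        if match:
            totals[match["phase"]] += timedelta(
                hours=int(match["h"]),
                minutes=int(match["m"]),
                seconds=int(match["s"]),
            )
    return dict(totals)


def politeness_wait_ms(next_fetch_time_ms, now_ms):
    """Milliseconds a queue still has to wait before its next fetch."""
    return max(0, next_fetch_time_ms - now_ms)


# Lines copied from the log above.
sample = [
    "Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04",
    "Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55",
    "Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35",
    "CrawlDb update: finished at 2011-11-07 12:36:44, elapsed: 00:00:01",
]
totals = elapsed_by_phase(sample)
print(totals["Fetcher"].total_seconds())  # seconds spent fetching in all rounds

# Values from the first queue dump: nextFetchTime and now.
wait = politeness_wait_ms(1320640644496, 1320640639907)
print(wait)  # remaining wait in milliseconds
```

To run it on a full log, pass the lines of the saved console output to `elapsed_by_phase` instead of `sample`.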
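Crawl data model
1. CrawlDB stores the information for every URL, including its fetch schedule, fetch status, page fingerprint (signature), and metadata.
2. LinkDB stores, for each URL, its inbound links and their anchor text.
3. Segment stores the raw page content, the parsed page, metadata, outlinks, and the text used for indexing.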
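The three stores can be pictured with a small sketch. This is only an illustration of the list above, not Nutch's actual Java classes or on-disk format: every class, field, and function name here (`CrawlDbEntry`, `Inlink`, `SegmentRecord`, `invert_links`) is hypothetical. It also shows, under that assumption, how a LinkDB-like structure can be derived by inverting the outlinks recorded in segments.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CrawlDbEntry:
    """One URL in the CrawlDB: schedule, status, fingerprint, metadata."""
    url: str
    status: str                 # illustrative label, e.g. "unfetched" or "fetched"
    next_fetch_time_ms: int     # when the URL is next due for fetching
    signature: str = ""         # page fingerprint
    metadata: dict = field(default_factory=dict)


@dataclass
class Inlink:
    """One inbound link stored in the LinkDB."""
    from_url: str
    anchor: str


@dataclass
class SegmentRecord:
    """What one segment keeps for a fetched page."""
    url: str
    raw_content: bytes = b""
    parse_text: str = ""        # text later handed to the indexer
    metadata: dict = field(default_factory=dict)
    outlinks: list = field(default_factory=list)  # (target_url, anchor) pairs


def invert_links(records):
    """Build a LinkDB-like map: target URL -> list of inbound links."""
    linkdb = defaultdict(list)
    for record in records:
        for target_url, anchor in record.outlinks:
            linkdb[target_url].append(Inlink(record.url, anchor))
    return dict(linkdb)


home = SegmentRecord(
    url="http://www.amazon.cn/",
    outlinks=[("http://www.amazon.cn/gp/yourstore/home", "Your Store")],
)
linkdb = invert_links([home])
print(linkdb["http://www.amazon.cn/gp/yourstore/home"][0].anchor)
```

The anchor text "Your Store" is made up for the example; only the two URLs come from the log.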
Reference: http://blog.csdn.net/amuseme_lu/article/details/5993916