Spark exercise: finding the most-visited URLs on a site
Each log record has two fields: the first is the access timestamp and the second is the visited URL. Each subject area has its own subdomain. The subdomains and the log data are as follows:
java.aaaaaaa.cn
net.aaaaaaa.cn
php.aaaaaaa.cn
20160321101954 http://java.aaaaaaa.cn/java/course/javaeeadvanced.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101954 http://java.aaaaaaa.cn/java/video.shtml
20160321101954 http://java.aaaaaaa.cn/java/teacher.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101954 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101954 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/hadoop.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/base.shtml
20160321101954 http://net.aaaaaaa.cn/net/course.shtml
20160321101954 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101954 http://net.aaaaaaa.cn/net/video.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/base.shtml
20160321101954 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101954 http://java.aaaaaaa.cn/java/video.shtml
20160321101954 http://java.aaaaaaa.cn/java/video.shtml
20160321101954 http://net.aaaaaaa.cn/net/video.shtml
20160321101954 http://net.aaaaaaa.cn/net/course.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101954 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101955 http://php.aaaaaaa.cn/php/course.shtml
20160321101955 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101955 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/base.shtml
20160321101955 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101955 http://php.aaaaaaa.cn/php/video.shtml
20160321101955 http://net.aaaaaaa.cn/net/course.shtml
20160321101955 http://php.aaaaaaa.cn/php/video.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101955 http://net.aaaaaaa.cn/net/video.shtml
20160321101955 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101955 http://java.aaaaaaa.cn/java/teacher.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101955 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101955 http://net.aaaaaaa.cn/net/video.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/javaeeadvanced.shtml
20160321101956 http://net.aaaaaaa.cn/net/video.shtml
20160321101956 http://net.aaaaaaa.cn/net/video.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/javaeeadvanced.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/android.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/hadoop.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/javaee.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/javaeeadvanced.shtml
20160321101956 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101956 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/base.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101956 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101956 http://net.aaaaaaa.cn/net/course.shtml
20160321101956 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101956 http://php.aaaaaaa.cn/php/video.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101956 http://java.aaaaaaa.cn/java/course/hadoop.shtml
20160321101957 http://java.aaaaaaa.cn/java/teacher.shtml
20160321101957 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101957 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101957 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101957 http://php.aaaaaaa.cn/php/teacher.shtml
20160321101957 http://php.aaaaaaa.cn/php/course.shtml
20160321101957 http://java.aaaaaaa.cn/java/course/base.shtml
20160321101957 http://net.aaaaaaa.cn/net/course.shtml
20160321101957 http://java.aaaaaaa.cn/java/video.shtml
20160321101957 http://php.aaaaaaa.cn/php/video.shtml
20160321101957 http://net.aaaaaaa.cn/net/teacher.shtml
20160321101957 http://java.aaaaaaa.cn/java/video.shtml
20160321101957 http://net.aaaaaaa.cn/net/video.shtml
20160321101957 http://java.aaaaaaa.cn/java/course/hadoop.shtml
20160321101957 http://net.aaaaaaa.cn/net/course.shtml
20160321101957 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101957 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101958 http://net.aaaaaaa.cn/net/course.shtml
20160321101958 http://java.aaaaaaa.cn/java/course/hadoop.shtml
20160321101958 http://php.aaaaaaa.cn/php/video.shtml
20160321101958 http://php.aaaaaaa.cn/php/course.shtml
20160321101958 http://java.aaaaaaa.cn/java/course/cloud.shtml
20160321101958 http://net.aaaaaaa.cn/net/video.shtml
20160321101958 http://java.aaaaaaa.cn/java/course/base.shtml
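To make the record layout concrete, here is a minimal sketch of splitting one record into its two fields and extracting the subdomain from the URL. The object name `ParseRecord` and its helpers are hypothetical, and splitting on any run of whitespace is an assumption so that both tab-separated and space-separated files are handled.

```scala
import java.net.URL

object ParseRecord {
  // Split one log line into (timestamp, url); returns None for malformed lines.
  def parse(line: String): Option[(String, String)] =
    line.trim.split("\\s+") match {
      case Array(ts, url) => Some((ts, url))
      case _              => None
    }

  // Extract the host (the per-subject subdomain) from a URL string.
  def hostOf(url: String): String = new URL(url).getHost

  def main(args: Array[String]): Unit = {
    val line = "20160321101954 http://java.aaaaaaa.cn/java/course/javaee.shtml"
    parse(line).foreach { case (ts, url) =>
      println(s"$ts ${hostOf(url)} $url")
    }
  }
}
```

Returning an `Option` lets a caller skip blank or malformed lines instead of failing on an out-of-range array index.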
Requirement: for each domain name, find the three most-visited URLs.
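Before involving Spark, the requirement can be illustrated on an in-memory list with plain Scala collections. This is only a sketch for checking the logic on small inputs; the names `TopUrlsLocal` and `topPerHost` are hypothetical, and the sample is a small subset of the URLs from the log.

```scala
import java.net.URL

object TopUrlsLocal {
  // For each host, return up to n (url, count) pairs, most visited first.
  def topPerHost(urls: Seq[String], n: Int): Map[String, List[(String, Int)]] =
    urls
      .groupBy(identity)                              // url -> all of its occurrences
      .map { case (url, hits) => (url, hits.size) }   // url -> visit count
      .toList
      .groupBy { case (url, _) => new URL(url).getHost }
      .map { case (host, counts) =>
        host -> counts.sortBy { case (_, c) => -c }.take(n)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "http://java.aaaaaaa.cn/java/video.shtml",
      "http://java.aaaaaaa.cn/java/video.shtml",
      "http://java.aaaaaaa.cn/java/teacher.shtml",
      "http://net.aaaaaaa.cn/net/teacher.shtml",
      "http://php.aaaaaaa.cn/php/teacher.shtml"
    )
    topPerHost(sample, 3).foreach(println)
  }
}
```

URLs with equal counts come out in no guaranteed order, so a result with ties may differ between runs.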
Code:
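import java.net.URL
import org.apache.spark.{SparkConf, SparkContext}

object UrlCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("UrlCount").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Each line is "<timestamp>\t<url>"; emit (url, 1) for every visit
    val rdd1 = sc.textFile("E:\\aaaaaa.log").map(line => {
      val f = line.split("\t")
      (f(1), 1)
    })

    // Total visits per URL
    val rdd2 = rdd1.reduceByKey(_ + _)

    // Attach the host so URLs can be grouped by domain: (host, url, count)
    val rdd3 = rdd2.map(t => {
      val url = t._1
      val host = new URL(url).getHost
      (host, url, t._2)
    })

    // Group by host, sort each group by count in descending order, keep the top three
    val rdd4 = rdd3.groupBy(_._1).mapValues(it => {
      it.toList.sortBy(_._3).reverse.take(3)
    })

    // Transformations are lazy; an action is needed to run the job and show the result
    println(rdd4.collect().toBuffer)

    sc.stop()
  }
}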
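The `groupBy` step above collects every URL of a host into one in-memory list before sorting. For hosts with very many distinct URLs, one possible alternative is `aggregateByKey` with a bounded accumulator. This is a sketch of a swapped-in technique, not the original author's method. The names `UrlTopNBounded`, `Hit`, and `keepTop` are hypothetical, and the in-memory sample stands in for the log file.

```scala
import java.net.URL
import org.apache.spark.{SparkConf, SparkContext}

object UrlTopNBounded {
  type Hit = (String, Int) // (url, visit count)

  // Keep only the n highest-count entries of a candidate list.
  def keepTop(n: Int)(hits: List[Hit]): List[Hit] =
    hits.sortBy { case (_, count) => -count }.take(n)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("UrlTopNBounded").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical in-memory sample standing in for the log file.
    val urls = sc.parallelize(Seq(
      "http://java.aaaaaaa.cn/java/video.shtml",
      "http://java.aaaaaaa.cn/java/video.shtml",
      "http://java.aaaaaaa.cn/java/teacher.shtml",
      "http://net.aaaaaaa.cn/net/video.shtml",
      "http://php.aaaaaaa.cn/php/teacher.shtml"
    ))

    val top3 = urls
      .map(url => (url, 1))
      .reduceByKey(_ + _)
      .map { case (url, count) => (new URL(url).getHost, (url, count)) }
      .aggregateByKey(List.empty[Hit])(
        (acc, hit) => keepTop(3)(hit :: acc), // fold one record into the partition-local list
        (a, b) => keepTop(3)(a ++ b)          // merge lists coming from different partitions
      )

    top3.collect().foreach { case (host, hits) => println(s"$host -> $hits") }
    sc.stop()
  }
}
```

Each accumulator never holds more than three entries after a step, so memory per host stays constant regardless of how many distinct URLs the host has.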
You may freely share or copy this text, but please keep the URL of this document as the reference URL.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.