Java crawler: traversing a Document object with DOM methods

First, the target URL:
https://wall.alphacoders.com/featured.php?lang=Chinese

Main steps:
  • Use Jsoup's connect method to get a Document object:
  • String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
    Document doc = Jsoup.connect(html).get();
    

    The page source is too long to show here in full.
    We will use this part as an example:
    <ul class="nav nav-pills">
        <li><a href="https://alphacoders.com/site/about-us" rel="nofollow">About Us</a></li>
        <li><a href="https://alphacoders.com/site/faq" rel="nofollow">FAQ</a></li>
        <li><a href="https://alphacoders.com/site/privacy" rel="nofollow">Privacy Policy</a></li>
        <li><a href="https://alphacoders.com/site/tos" rel="nofollow">Terms Of Service</a></li>
        <li><a href="https://alphacoders.com/site/acceptable_use" rel="nofollow">Acceptable Use</a></li>
        <li><a href="https://alphacoders.com/site/etiquette" rel="nofollow">Etiquette</a></li>
        <li><a href="https://alphacoders.com/site/advertising" rel="nofollow">Advertise With Us</a></li>
        <li><a id="change_consent">Change Consent</a></li>
    </ul>
    
  • First, find all the ul elements:
    Elements elements = doc.getElementsByTag("ul");
    

    The output is as follows:
    <ul class="nav navbar-nav center">
     <li> <a title="Submit Wallpapers" href="https://alphacoders.com/site/submit-wallpaper"><i class="el el-circle-arrow-up"></i> </a> </li>
     <li> <a href="https://alphacoders.com/contest"><i class="el el-gift"></i> </a> </li>
    </ul>
    <ul class="nav navbar-nav navbar-right center">
     <li> <a href="language.php?lang=Chinese"> <img src="https://static.alphacoders.com/wa/Chinese-flag.png" alt="Chinese-flag"> </a> </li>
     <li> <a rel="nofollow" href="https://alphacoders.com/users/login"><i class="el el-user"></i> </a> </li>
     <li> <a href="https://alphacoders.com/users/register"><i class="el el-edit"></i> </a> </li>
    </ul>
    <ul class="pagination">
     <li class="active"><a id="prev_page" href="#">< </a></li>
     <li class="active"><a>1</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2">2</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=3">3</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=4">4</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=5">5</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=6">6</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=7">7</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=8">8</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=9">9</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=10">10</a></li>
     <li><a>...</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=319">319</a></li>
     <li><a id="next_page" href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2"> ></a></li>
    </ul>
    <ul class="pagination">
     <li class="active"><a href="#">< </a></li>
     <li class="active"><a>1</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2">2</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=3">3</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=4">4</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=5">5</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=6">6</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=7">7</a></li>
     <li><a>...</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=319">319</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2"> ></a></li>
    </ul>
    <ul class="pagination">
     <li class="active"><a href="#"><< </a></li>
     <li class="active"><a href="#">< </a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2"> ></a></li>
     <li><a title="  (319)" href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=319"> >></a></li>
    </ul>
    <ul class="pagination">
     <li class="active"><a href="#">1</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=2">2</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=3">3</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=4">4</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=5">5</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=6">6</a></li>
     <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&page=7">7</a></li>
    </ul>
    <ul class="nav nav-pills">
     <li><a href="https://alphacoders.com/site/about-us" rel="nofollow">About Us</a></li>
     <li><a href="https://alphacoders.com/site/faq" rel="nofollow">FAQ</a></li>
     <li><a href="https://alphacoders.com/site/privacy" rel="nofollow">Privacy Policy</a></li>
     <li><a href="https://alphacoders.com/site/tos" rel="nofollow">Terms Of Service</a></li>
     <li><a href="https://alphacoders.com/site/acceptable_use" rel="nofollow">Acceptable Use</a></li>
     <li><a href="https://alphacoders.com/site/etiquette" rel="nofollow">Etiquette</a></li>
     <li><a href="https://alphacoders.com/site/advertising" rel="nofollow">Advertise With Us</a></li>
     <li><a id="change_consent">Change Consent</a></li>
    </ul>
    
  • Only one of these ul elements has the class 'nav nav-pills', so we pick that one out:
  • Elements elements = doc.getElementsByTag("ul");
    		//System.out.println(elements);
    Element tempElement = null;
    for(Element element : elements) {
    	if (element.className().equals("nav nav-pills")) {
    		tempElement = element;
    		//System.out.println(element.className());
    		break;
    	}
    }
    
  • Loop over the li elements of this ul and print the href and rel attributes of the a inside each li:
  • Elements li = tempElement.getElementsByTag("li");
    for(Element element : li) {
    	Elements element2 = element.getElementsByTag("a");
    	for(Element element3 : element2) {
    		String hrefString = element3.attr("href");
    		String relString = element3.attr("rel");
    		if (!hrefString.isEmpty() && !relString.isEmpty()) {
    			System.out.println("href=" + hrefString + " " + "rel=" + relString);
    		}
    	}
    }
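The loop above matches on `className().equals("nav nav-pills")`. `className()` returns the whole class attribute as one string, so the check would miss the list if the page ever reordered its classes or added another one. Below is a minimal sketch of the same steps using jsoup's `getElementsByClass`, which matches a single class name. It makes a few assumptions. The HTML is a trimmed copy of the fragment shown earlier, parsed offline with `Jsoup.parse` rather than fetched. The names `NavPillsLinks`, `SAMPLE` and `collectLinks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: the HTML is a trimmed copy of the page fragment,
// parsed from a string so the sketch runs without network access.
public class NavPillsLinks {

    static final String SAMPLE =
        "<ul class=\"nav navbar-nav center\"><li><a href=\"https://alphacoders.com/contest\">x</a></li></ul>"
      + "<ul class=\"nav nav-pills\">"
      + "<li><a href=\"https://alphacoders.com/site/about-us\" rel=\"nofollow\">About Us</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/faq\" rel=\"nofollow\">FAQ</a></li>"
      + "<li><a id=\"change_consent\">Change Consent</a></li>"
      + "</ul>";

    // Collects "href=... rel=..." for every <a> under the first element
    // that has the class "nav-pills"; returns an empty list if there is none.
    static List<String> collectLinks(Document doc) {
        List<String> out = new ArrayList<>();
        Element nav = doc.getElementsByClass("nav-pills").first();
        if (nav == null) {
            return out; // no matching list on this page
        }
        for (Element a : nav.getElementsByTag("a")) {
            String href = a.attr("href");
            String rel = a.attr("rel");
            if (!href.isEmpty() && !rel.isEmpty()) {
                out.add("href=" + href + " rel=" + rel);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        collectLinks(doc).forEach(System.out::println);
    }
}
```

The null check matters on a live page. If the site changes its markup, `first()` returns null and the method returns an empty list instead of throwing a `NullPointerException`.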
    

    Final result:
    href=https://alphacoders.com/site/about-us rel=nofollow
    href=https://alphacoders.com/site/faq rel=nofollow
    href=https://alphacoders.com/site/privacy rel=nofollow
    href=https://alphacoders.com/site/tos rel=nofollow
    href=https://alphacoders.com/site/acceptable_use rel=nofollow
    href=https://alphacoders.com/site/etiquette rel=nofollow
    href=https://alphacoders.com/site/advertising rel=nofollow
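The list has eight li items, but the result has only seven lines. The last link, `Change Consent`, has neither `href` nor `rel`. jsoup's `attr` returns an empty string (not null) for a missing attribute, so the filter skips that link. Below is a minimal offline sketch that confirms the count. It assumes the `nav nav-pills` fragment is copied into a string and parsed with `Jsoup.parse`. The class name `CountCheck` and the method `countPrinted` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CountCheck {

    // The "nav nav-pills" list from the page, copied as a string (assumption).
    static final String NAV =
        "<ul class=\"nav nav-pills\">"
      + "<li><a href=\"https://alphacoders.com/site/about-us\" rel=\"nofollow\">About Us</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/faq\" rel=\"nofollow\">FAQ</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/privacy\" rel=\"nofollow\">Privacy Policy</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/tos\" rel=\"nofollow\">Terms Of Service</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/acceptable_use\" rel=\"nofollow\">Acceptable Use</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/etiquette\" rel=\"nofollow\">Etiquette</a></li>"
      + "<li><a href=\"https://alphacoders.com/site/advertising\" rel=\"nofollow\">Advertise With Us</a></li>"
      + "<li><a id=\"change_consent\">Change Consent</a></li>"
      + "</ul>";

    // Counts the <a> elements the article's filter would print:
    // both href and rel must be non-empty.
    static int countPrinted(Document doc) {
        int printed = 0;
        for (Element a : doc.getElementsByTag("a")) {
            // attr() returns "" (not null) when the attribute is absent
            if (!a.attr("href").isEmpty() && !a.attr("rel").isEmpty()) {
                printed++;
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(NAV);
        int items = doc.getElementsByTag("li").size();
        System.out.println(items + " li items, " + countPrinted(doc) + " printed links");
    }
}
```

Running `main` prints `8 li items, 7 printed links`, which matches the seven lines above.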
    

    Full code:
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.io.IOException;
    import org.jsoup.Jsoup;
    
    
    /** 
     * @ClassName: Jsoup_Test
     * @description: 
     * @author: KI
     * @Date: 2020-08-17 8:15:14
     */
    
    public class Jsoup_Test {
    
    	public static void main(String[] args) throws IOException {
    		String html = "https://wall.alphacoders.com/featured.php?lang=Chinese";
    		// Download the page and parse it into a Document
    		Document doc = Jsoup.connect(html).get();
    		
    		System.out.println(doc);
    		// Collect every <ul> element on the page
    		Elements elements = doc.getElementsByTag("ul");
    		//System.out.println(elements);
    		// Keep the first <ul> whose class attribute is exactly "nav nav-pills"
    		Element tempElement = null;
    		for(Element element : elements) {
    			if (element.className().equals("nav nav-pills")) {
    				tempElement = element;
    				//System.out.println(element.className());
    				break;
    			}
    		}
    		if (tempElement == null) {
    			System.out.println("No ul with class 'nav nav-pills' found");
    			return;
    		}
    		System.out.println(tempElement);
    		// Print href and rel of every <a> inside the list's <li> items
    		Elements li = tempElement.getElementsByTag("li");
    		for(Element element : li) {
    			Elements element2 = element.getElementsByTag("a");
    			for(Element element3 : element2) {
    				String hrefString = element3.attr("href");
    				String relString = element3.attr("rel");
    				// attr() returns "" for a missing attribute; test with isEmpty(), not !=
    				if (!hrefString.isEmpty() && !relString.isEmpty()) {
    					System.out.println("href=" + hrefString + " " + "rel=" + relString);
    				}
    			}
    		}
    
    	}
    
    }
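For comparison, jsoup also accepts CSS selectors through `select`. The sketch below expresses the same three steps as a single query. `ul.nav.nav-pills` picks a ul that carries both classes, and `a[href][rel]` keeps only anchors that have both attributes. There is one behavioural difference: `[rel]` tests only that the attribute is present, so a link with an empty `rel=""` would pass here, while the `isEmpty()` check in the full code would skip it. Assumptions: the HTML is a small offline fragment parsed with `Jsoup.parse`, and the names `SelectorVersion`, `SAMPLE` and `links` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorVersion {

    // Offline fragment (assumption): one pagination list plus the nav-pills list.
    static final String SAMPLE =
        "<ul class=\"pagination\"><li><a href=\"https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2\">2</a></li></ul>"
      + "<ul class=\"nav nav-pills\">"
      + "<li><a href=\"https://alphacoders.com/site/tos\" rel=\"nofollow\">Terms Of Service</a></li>"
      + "<li><a id=\"change_consent\">Change Consent</a></li>"
      + "</ul>";

    // One selector replaces the ul loop, the li loop and the attribute filter.
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("ul.nav.nav-pills > li > a[href][rel]")) {
            out.add("href=" + a.attr("href") + " rel=" + a.attr("rel"));
        }
        return out;
    }

    public static void main(String[] args) {
        links(Jsoup.parse(SAMPLE)).forEach(System.out::println);
    }
}
```

On this fragment only the Terms Of Service link is printed. The pagination link sits outside the `nav-pills` list, and `Change Consent` has no `href` or `rel`.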
    
    
    
