Getting a clean list of links with JavaScript

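The job didn't call for full-blown scraping, but I sometimes wanted to grab a quick list of the links on a site, so I wrote a script.
If you need practical scraping or something more advanced, consider a Chrome extension such as Scraper instead.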

Specification



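What "clean" means here:
  • Show each link's text and URL side by side, and copy the list to the clipboard
  • Remove duplicate URLs
  • Strip unnecessary line breaks from the link text
  • Narrow the links down to specific domains
  • Don't use for loops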

Code



Because the script uses the DevTools console's copy() command, it assumes Chrome.
Paste it into the browser's Console and run it.
    
    // Change the search words as needed.
    const targetLinkWords = ['www.bbc.com'];
    
    const createLinkList = (anchors) => {
      const seen = new Set();
      const lines = [];
      Array.prototype.forEach.call(anchors, (node) => {
        // Skip hrefs already listed and hrefs containing none of targetLinkWords
        if (seen.has(node.href) ||
            !targetLinkWords.some((word) => node.href.includes(word))) {
          return;
        }
        seen.add(node.href);
        // Remove line breaks from the link text; use a placeholder if it is empty
        const text = node.text.replace(/\r?\n/g, '').trim();
        lines.push(`${text === '' ? 'No text' : text}||${node.href}`);
      });
      // Join with CRLF so the result has no leading blank line
      return lines.join('\r\n');
    };
    
    const result = createLinkList(document.querySelectorAll('a'));
    console.log(result);
    copy(result);
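The script above needs the DOM and the DevTools copy() helper, so it only runs in the browser console. To check the filtering rules elsewhere (for example in Node.js), the same rules can be written as a pure function over plain { text, href } objects. This is a sketch, not part of the original post: formatLinks and the sample data are hypothetical, and it chains filter and map instead of forEach, still without a for loop.

```javascript
// Hypothetical DOM-free version of the rules: takes plain { text, href }
// objects and a list of words, and returns "text||href" lines joined by CRLF.
const formatLinks = (links, words) => {
  const seen = new Set();
  return links
    // keep only hrefs that contain at least one of the target words
    .filter(({ href }) => words.some((word) => href.includes(word)))
    // keep the first occurrence of each href (Set#add returns the Set, which is truthy)
    .filter(({ href }) => !seen.has(href) && seen.add(href))
    // remove line breaks from the text; use a placeholder when nothing is left
    .map(({ text, href }) => {
      const clean = text.replace(/\r?\n/g, '').trim();
      return `${clean === '' ? 'No text' : clean}||${href}`;
    })
    .join('\r\n');
};

// Hypothetical sample input standing in for document.querySelectorAll('a')
const sample = [
  { text: 'Homepage', href: 'https://www.bbc.com/' },
  { text: 'Tech\nTech selected', href: 'https://www.bbc.com/news/technology' },
  { text: 'Homepage', href: 'https://www.bbc.com/' },
  { text: '  ', href: 'https://www.bbc.com/news' },
  { text: 'Elsewhere', href: 'https://example.com/' },
];

console.log(formatLinks(sample, ['www.bbc.com']));
```

The duplicate Homepage entry and the example.com link are dropped, and the whitespace-only text falls back to the placeholder.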
    
    

Results



I tried it on the BBC NEWS Tech page.

    Homepage||https://www.bbc.com/
    Skip to content||https://www.bbc.com/news/technology#skip-to-content
    Accessibility Help||https://www.bbc.com/accessibility/
    Sign in||https://session.bbc.com/session?ptrt=https%3A%2F%2Fwww.bbc.com%2Fnews%2Ftechnology&context=news_gnl&userOrigin=news_gnl
    Notifications||https://www.bbc.com/news/technology#
    News||https://www.bbc.com/news
    Sport||https://www.bbc.com/sport
    Reel||https://www.bbc.com/reel
    Worklife||https://www.bbc.com/worklife
    Travel||https://www.bbc.com/travel
    Future||https://www.bbc.com/future
    Culture||https://www.bbc.com/culture
    Music||https://www.bbc.com/culture/music
    Weather||https://www.bbc.com/weather
    More||https://www.bbc.com/news/technology#orb-footer
    Video||https://www.bbc.com/news/video_and_audio/headlines
    World||https://www.bbc.com/news/world
    Asia||https://www.bbc.com/news/world/asia
    UK||https://www.bbc.com/news/uk
    Business||https://www.bbc.com/news/business
    TechTech selected||https://www.bbc.com/news/technology
    Science||https://www.bbc.com/news/science_and_environment
    Stories||https://www.bbc.com/news/stories
    Entertainment & Arts||https://www.bbc.com/news/entertainment_and_arts
    Health||https://www.bbc.com/news/health
    World News TV||https://www.bbc.com/news/world_radio_and_tv
    In Pictures||https://www.bbc.com/news/in_pictures
    Reality Check||https://www.bbc.com/news/reality_check
    Newsbeat||https://www.bbc.com/news/newsbeat
    Special Reports||https://www.bbc.com/news/special_reports
    Explainers||https://www.bbc.com/news/explainers
    Long Reads||https://www.bbc.com/news/the_reporters
    Have Your Say||https://www.bbc.com/news/have_your_say
    Africa||https://www.bbc.com/news/world/africa
    Australia||https://www.bbc.com/news/world/australia
    Europe||https://www.bbc.com/news/world/europe
    Latin America||https://www.bbc.com/news/world/latin_america
    ・・・
    

The list was copied to the clipboard as expected.
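In the output, the Sign in entry points to session.bbc.com; it passes the filter only because its ptrt query parameter contains the encoded string https%3A%2F%2Fwww.bbc.com, and a substring check matches "www.bbc.com" anywhere in the URL. Skip to content, Notifications and More also differ from the Tech page URL only by a #fragment. If stricter behavior is wanted, the sketch below compares the parsed hostname using the standard URL API and drops the fragment. The names targetHosts, isTargetHost and withoutHash are hypothetical and not part of the original script.

```javascript
// Hypothetical helpers: exact hostname matching instead of substring matching.
const targetHosts = ['www.bbc.com'];

// True only when the URL's hostname is exactly one of targetHosts.
const isTargetHost = (href) => {
  try {
    return targetHosts.includes(new URL(href).hostname);
  } catch {
    return false; // empty or unparseable href
  }
};

// Drop the #fragment so in-page anchors collapse onto the page URL.
const withoutHash = (href) => {
  const url = new URL(href);
  url.hash = '';
  return url.href;
};
```

In createLinkList, isTargetHost(node.href) could replace the targetLinkWords check, and withoutHash(node.href) could serve as the key for the seen set. Whether fragment links should be merged this way depends on what the list is for.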
