What I got stuck on when trying to parse multiple iframes with Puppeteer


Environment


  • MacBook Pro High Sierra (v10.13.3)
  • node v9.6.1
  • Typescript v2.7.2
  • Puppeteer v1.1.1

  • Summary


  • To get the iframes on a page with Puppeteer, use Page#frames.

    Code



    test.ts
    import {launch} from 'puppeteer';
    
    (async () => {
        const browser = await launch();
        const page = await browser.newPage();
    
        try {
            await page.goto('file:///tmp/test.html');
    
            const frames = page.frames();
            // frames[0] is the page itself (the main frame), so the iframes inside the page start at index 1
            for (let i = 1; i < frames.length; i++) {
                const frame = frames[i];
                console.log(await frame.evaluate(content => content.innerHTML, await frame.$('p')));
            }
        } catch (e) {
            console.error(e);
        }
    
        await page.close();
        await browser.close();
    })();
    

    /tmp/test.html
    <!doctype html>
    <html lang="ja">
    <head>
      <meta charset="UTF-8">
      <title>Document</title>
    </head>
    <body>
      <iframe src="inner-frame.html"></iframe>
      <iframe src="secondary-frame.html"></iframe>
    </body>
    </html>
    

    /tmp/inner-frame.html
    <!DOCTYPE html>
    <html lang="ja">
    <head>
      <meta charset="UTF-8">
      <title>InnerFrame</title>
    </head>
    <body>
      <p>Inner Frame!</p>
    </body>
    </html>
    

    /tmp/secondary-frame.html
    <!DOCTYPE html>
    <html lang="ja">
    <head>
      <meta charset="UTF-8">
      <title>SecondaryFrame</title>
    </head>
    <body>
      <p>Secondary Frame!</p>
    </body>
    </html>
    

    Output


    $ node test.js
    Inner Frame!
    Secondary Frame!
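
Skipping index 0 works, but Puppeteer can also return only the iframes. Page#mainFrame returns the top-level frame, and Frame#childFrames lists the frames attached directly to it. The sketch below assumes the same /tmp files as above. readChildParagraphs is a hypothetical helper name, and Frame#$eval stands in for the evaluate/$ pair. With those files in place, it should print the same two lines.

```typescript
import {launch, Page} from 'puppeteer';

// Hypothetical helper: returns the innerHTML of the first <p> in each
// iframe attached directly to the page's main frame.
async function readChildParagraphs(page: Page): Promise<string[]> {
    const texts: string[] = [];
    // childFrames() does not include the main frame itself, so no index has to be skipped.
    for (const frame of page.mainFrame().childFrames()) {
        // $eval runs the callback on the first matching element and rejects if nothing matches.
        texts.push(await frame.$eval('p', p => p.innerHTML));
    }
    return texts;
}

(async () => {
    const browser = await launch();
    try {
        const page = await browser.newPage();
        await page.goto('file:///tmp/test.html');
        for (const text of await readChildParagraphs(page)) {
            console.log(text);
        }
    } catch (e) {
        console.error(e);
    } finally {
        // Closing the browser also closes its pages.
        await browser.close();
    }
})().catch(console.error);
```

childFrames only goes one level deep. If inner-frame.html contained its own iframe, that frame would need a recursive walk, whereas page.frames() already includes nested frames.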
    

    (Aside) A record of my trial and error



    How it started



    test.ts
    import {launch} from 'puppeteer';
    
    (async () => {
        const browser = await launch();
        const page = await browser.newPage();
    
        try {
            await page.goto('file:///tmp/test.html');
            const frames = await page.$$('iframe');
            for (let i = 0; i < frames.length; i++) {
                const url = await page.evaluate(content => content.src, frames[i]);
                await page.goto(url);
                console.log(await page.evaluate(content => content.innerHTML, await page.$('p')));
            }
        } catch (e) {
            console.error(e);
        }
    
        await page.close();
        await browser.close();
    })();
    

    Output


    $ node test.js
    Inner Frame!
    Error: JSHandles can be evaluated only in the context they were created!
        at ExecutionContext.convertArgument (~~~/node_modules/puppeteer/lib/ExecutionContext.js:95:17)
        at Array.map (<anonymous>)
        at ExecutionContext.evaluateHandle (~~~/node_modules/puppeteer/lib/ExecutionContext.js:70:23)
        at ExecutionContext.evaluate (~~~/node_modules/puppeteer/lib/ExecutionContext.js:46:31)
        at Frame.evaluate (~~~/node_modules/puppeteer/lib/FrameManager.js:299:20)
        at <anonymous>
        at process._tickCallback (internal/process/next_tick.js:160:7)
    

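    For some reason, an error is thrown on the second iteration… The cause: the handles returned by page.$$('iframe') belong to test.html's execution context. The first page.goto(url) replaced that context, so frames[1] can no longer be passed to page.evaluate.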

    First revision



    test.ts
    import {launch} from 'puppeteer';
    
    (async () => {
        const browser = await launch();
        const page = await browser.newPage();
    
        try {
            await page.goto('file:///tmp/test.html');
    
            const frames = await page.$$('iframe');
            const urls = [];
            for (let i = 0; i < frames.length; i++) {
                urls.push(await page.evaluate(content => content.src, frames[i]));
            }
            for (let i = 0; i < urls.length; i++) {
                await page.goto(urls[i]);
                console.log(await page.evaluate(content => content.innerHTML, await page.$('p')));
            }
        } catch (e) {
            console.error(e);
        }
    
        await page.close();
        await browser.close();
    })();
    

    Output


    $ node test.js
    Inner Frame!
    Secondary Frame!
    

    < Having to loop twice, though…
    So I rejected this approach.
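
If you did want to keep the navigate-to-each-src approach, the two loops could be merged. Page#$$eval can read every iframe's src in a single call before any navigation happens. Because $$eval returns plain strings rather than JSHandles, the later page.goto calls cannot invalidate them. This is only a sketch, and readFrameSources is a hypothetical name.

```typescript
import {launch, Page} from 'puppeteer';

// Hypothetical helper: reads the resolved src of every <iframe> on the current page.
// The callback runs in the browser and returns plain strings, not handles.
async function readFrameSources(page: Page): Promise<string[]> {
    return page.$$eval('iframe', iframes =>
        iframes.map(f => (f as HTMLIFrameElement).src));
}

(async () => {
    const browser = await launch();
    try {
        const page = await browser.newPage();
        await page.goto('file:///tmp/test.html');
        // All URLs are collected up front, so navigating afterwards cannot affect them.
        for (const url of await readFrameSources(page)) {
            await page.goto(url);
            console.log(await page.$eval('p', p => p.innerHTML));
        }
    } catch (e) {
        console.error(e);
    } finally {
        await browser.close();
    }
})().catch(console.error);
```

This still navigates the tab away from test.html once per iframe, so Page#frames remains the simpler choice when the frames are already loaded.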