Object-Oriented Encapsulation

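The script written in the previous article already covers the basic functionality: it extracts the information we want from an HTML page. However, it still has the following problems:

Poor encapsulation

Unclear interface definitions

Difficult to iterate on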

Google's SERP pages are adjusted in small ways quite often, so our code will inevitably need frequent updates, and we need to keep it concise. In this project, we address this by organizing the code around objects.

So what does object-oriented mean here?

class GoogleSpider(object):
    def __get_total_page(self):
        pass

    def search_text(self):
        pass

    def search_videos(self):
        pass

    def search_wiki(self):
        pass
...

This code simply wraps the functions we wrote earlier in a GoogleSpider class. It looks object-oriented, but the methods and the data in the class are not yet connected to each other, so we need to go one step further.
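To make the difference concrete, here is a minimal sketch contrasting a class that only groups functions with a class whose methods share state. The class names NamespaceSpider and StatefulSpider, the regular expressions, and the sample HTML are all hypothetical and exist only for illustration; they are not part of the project code.

```python
import re


class NamespaceSpider:
    """Methods are only grouped together; input and output are passed around by hand."""

    def search_titles(self, html):
        return re.findall(r'<h3>(.*?)</h3>', html)


class StatefulSpider:
    """Methods read from and write to the instance, so data and behavior are linked."""

    def __init__(self, html):
        self.html = html   # shared input
        self.titles = []   # shared output
        self.links = []    # shared output

    def _search_titles(self):
        self.titles = re.findall(r'<h3>(.*?)</h3>', self.html)

    def _search_links(self):
        self.links = re.findall(r'href="(.*?)"', self.html)

    def search(self):
        # A single public entry point fills in every result attribute
        self._search_titles()
        self._search_links()


html = '<div><a href="https://example.com"><h3>Example</h3></a></div>'
stateful = StatefulSpider(html)
stateful.search()
```

After search() runs, the results are available as stateful.titles and stateful.links, and no method had to be given the HTML or hand back a value explicitly.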

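In the previous article, each output record stated what kind of content had been scraped, namely the type field.

Here, we define the four types up front as attributes of the class: main, news, videos, and wiki.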
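As a rough sketch of the idea, the four result types can be declared as empty lists when the object is created, with the parsed page itself kept out of anything that gets exported. AttribDict is not shown in this section, so the sketch uses a hypothetical stand-in called ResultBag with a hypothetical to_dict method; how the real AttribDict treats __exclude_keys__ is an assumption.

```python
class ResultBag:
    """Hypothetical stand-in for AttribDict (the real class is not shown here)."""

    # Attribute names that should not appear in the exported result
    __exclude_keys__ = set()

    def to_dict(self):
        # Export every instance attribute except the excluded ones
        return {key: value for key, value in vars(self).items()
                if key not in self.__exclude_keys__}


class TypedResults(ResultBag):
    __exclude_keys__ = {'soup'}

    def __init__(self, soup):
        # The four result types are defined up front, one list per type
        self.main = []
        self.news = []
        self.videos = []
        self.wiki = []
        self.soup = soup  # parsing input, not part of the result


results = TypedResults(soup=object())  # placeholder object instead of a real soup
results.main.append({'title': 'Example', 'url': 'https://example.com', 'des': ''})
exported = results.to_dict()
```

With this layout the type of a record is implied by the list it lives in, so a separate type field on each record is no longer needed.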

The complete object-oriented code is shown below:

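from bs4 import BeautifulSoup


# AttribDict is not defined in this section; it is assumed to come from the earlier code.
class GoogleSpider(AttribDict):
    # Presumably the keys that AttribDict leaves out of the output
    __exclude_keys__ = {'soup'}

    def __init__(self, soup: BeautifulSoup):
        # One attribute per result type, filled in by search()
        self.videos = []
        self.wiki = []
        self.page = 0
        self.news = []
        self.main = []
        self.soup = soup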

    def __get_total_page(self):
        """Get the total number of result pages."""
        pages_ = self.soup.find('span', id='xjs').find_all('td')
        maxn = 0
        for p in pages_:
            try:
                maxn = max(maxn, int(p.text))
            except ValueError:
                # Skip cells whose text is not a page number
                pass
        self.page = maxn


    def __search_main(self):
        """Parse the main search results."""

        # Collect all main result containers
        result_containers = self.soup.find_all('div', class_='g')

        for container in result_containers:
            try:
                # Title
                title = container.find('h3').text
                # Link
                url = container.find('a')['href']
                # Description
                des = container.find('span', class_='aCOpRe').text
                self.main.append({
                    'title': title,
                    'url': url,
                    'des': des,
                })
            except (AttributeError, TypeError, KeyError):
                # Skip containers that lack a title, link, or description
                continue

    def __search_wiki(self):
        """Parse the wiki (knowledge panel) content."""
        container = self.soup.find('div', class_='kp-wholepage')
        # If there is no panel, leave self.wiki as an empty list
        if container is None:
            return
        # Title
        title = container.find('h2', attrs={'data-attrid': 'title'}).find('span').text
        # Subtitle
        try:
            subtitle = container.find(
                'div', attrs={'data-attrid': 'subtitle'}).text
        except AttributeError:
            subtitle = None
        # Description
        des = container.find('div', class_='kno-rdesc').find('span').text
        # Wiki link
        url = container.find('div', class_='kno-rdesc').find('a')['href']
        # Details
        try:
            # div.wp-ms matches four different cards
            table = container.find_all(
                'div', class_='wp-ms')[2].find_all('tr', class_='kno-nf-nr')[1:]
        except IndexError:
            table = []
        details = []

        for row in table:
            name = row.find('span').text.strip(': ')
            # Store each row as a name/detail pair
            detail = ' '.join(span.text for span in row.find_all('span')[1:])
            details.append({
                'name': name,
                'detail': detail.strip()
            })
        result = {
            'title': title,
            'subtitle': subtitle,
            'des': des,
            'url': url,
            'details': details,
        }
        self.wiki = [result]


    def __search_news(self):
        """Parse the news carousel."""
        try:
            cards = self.soup.find('g-scrolling-carousel').find_all('g-inner-card')
        except AttributeError:
            # No news carousel on this page; leave self.news empty
            return

        for card in cards:
            title = card.find('div', role='heading').text
            href = card.find('a')['href']
            result = {
                'title': title,
                'href': href,
            }
            self.news.append(result)


    def __search_videos(self):
        """Parse the video cards."""
        try:
            cards = self.soup.find('div', id='search').find_all('div', class_='VibNM')
        except AttributeError:
            # No search container on this page; leave self.videos empty
            return

        for card in cards:
            title = card.find('div', role='heading').text
            href = card.find('a')['href']
            result = {
                'title': title,
                'href': href,
            }
            self.videos.append(result)


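    def search(self):
        """Run every parser; the results are stored on the instance attributes."""
        self.__get_total_page()
        self.__search_main()
        self.__search_news()
        self.__search_videos()
        self.__search_wiki()


# soup is assumed to be the BeautifulSoup object built from the results page
spider = GoogleSpider(soup)
spider.search()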
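A note on the double-underscore method names used in the class: Python rewrites a name such as __search_main inside a class body to _ClassName__search_main (name mangling), which in practice leaves search() as the only obvious public entry point. The short sketch below illustrates this with a hypothetical Demo class that is not part of the project.

```python
class Demo:
    def __init__(self):
        self.items = []

    def __collect(self):
        # Stored on the class as _Demo__collect because of name mangling
        self.items.append('collected')

    def run(self):
        # Inside the class body the short name still works
        self.__collect()


mangling_demo = Demo()
mangling_demo.run()

# Looked up from outside the class, only the mangled name exists
has_plain_name = hasattr(mangling_demo, '__collect')
has_mangled_name = hasattr(mangling_demo, '_Demo__collect')
```

Here has_plain_name is False and has_mangled_name is True. A single leading underscore is the lighter-weight convention when the goal is only to mark a method as internal.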

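As you can see, the most obvious changes after moving to an object-oriented style are that fewer comments are needed and that inputs and outputs no longer have to be specified explicitly: each method reads from self.soup and writes its results to an attribute of the object. From here, design patterns could be used to further improve the GoogleSpider class.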

Reference

面向对象封装 (Object-Oriented Encapsulation), https://dev.to/fiveeng/-h0c
