
Python Web Scraping from Scratch (Part 5): the pyQuery Library


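What is pyQuery:

A powerful and flexible library for parsing web pages. If you find regular expressions a hassle to write (I can't write them myself), if you find BeautifulSoup's syntax hard to remember, and if you already know jQuery's syntax, then PyQuery is the best choice for you.

Installing pyQuery: just run pip3 install pyquery.

Basic usage of pyQuery:

Initialization:

Initializing from a string: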

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="sister" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)   # parse the HTML string
    print(doc('a'))  # select every <a> element, jQuery-style
          

Output:

[Figure 1: screenshot of the output]

Initializing from a URL:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Initialize from a URL
    from pyquery import PyQuery as pq

    doc = pq('http://www.baidu.com')  # fetches the page, then parses it
    print(doc('input'))
          

Output:

Initializing from a file:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Initialize from a file
    from pyquery import PyQuery as pq

    doc = pq(filename='baidu.html')  # read and parse a local HTML file
    print(doc('title'))
          

Output:

[Figure 2: screenshot of the output]

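The selection syntax matches jQuery: elements are selected by id, name or class the same way, and many other selectors work just as they do in jQuery.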

Basic CSS selectors:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # CSS selectors
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    print(doc('.title'))  # every element with class="title"
          

Output:

Finding elements:

Child elements:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Child elements
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')  # the <p class="title"> and the <a class="title">
    print(type(items))
    print(items)
    b = items.find('b')    # <b> elements inside the matched elements
    print(type(b))
    print(b)
          

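This code selects the elements whose class is title. There are two of them, a p tag and an a tag. We then call find('b') to look for b tags inside them, which returns the b tag inside the p. Output:

[Figure 3: screenshot of the output]

Note that find searches the descendants of each matched element, at any depth, not just its direct children.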

children:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Child elements: children()
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')
    print(items.children())  # direct child elements of each match
          

Output:

[Figure 4: screenshot of the output]

You can also pass a selector to children() to keep only the matching children:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Child elements filtered by a selector
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('.title')
    print(items.children('b'))  # only the children that match 'b'
          

The output is the same as above.

Parent elements:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Parent element
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.parent())  # the immediate parent only
          

Output:

[Figure 5: screenshot of the output]

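Only the immediate parent is output here. The parents method instead returns all ancestors (parent, grandparent and so on), and you can pass it a selector to keep only the matching ones: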

            
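    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Ancestor elements
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.parents('body'))  # ancestors of #link1 that match 'body'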

Output:

[Figure 6: screenshot of the output]

Sibling elements:

            
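    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Sibling elements
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link1')
    print(items)
    print(items.siblings('#link2'))  # siblings of #link1 that match '#link2'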

Output:

[Figure 7: screenshot of the output]

That covers the methods for finding elements. Next, let's look at how to traverse them.

Traversal

            
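    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Traversal
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    for k, v in enumerate(items.items()):  # items() yields one PyQuery object per element
        print(k, v)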

Output:

[Figure 8: screenshot of the output]

Getting information:

Getting attributes:

            
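    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting attributes
    from pyquery import PyQuery as pq

    # The example.com href values below are placeholders
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/title" class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items)
    print(items.attr('href'))  # method-call form
    print(items.attr.href)     # attribute-access form, same result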

Output:

[Figure 9: screenshot of the output]

Getting text:

            
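    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting text
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items)
    print(items.text())        # text of all matched elements, joined into one string
    print(type(items.text()))  # <class 'str'>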

Output:

[Figure 10: screenshot of the output]

Getting HTML:

            
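    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Getting HTML
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('a')
    print(items.html())  # inner HTML of the first matched element only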

Output:

[Figure 11: screenshot of the output]

DOM operations:

addClass, removeClass

            
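    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM operations: addClass, removeClass
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link2')
    print(items)
    items.addClass('addStyle')    # snake_case alias: add_class
    print(items)
    items.remove_class('sister')  # camelCase alias: removeClass
    print(items)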

Output:

[Figure 12: screenshot of the output]

attr, css:

            
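    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM operations: attr, css
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    items = doc('#link2')
    items.attr('name', 'addname')  # add the attribute, or overwrite it if present
    print(items)
    items.css('width', '100px')    # written into the style attribute
    print(items)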

This adds a new attribute; if the attribute already exists, its old value is overwritten.

Output:

[Figure 13: screenshot of the output]

remove:

            
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM operations: remove
    from pyquery import PyQuery as pq

    html = """
    <div class="wrap">
        Hello World
        <p>This is a paragraph.</p>
    </div>
    """

    doc = pq(html)
    wrap = doc('.wrap')
    print(wrap.text())
    wrap.find('p').remove()  # delete the <p> inside .wrap
    print("Data after remove:")
    print(wrap)

Output:

[Figure 14: screenshot of the output]

There are many more DOM methods. If you want to learn more, read the official documentation: https://pyquery.readthedocs.io/en/latest/api.html

Pseudo-class selectors:

            
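    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # DOM operations: pseudo-class selectors
    from pyquery import PyQuery as pq

    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" id="link1"><!-- Elsie --></a>
    <a class="sister" id="link2">Lacie</a>
    and
    <a class="title" id="link3">Title</a>
    ; and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>
    """

    doc = pq(html)
    wrap = doc('a:first-child')      # <a> elements that are the first child of their parent
    print(wrap)
    wrap = doc('a:last-child')       # <a> elements that are the last child of their parent
    print(wrap)
    wrap = doc('a:nth-child(2)')     # <a> elements that are the 2nd child of their parent (counted from 1)
    print(wrap)
    wrap = doc('a:gt(2)')            # matched <a> elements with index > 2, counted from 0 (none here: only three links)
    print(wrap)
    wrap = doc('a:nth-child(2n)')    # <a> elements at even child positions: 2, 4, 6, ...
    print(wrap)
    wrap = doc('a:contains(Lacie)')  # <a> elements whose text contains "Lacie"
    print(wrap)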

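I won't go through every selector here. To learn more about CSS selectors, see the W3School CSS tutorial: http://www.w3school.com.cn/css/index.asp

That roughly covers how to use pyQuery. For more detail, read the official documentation: https://pyquery.readthedocs.io/en/latest/

Source code for the examples above: https://gitee.com/dwyui/pyQuery.git

Thanks for reading. If anything here is wrong, please point it out. Thank you!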

