When writing Python coroutines, using the gevent module together with the queue module can speed up a crawler considerably. When crawling several websites, the original for-loop approach fetches one site after another in fixed order, like cooking the rice first and only then the dishes: the two steps run one after the other. With multiple coroutines the crawler can schedule the fetches itself, like cooking the rice and the dishes at the same time: the steps run concurrently, so it is naturally faster.
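To make the difference concrete, here is a minimal timing sketch (my own illustration; the two example URLs are placeholders, not part of the crawler below). The sequential loop waits for each response before sending the next request, while gevent.spawn lets the requests overlap:

from gevent import monkey
monkey.patch_all()  # patch blocking I/O first, before requests is imported
import time, gevent, requests

urls = ['https://example.com', 'https://example.org']  # placeholder URLs

# Sequential: total time is roughly the SUM of the two request times
start = time.time()
for u in urls:
    requests.get(u)
print('sequential:', time.time() - start)

# Coroutines: total time is roughly the SLOWER of the two requests
start = time.time()
gevent.joinall([gevent.spawn(requests.get, u) for u in urls])
print('gevent:', time.time() - start)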
Enough talk; let's look at the code:

from gevent import monkey
monkey.patch_all()
# Apply the monkey patch first (before requests is imported) so the
# blocking socket calls become cooperative and the program below can
# run with multiple coroutines

import requests,gevent,csv
from gevent.queue import Queue
from bs4 import BeautifulSoup

# Put all the URLs into one list:
url_list=[]
for i in range(1,11):
    url='http://www.mtime.com/top/tv/top100/index-'+str(i)+'.html'
    url_list.append(url)
# The first page's URL is formatted differently from the rest, so add it separately
url_0='http://www.mtime.com/top/tv/top100/'
url_list.append(url_0)

headers={
    'User-Agent': ''  # fill in your own browser's User-Agent string here
}

# 'a+' appends so repeated runs don't overwrite earlier rows;
# newline='' keeps the csv module from writing blank lines on Windows
csv_file=open('mtime_movie_list.csv','a+',newline='',encoding='utf-8')
writer=csv.writer(csv_file)
file_head=['Title','Director','Starring','Synopsis']
writer.writerow(file_head)

# Parse one page of movie entries and write them to the CSV
# (renamed from `list` so it no longer shadows the Python built-in)
def parse_movies(movies):
    for movie in movies:
        title=movie.find('h2',class_="px14 pb6").find('a').text
        info=movie.find_all('p')
        try:
            director=info[0].text   # first <p> holds the director
        except IndexError:
            director='none'

        try:
            actor=info[1].text      # second <p> holds the cast
        except IndexError:
            actor='none'

        try:
            brief=movie.find('p',class_="mt3").text
        except AttributeError:
            brief='none'
        writer.writerow([title,director,actor,brief])

# Put every URL into the 'no waiting' queue (see the standalone queue snippet after this listing):
work=Queue()
for url in url_list:
    work.put_nowait(url)

# The crawler worker:
def crawler():
    while not work.empty():
        url=work.get_nowait()
        res=requests.get(url,headers=headers)
        soup=BeautifulSoup(res.text,'html.parser')
        movies=soup.find_all('div',class_="mov_con")
        parse_movies(movies)
        print(url,work.qsize(),res.status_code)  # progress: page done, URLs left, HTTP status

# Create the coroutine tasks. Two workers are enough here; spawning many more would put needless load on the target server
tasks_list=[]
for x in range(2):
    task=gevent.spawn(crawler)
    tasks_list.append(task)

gevent.joinall(tasks_list)
csv_file.close()
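A quick aside on the queue itself, in case the 'no waiting' analogy above is unclear. This standalone snippet (my addition, not part of the original crawler) shows the three methods the workers rely on:

from gevent.queue import Queue

q = Queue()
q.put_nowait('http://example.com/a')  # enqueue without blocking
q.put_nowait('http://example.com/b')
print(q.qsize())       # 2 items currently waiting
print(q.get_nowait())  # 'http://example.com/a', dequeued in FIFO order
print(q.empty())       # False, one item still queued

Because put_nowait() and get_nowait() never block, each crawler coroutine grabs the next URL the moment it finishes a page, which is exactly how the two workers share the eleven URLs above.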