Scraping article titles and abstracts from the Jianshu homepage with Python

Author: fan12 | Published 2018-12-12 20:51

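urllib3

import re

import urllib3

# Silence urllib3 warnings.
urllib3.disable_warnings()

url = 'https://www.jianshu.com'
# Send a browser-like User-Agent header with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'
}

pool = urllib3.PoolManager()
resp = pool.request('GET', url, headers=headers)
url_content = resp.data.decode()
# print(url_content)

# title =re.findall(r'
# Titles sit on one line, so no flag is needed here.
title = re.findall(r'<a class="title" target="_blank.*?">(.*?)</a>', url_content)
# re.S lets "." match newlines, because an abstract can span several lines.
content = re.findall(r'<p class="abstract">(.*?)</p>', url_content, re.S)
# print(title)

# Pair each title with the abstract at the same position. zip() stops at the
# shorter list, so a mismatch in length cannot raise an IndexError.
for i, abstract in zip(title, content):
    print(i)
    print(abstract)
    print('=============================================================')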
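The two patterns above can be tried offline before hitting the live site. The following is only a minimal sketch: `SAMPLE_HTML` is a made-up fragment that imitates the markup the patterns expect, and `extract_separately` is a hypothetical helper name, neither comes from the Jianshu page itself.

```python
import re

# Hypothetical HTML fragment imitating the markup assumed by the patterns.
SAMPLE_HTML = '''
<a class="title" target="_blank" href="/p/aaa111">First post</a>
<p class="abstract">
  Abstract of the first post
</p>
<a class="title" target="_blank" href="/p/bbb222">Second post</a>
<p class="abstract">
  Abstract of the second post
</p>
'''


def extract_separately(html_text):
    """Run the two patterns independently and pair the results by position."""
    titles = re.findall(r'<a class="title" target="_blank.*?">(.*?)</a>', html_text)
    # re.S makes "." match newlines, so multi-line abstracts are captured.
    abstracts = re.findall(r'<p class="abstract">(.*?)</p>', html_text, re.S)
    # strip() removes the whitespace and line breaks around each abstract.
    return [(t, a.strip()) for t, a in zip(titles, abstracts)]


pairs = extract_separately(SAMPLE_HTML)
for title, abstract in pairs:
    print(title, '|', abstract)
```

Pairing by position only works while every title has exactly one abstract; if an entry has no abstract, the later pairs shift by one.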

urllib

import re
from urllib import request

url = "https://www.jianshu.com"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'
}

# Build the request with a browser-like User-Agent header and fetch the page.
req = request.Request(url, headers=headers)
resp = request.urlopen(req)
page = resp.read().decode()
# print(page)

# A single pattern captures each title together with the abstract that follows it.
# re.S lets "." match the newlines between the two tags.
res = re.findall(r'<a class="title" target="_blank" .*?>(.*?)</a>.*?<p class="abstract">(.*?)</p>', page, re.S)

for title, article in res:
    print(title)
    print(article)
    print('=====================================')
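The combined pattern can also be wrapped in a small function that tidies its results. This is a sketch under assumptions: `extract_pairs`, `PAIR_PATTERN` and `SAMPLE_PAGE` are illustrative names, the sample markup is invented, and `html.unescape` is an extra clean-up step that the original script does not perform.

```python
import html
import re

# Same combined pattern as above, compiled once for reuse.
PAIR_PATTERN = re.compile(
    r'<a class="title" target="_blank" .*?>(.*?)</a>.*?<p class="abstract">(.*?)</p>',
    re.S,
)


def extract_pairs(page):
    """Return (title, abstract) tuples with whitespace and HTML entities cleaned up."""
    results = []
    for title, abstract in PAIR_PATTERN.findall(page):
        # strip() drops surrounding line breaks; unescape() turns &amp; into &.
        results.append((html.unescape(title.strip()), html.unescape(abstract.strip())))
    return results


# Hypothetical fragment imitating one homepage entry.
SAMPLE_PAGE = '''
<li>
  <a class="title" target="_blank" href="/p/ccc333">Tips &amp; tricks</a>
  <p class="abstract">
    A short abstract
  </p>
</li>
'''

for title, abstract in extract_pairs(SAMPLE_PAGE):
    print(title)
    print(abstract)
```

Because the abstract is matched as whatever follows the title, an entry without an abstract would be paired with the next entry's abstract, so the output is worth spot-checking against the page.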