18. Storing Scrapy-Crawled Data in Elasticsearch

Author: MononokeHime | Published 2018-06-14 13:06

1. First, create in ES the index and type that the data will be stored in. For this we use elasticsearch-dsl, the official high-level Python library: https://github.com/elastic/elasticsearch-dsl-py.
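Install it:

pip install elasticsearch-dsl

The code below uses the DocType class of the 2018 releases; later releases renamed DocType to Document, so with a current version either pin an older release or port the class.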

2. Create es_operation.py. We create the ES index and type from Python instead of typing commands into Kibana as before. The example reuses the earlier Jianshu crawler:

from elasticsearch_dsl import DocType, Date, Keyword, Text, Integer
from elasticsearch_dsl.connections import connections

# Connect to the Elasticsearch server; hosts is a list, so several nodes can be given
connections.create_connection(hosts=['127.0.0.1'])


class JianshuType(DocType):  # custom document class inheriting from DocType
    title = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")
    article_id = Keyword()
    origin_url = Keyword()
    avatar = Keyword()
    author = Keyword()
    pub_time = Date()
    read_count = Integer()
    like_count = Integer()
    word_count = Integer()
    subjects = Text(analyzer="ik_max_word")
    comment_count = Integer()

    class Meta:
        index = "scrapy"      # index name
        doc_type = 'jianshu'  # type name


if __name__ == "__main__":
    # Create the index, the type and the field mappings
    JianshuType.init()

After running the script, the newly created index appears in elasticsearch-head.

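Text: the field is analyzed (split into tokens), so an analyzer has to be specified; ik_max_word is the Chinese analyzer provided by the IK analysis plugin. Keyword: the field is not analyzed.
create_connection(hosts=['127.0.0.1']): connects to the Elasticsearch server; several servers can be listed.
class Meta: sets the index name and the type name (the counterpart of a table name).
<document class>.init(): creates the index, the type and its field mappings.
<document instance>.save(): writes the document into Elasticsearch.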
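
Because init() replaces the command previously typed into Kibana, it can help to see the request it roughly corresponds to. The sketch below builds, in plain Python, an index-creation body for the fields of JianshuType, assuming Elasticsearch 5.x/6.x, where mappings are still nested under a type name (jianshu). The helper name jianshu_mapping is hypothetical, and the exact body elasticsearch-dsl sends may differ in detail. It also makes the Text/Keyword difference concrete: Text fields carry an analyzer, Keyword fields do not.

```python
import json

# Field names of JianshuType, grouped by field type (mirrors es_operation.py).
IK_TEXT_FIELDS = ["title", "content", "subjects"]                  # Text(analyzer="ik_max_word")
KEYWORD_FIELDS = ["article_id", "origin_url", "avatar", "author"]  # Keyword()
INTEGER_FIELDS = ["read_count", "like_count", "word_count", "comment_count"]


def jianshu_mapping():
    """Hypothetical helper: build an index body for the 'scrapy' index, 'jianshu' type."""
    properties = {}
    for name in IK_TEXT_FIELDS:
        # Analyzed at index time with the IK Chinese analyzer.
        properties[name] = {"type": "text", "analyzer": "ik_max_word"}
    for name in KEYWORD_FIELDS:
        # Indexed as one exact-match term, no tokenization.
        properties[name] = {"type": "keyword"}
    for name in INTEGER_FIELDS:
        properties[name] = {"type": "integer"}
    properties["pub_time"] = {"type": "date"}
    # Typed mapping layout of Elasticsearch 5.x/6.x: mappings -> type -> properties.
    return {"mappings": {"jianshu": {"properties": properties}}}


if __name__ == "__main__":
    # Request body for: PUT /scrapy
    print(json.dumps(jianshu_mapping(), indent=2))
```

Sent as PUT /scrapy (for example from Kibana Dev Tools), a body like this would define the same fields; the IK plugin must be installed, otherwise Elasticsearch rejects the unknown ik_max_word analyzer.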

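3. Next, we need to save the items returned by the spider into ES. This is done in an item pipeline, jianshu.pipelines.JianshuESPipeline, which step 4 enables.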
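
The post does not show pipelines.py itself. A minimal sketch of what JianshuESPipeline could look like, assuming es_operation.py is importable as jianshu.es_operation (a hypothetical path; adjust it to your project) and that the item field names match the JianshuType fields. Two details are choices made in this sketch, not necessarily the author's: article_id is reused as the document _id, so re-crawling an article overwrites it instead of adding a duplicate, and the hypothetical doc_cls parameter lets a stand-in class replace JianshuType, for example in tests.

```python
# pipelines.py (sketch): write each scraped item into Elasticsearch.

# Fields declared on JianshuType in es_operation.py.
ES_FIELDS = (
    "title", "content", "article_id", "origin_url", "avatar", "author",
    "pub_time", "read_count", "like_count", "word_count", "subjects",
    "comment_count",
)


class JianshuESPipeline:
    def __init__(self, doc_cls=None):
        if doc_cls is None:
            # Hypothetical module path for the DocType class defined above.
            from jianshu.es_operation import JianshuType
            doc_cls = JianshuType
        self.doc_cls = doc_cls

    def process_item(self, item, spider):
        # Copy only the fields the mapping declares; unset fields are skipped.
        fields = {name: item.get(name) for name in ES_FIELDS
                  if item.get(name) is not None}
        # Use article_id as the document _id when present; otherwise ES generates one.
        article_id = item.get("article_id")
        meta = {"id": article_id} if article_id else {}
        doc = self.doc_cls(meta=meta, **fields)
        doc.save()  # one index request per item
        return item  # pass the item on to any later pipelines
```

Once the class is listed in ITEM_PIPELINES, Scrapy calls process_item for every item the spider yields; returning the item keeps it available to pipelines with a higher order number.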

4. Enable the pipeline in settings.py:

ITEM_PIPELINES = {
    # 'jianshu.pipelines.JianshuPipeline': 300,
    'jianshu.pipelines.JianshuESPipeline': 300,
}

5. Run the spider and open elasticsearch-head: the data is now being written into ES.

Article link: https://www.haomeiwen.com/subject/jwafeftx.html