Python Crawler Case Study (7)


Connecting a PyCharm Crawler to a SQL Server Database
Continuing from the previous post, the target site is still https://xxgk.eic.sh.cn/jsp/view/eiaReportList.jsp. First, write the scraping logic in the test.py spider script:

import scrapy
import re
from Agnes_test1.items import AgnesTest1Item

class KeywordSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://xxgk.eic.sh.cn/jsp/view/eiaReportList.jsp']

    def parse(self, response):
        # Each row of the results table that follows the #menu element
        result_list = response.xpath("//*[@id='menu']/following-sibling::table/tr")
        for result in result_list:
            item = AgnesTest1Item()
            title = result.xpath("./td[2]/text()").extract_first()
            department = result.xpath("./td[4]/text()").extract_first()
            address = result.xpath("./td[6]/text()").extract_first()
            type = result.xpath("./td[7]/text()").extract_first()
            startDate = result.xpath("./td[8]/text()").extract_first()
            endDate = result.xpath("./td[9]/text()").extract_first()
            if title:
                # Strip non-breaking spaces (\xa0) before storing each field
                item['title'] = re.sub("\xa0", "", title)
                item['department'] = re.sub("\xa0", "", department)
                item['address'] = re.sub("\xa0", "", address)
                item['type'] = re.sub("\xa0", "", type)
                item['startDate'] = re.sub("\xa0", "", startDate)
                item['endDate'] = re.sub("\xa0", "", endDate)
                yield item
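The spider imports AgnesTest1Item from Agnes_test1/items.py, which the post never shows. Based on the fields assigned above, a minimal sketch of that file would be:

import scrapy

class AgnesTest1Item(scrapy.Item):
    # One Field per column scraped from the results table
    title = scrapy.Field()
    department = scrapy.Field()
    address = scrapy.Field()
    type = scrapy.Field()
    startDate = scrapy.Field()
    endDate = scrapy.Field()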

Next comes pipelines.py, where the scraped items are written into the SQL Server database:

import pymssql

class AgnesTest1Pipeline:
    def open_spider(self, spider):
        # Write the scraped data straight into the SQL Server database
        self.conn = pymssql.connect(host='localhost', port='1434', user='sa',
                                    password='123', database='agnes', charset='utf8')
        self.cursor = self.conn.cursor()
        print("Database connected successfully")

    def process_item(self, item, spider):
        try:
            # Parameterized INSERT; pymssql uses %s placeholders
            self.cursor.execute(
                "INSERT INTO dbo.project(title,department,address,type,startDate,endDate) "
                "VALUES (%s,%s,%s,%s,%s,%s)",
                (item['title'], item['department'], item['address'],
                 item['type'], item['startDate'], item['endDate']))
            self.conn.commit()
            print("Row inserted successfully")
        except Exception as ex:
            print(ex)
        return item

    def close_spider(self, spider):
        self.conn.close()
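Before wiring everything together, it can help to confirm that pymssql can reach the server at all. A minimal sketch using the same host, port, and credentials as the pipeline (no database selected, since agnes is only created in a later step):

import pymssql

# Standalone smoke test for the SQL Server connection
conn = pymssql.connect(host='localhost', port='1434', user='sa',
                       password='123', charset='utf8')
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")   # ask the server for its version string
print(cursor.fetchone()[0])
conn.close()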

Finally, register the pipeline in settings.py:

ITEM_PIPELINES = {
    'Agnes_test1.pipelines.AgnesTest1Pipeline': 100,
}

Don't rush to run it yet. First open SQL Server Management Studio and create a new database; I named mine agnes. Then create a table named project in that database and add the columns, so the design panel ends up like this (the column types were picked casually, so no nitpicking):
[screenshot: column design of the project table in SSMS]
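If you would rather create the table from code than from the SSMS designer, here is a sketch using pymssql. The post only shows the columns in a screenshot, so the NVARCHAR types and sizes below are assumptions:

import pymssql

# One-off script: create the project table inside the agnes database
conn = pymssql.connect(host='localhost', port='1434', user='sa',
                       password='123', database='agnes', charset='utf8')
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE dbo.project (
        title      NVARCHAR(200),
        department NVARCHAR(100),
        address    NVARCHAR(200),
        type       NVARCHAR(50),
        startDate  NVARCHAR(50),
        endDate    NVARCHAR(50)
    )
""")
conn.commit()
conn.close()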
Finally, go back to PyCharm and run scrapy crawl test in the terminal. Query the database and you can see the data was imported successfully:
[screenshot: query results showing the imported rows]
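The same check can be done from Python instead of SSMS; a small sketch that reuses the pipeline's connection settings and counts the imported rows:

import pymssql

conn = pymssql.connect(host='localhost', port='1434', user='sa',
                       password='123', database='agnes', charset='utf8')
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.project")   # how many rows made it in
print("rows imported:", cursor.fetchone()[0])
conn.close()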
