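After thinking it over for a long time, I finally decided to add layered crawling: scrape the list page first, then follow each entry through to its detail page.
The key part is these few lines: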
    # Get the detail-page URL
    security_item['url'] = i_item.xpath(".//div[@class='row2']/h3/a/@href").extract()[0]
    # Hand off to detail_parse, which scrapes the detail page and returns the item
    yield scrapy.Request(security_item['url'], meta={'security_item': security_item}, callback=self.detail_parse)
And finally, the detail_parse method that the callback points to:
    def detail_parse(self, response):
        security_item = response.meta['security_item']
        security_item['detail'] = response.xpath("//div[@class='mianLeft']/div[@class='de_p']").xpath('string(.)').extract()[0]
        return security_item
And that solves it nicely!
Here is a screenshot of the source code.