Scraping a Novel Asynchronously

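Async shines when scraping multi-page or multi-level content, so a novel makes a good example.

We'll use Baidu's novel site (dushu.baidu.com).

Open any novel, then open the browser's developer tools.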

Find the request that loads the chapter list.

Save that request's URL.

Then open a chapter and check the network requests again.

The response contains the chapter text, so copy that URL as well.

# https://dushu.baidu.com/api/pc/getCatalog?data={%22book_id%22:%224356290733%22}
# https://dushu.baidu.com/api/pc/getChapterContent?data={%22book_id%22:%224356290733%22,%22cid%22:%224356290733|1569830905%22,%22need_bookinfo%22:1}

The URLs contain %22, which is just a URL-encoded double quote ("), so replace each %22 with a plain quote.
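Rather than editing the URL by hand, the decoding can be done with the standard library. A minimal sketch, assuming the catalog URL copied above; the variable names are illustrative only:

```python
from urllib.parse import unquote

# Catalog URL exactly as copied from the browser's network panel.
encoded = "https://dushu.baidu.com/api/pc/getCatalog?data={%22book_id%22:%224356290733%22}"

# unquote() turns every %22 back into a double quote.
decoded = unquote(encoded)
print(decoded)
```

The decoded string shows that the data parameter is plain JSON, which is why the later code builds it from a dict.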

Then send a request to the first URL.

import requests


def getCatalog(url):   # the catalog URL is passed in
    resp = requests.get(url)
    print(resp.text)  # synchronous request that fetches the novel's catalog


if __name__ == '__main__':
    bok_id = "4356290733"
    url = 'https://xxxxx.com/api/pc/getCatalog?data={"book_id":"' + bok_id + '"}'
    getCatalog(url)

The response is JSON, so parse it as JSON and locate the title field to get each chapter's title and cid.
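The extraction step can be tried offline first. A minimal sketch, assuming the response nests chapters under data, novel, items as the later code in this post does; the sample payload, the second cid, and the helper name extract_chapters are made up for illustration:

```python
import json

# Hypothetical, trimmed-down catalog response; only the keys used in this post appear.
sample = (
    '{"data": {"novel": {"items": ['
    '{"title": "Chapter 1", "cid": "1569830905"}, '
    '{"title": "Chapter 2", "cid": "1569830906"}]}}}'
)


def extract_chapters(payload: str):
    """Return a list of (title, cid) pairs from the catalog JSON text."""
    dic = json.loads(payload)
    return [(item["title"], item["cid"]) for item in dic["data"]["novel"]["items"]]


chapters = extract_chapters(sample)
for title, cid in chapters:
    print(title, cid)
```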

Each cid corresponds to one chapter, so this is where async comes in.
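To see why one task per chapter pays off, here is a small self-contained sketch. It assumes nothing about the real site: fake_download is a hypothetical stand-in that sleeps instead of making a network request.

```python
import asyncio
import time


async def fake_download(cid: str) -> str:
    # Stand-in for a network request; asyncio.sleep yields control to other tasks.
    await asyncio.sleep(0.2)
    return f"chapter-{cid}"


async def download_all(cids):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_download(c) for c in cids))


start = time.perf_counter()
results = asyncio.run(download_all(["1", "2", "3", "4", "5"]))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Five sequential waits of 0.2 s would take about a second; run concurrently, the whole batch finishes in roughly the time of one wait.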

Modify the code as follows:

import asyncio
import requests


async def aiodownload(cid, bok_id, title):
    # values are hard-coded for now, copied from the captured URL
    data = {"book_id":"4356290733","cid":"4356290733|1569830905","need_bookinfo":1}
    pass


async def getCatalog(url):
    resp = requests.get(url)  # a single blocking request for the catalog
    # print(resp.text)
    dic = resp.json()
    tasks = []
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        # queue one async task per chapter
        tasks.append(aiodownload(cid,bok_id,title))
        # print(title,cid)

    # asyncio.wait() rejects bare coroutines on Python 3.11+, so use gather()
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    bok_id = "4356290733"
    url = 'https://dushu.baidu.com/api/pc/getCatalog?data={"book_id":"' + bok_id + '"}'
    asyncio.run(getCatalog(url))

Look closely at the data line: the hard-coded values can be replaced with the function's parameters:

data = {"book_id":bok_id,"cid":f"{bok_id}|{cid}","need_bookinfo":1}
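Building the query from a dict also makes it easy to percent-encode the JSON explicitly. The sketch below is one possible way to do that; the helper name chapter_url is hypothetical, and whether the endpoint requires the encoded form is not confirmed by this post (HTTP clients often encode unsafe characters on their own).

```python
import json
from urllib.parse import quote, unquote


def chapter_url(base: str, bok_id: str, cid: str) -> str:
    data = {"book_id": bok_id, "cid": f"{bok_id}|{cid}", "need_bookinfo": 1}
    # Compact separators avoid spaces; quote() percent-encodes quotes, braces and "|".
    query = quote(json.dumps(data, separators=(",", ":")))
    return f"{base}?data={query}"


url = chapter_url(
    "https://dushu.baidu.com/api/pc/getChapterContent", "4356290733", "1569830905"
)
print(url)
```

Decoding the query with unquote() and json.loads() gives back the original dict, which is a quick way to check the URL before sending it.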

The first URL returns the full chapter list; the second URL is built per chapter, so the code now looks like this:

import asyncio
import json

import aiohttp
import requests


async def aiodownload(cid,bok_id,title):
    # "bok_id|cid" must be built as a string; the | operator fails on two str values
    data = {"book_id":bok_id,"cid":f"{bok_id}|{cid}","need_bookinfo":1}
    data = json.dumps(data)  # serialize to a JSON string
    url = f'https://xxxx.com/api/pc/getChapterContent?data={data}'

    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            dic = await resp.json()


async def getCatalog(url):
    resp = requests.get(url)
    # print(resp.text)
    dic = resp.json()
    tasks = []
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        # queue one async task per chapter
        tasks.append(aiodownload(cid,bok_id,title))
        # print(title,cid)

    # asyncio.wait() rejects bare coroutines on Python 3.11+, so use gather()
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    bok_id = "4356290733"
    url = 'https://xxx.xxx.com/api/pc/getCatalog?data={"book_id":"' + bok_id + '"}'
    asyncio.run(getCatalog(url))

Saving the content is then straightforward:

            dic = await resp.json()
            async with aiofiles.open(title,mode="w",encoding="utf-8") as f:
                await f.write(dic['data']['novel']['content'])  # write the chapter text to the file
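Since the chapter title is used directly as the file name, a title containing a slash or a colon could make the open call fail. A minimal sketch of one way to guard against that; safe_filename is a hypothetical helper, not part of the original code:

```python
import re


def safe_filename(title: str, ext: str = ".txt") -> str:
    # Replace path separators and characters that Windows forbids in file names.
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or "untitled") + ext


print(safe_filename("Chapter 1: A/B?"))
```

The result could be passed to aiofiles.open in place of the raw title.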

The full code:

import asyncio
import json

import aiofiles
import aiohttp
import requests


async def aiodownload(cid, bok_id, title):
    data = {"book_id": bok_id, "cid": f"{bok_id}|{cid}", "need_bookinfo": 1}
    data = json.dumps(data)  # serialize to a JSON string
    url = f'https://xxxx.com/api/pc/getChapterContent?data={data}'
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            dic = await resp.json()
            async with aiofiles.open(title, mode="w", encoding="utf-8") as f:
                await f.write(dic['data']['novel']['content'])  # write the chapter text to the file


async def getCatalog(url):
    resp = requests.get(url)  # a single blocking request for the catalog
    # print(resp.text)
    dic = resp.json()
    tasks = []
    for item in dic['data']['novel']['items']:
        title = item['title']
        cid = item['cid']
        # queue one async task per chapter
        tasks.append(aiodownload(cid, bok_id, title))
        # print(title,cid)
    # asyncio.wait() rejects bare coroutines on Python 3.11+, so use gather()
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    bok_id = "4356290733"
    url = 'https://xxx.com/api/pc/getCatalog?data={"book_id":"' + bok_id + '"}'
    asyncio.run(getCatalog(url))
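The full script starts one request per chapter all at once. If that turns out to be too aggressive for the server, a semaphore is one common way to cap how many run at a time. The sketch below is an optional variation, not part of the original code: fetch, main, and the limit of 3 are illustrative, and a sleep stands in for the real request.

```python
import asyncio


async def fetch(cid, sem, state):
    async with sem:  # at most `limit` coroutines are inside this block at once
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.05)  # stand-in for the real chapter request
        state["active"] -= 1
        return cid


async def main(cids, limit=3):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}  # tracks how many fetches overlap
    done = await asyncio.gather(*(fetch(c, sem, state) for c in cids))
    return done, state["peak"]


done, peak = asyncio.run(main(list(range(10))))
print(done, peak)
```

In the real script the same semaphore would wrap the session.get call inside aiodownload.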

Posted in: Python爬虫 (Python crawlers)

2022-10-19
