Case 1: Crawling a JD.com product page
Product link: https://item.jd.com/2967929.html
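```python
import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()               # raise HTTPError for 4xx/5xx status codes
    r.encoding = r.apparent_encoding   # use the encoding guessed from the page content
    print(r.text[:1000])               # show the first 1000 characters
except requests.RequestException:
    print("Crawl failed")
```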
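The fetch, status check and encoding fix in this case recur in the cases that follow, so they can be wrapped in one function. The sketch below is one way to do that; `get_html_text` and the `"Crawl failed"` return marker are hypothetical names chosen for illustration, not part of requests.

```python
import requests


def get_html_text(url, headers=None, timeout=30):
    """Fetch a page and return its text, or "Crawl failed" on any request error.

    Hypothetical helper packaging the try/except pattern of Case 1;
    the name, signature and return marker are illustrative only.
    """
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()               # turn 4xx/5xx responses into HTTPError
        r.encoding = r.apparent_encoding   # guess the charset from the body
        return r.text
    except requests.RequestException:      # connection errors, timeouts, bad URLs, HTTPError
        return "Crawl failed"
```

For example, `get_html_text("https://item.jd.com/2967929.html")[:1000]` reproduces Case 1, and passing `headers={'user-agent': 'Mozilla/5.0'}` covers Case 2. Returning a marker string keeps the sketch close to the original `print` calls; raising or returning `None` instead would let a caller tell a failure apart from a page that happens to contain that text.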
Case 2: Crawling an Amazon product page
Product link: https://www.amazon.cn/gp/product/B01M8L5Z3Y
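```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Send a browser-like User-Agent; the default "python-requests/x.y.z" value may be rejected by the site
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])   # print characters 1000 to 1999
except requests.RequestException:
    print("Crawl failed")
```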
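A quick offline way to see what the header change in this case does is to build the request without sending it and read back the `User-Agent` that would go out. The sketch goes through a `requests.Session`, which is what `requests.get` uses internally; the helper name `outgoing_user_agent` is made up for this example.

```python
import requests


def outgoing_user_agent(headers=None):
    """Return the User-Agent requests would send for this URL, with no network I/O."""
    with requests.Session() as session:
        prepared = session.prepare_request(
            requests.Request("GET", "https://www.amazon.cn/gp/product/B01M8L5Z3Y", headers=headers)
        )
    return prepared.headers["User-Agent"]   # header lookup is case-insensitive


print(outgoing_user_agent())                               # python-requests/<installed version>
print(outgoing_user_agent({"user-agent": "Mozilla/5.0"}))  # Mozilla/5.0
```

Because the session merges its default headers with the per-request ones case-insensitively, the lowercase `'user-agent'` key in `kv` replaces the default value rather than being sent alongside it.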
Case 3: Submitting search keywords to Baidu and 360
Search engine keyword submission interfaces:
Baidu keyword interface: http://www.baidu.com/s?wd=keyword
360 keyword interface: http://www.so.com/s?q=keyword
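```python
import requests

keyword = "Python"
try:
    kv = {'wd': keyword}   # Baidu takes the keyword in the "wd" parameter
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=30)
    print(r.request.url)   # the URL actually requested, with the query string appended
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")
```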
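```python
import requests

keyword = "Python"
try:
    kv = {'q': keyword}    # 360 takes the keyword in the "q" parameter
    r = requests.get("http://www.so.com/s", params=kv, timeout=30)
    print(r.request.url)   # the URL actually requested, with the query string appended
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")
```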
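The `params` dictionary is turned into a percent-encoded query string before the request is sent, and that is what `print(r.request.url)` displays. The sketch below reproduces that step offline with `requests.Request(...).prepare()` for both keyword interfaces listed above; the `SEARCH_ENGINES` table and `search_url` function are illustrative names, not part of either site's API.

```python
import requests

# Keyword interfaces from the text: engine -> (base URL, name of the keyword parameter)
SEARCH_ENGINES = {
    "baidu": ("http://www.baidu.com/s", "wd"),
    "360": ("http://www.so.com/s", "q"),
}


def search_url(engine, keyword):
    """Build the URL that requests.get(base, params={key: keyword}) would request.

    Only prepares the request, so nothing is sent over the network.
    """
    base, key = SEARCH_ENGINES[engine]
    return requests.Request("GET", base, params={key: keyword}).prepare().url


print(search_url("baidu", "Python"))  # http://www.baidu.com/s?wd=Python
print(search_url("360", "Python"))    # http://www.so.com/s?q=Python
print(search_url("baidu", "爬虫"))     # http://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
```

Non-ASCII keywords are encoded as UTF-8 bytes and then percent-escaped, so the same code works for Chinese search terms without any manual quoting.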
Case 4: Crawling and storing web images
Format of a web image link: http://www.example.com/picture.jpg
National Geographic
Choose a web page containing an image:
http://www.nationalgeographic.com.cn/photography/photo_of_the_day/3921.html
Address of the image: http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg
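```python
import os
import requests

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "D://pics//"
path = root + url.split('/')[-1]   # save under the file name taken from the URL
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        with open(path, 'wb') as f:   # binary mode: r.content is bytes
            f.write(r.content)        # the with block closes the file automatically
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Crawl failed")
```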
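Two possible refinements of the code above, sketched under stated assumptions: the file name is taken from the URL path only, so a query string such as `?size=large` does not end up in the name, and the image is streamed to disk in chunks with `stream=True` and `Response.iter_content` instead of being held entirely in `r.content`. `local_path_for`, `download_image` and the 8192-byte chunk size are hypothetical choices for this example.

```python
import os
import posixpath
from urllib.parse import urlsplit

import requests


def local_path_for(url, root):
    """Map an image URL to a file path under root.

    Unlike url.split('/')[-1], this ignores any ?query or #fragment.
    """
    name = posixpath.basename(urlsplit(url).path)
    return os.path.join(root, name)


def download_image(url, root, chunk_size=8192):
    """Hypothetical helper: stream an image to disk and return its local path."""
    path = local_path_for(url, root)
    if os.path.exists(path):
        return path                          # already downloaded; skip
    os.makedirs(root, exist_ok=True)         # also creates missing parent directories
    tmp = path + ".part"
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(tmp, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    os.replace(tmp, path)                    # rename only after the whole body was written
    return path
```

Calling `download_image(url, "D://pics//")` would save the National Geographic image from the text. Writing to a `.part` file and renaming it only at the end means an interrupted download is not later mistaken for a complete file by the `os.path.exists` check.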
Case 5: Automatically looking up the location of an IP address
ip138 lookup interface: http://m.ip138.com/ip.asp?ip=ipaddress
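```python
import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + '202.204.80.112', timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])   # print the last 500 characters of the page
except requests.RequestException:
    print("Crawl failed")
```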
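To look up many addresses automatically, it helps to validate each one before building its query URL. This sketch uses the standard library's `ipaddress` module and `urllib.parse.urlencode`; the helpers `lookup_url` and `lookup_urls` are hypothetical, and the sketch assumes the interface expects an IPv4 address in the `ip` parameter, as in the example URL.

```python
import ipaddress
from urllib.parse import urlencode

LOOKUP_BASE = "http://m.ip138.com/ip.asp"   # interface given in the text


def lookup_url(ip):
    """Validate an IPv4 address and build its query URL; raises ValueError if malformed."""
    addr = ipaddress.IPv4Address(ip.strip())   # AddressValueError is a subclass of ValueError
    return LOOKUP_BASE + "?" + urlencode({"ip": str(addr)})


def lookup_urls(candidates):
    """Build query URLs for a batch of addresses, skipping malformed ones."""
    urls = []
    for ip in candidates:
        try:
            urls.append(lookup_url(ip))
        except ValueError:
            print("Skipping invalid address:", ip)
    return urls


print(lookup_urls(["202.204.80.112", "999.1.1.1"]))
```

Each resulting URL can then be fetched with `requests.get` in the same way as the code above.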