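We all know that to see account information you normally have to log in through the web page first, and only then can you find what you want on your profile page. After studying Python crawling for this long, is there a way to get that information without going through the login step? With the cookies we have been studying over the past few days, this is entirely doable: a cookie copied from an already logged-in browser session is enough. If you haven't tried it yet, follow along and see how a cookie lets a Python crawler fetch profile page information.
In the code, we fetch my personal profile directly: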
import requests

headers = {
    # Pretend to be a browser
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/73.0.3683.75 Chrome/73.0.3683.75 Safari/537.36',
    # Paste in the Cookie you just copied from the browser
    'Cookie': 'eda38d470a662ef3606390ac3b84b86f9; Hm_lvt_f1d3b035c559e31c390733e79e080736=1553503899; biihu__user_login=omvZVatKKSlcXbJGmXXew9BmqediJ4lzNoYGzLQjTR%2Fjw1wOz3o4lIacanmcNncX1PsRne5tXpE9r1sqrkdhAYQrugGVfaBICYp8BAQ7yBKnMpAwicq7pZgQ2pg38ZzFyEZVUvOvFHYj3cChZFEWqQ%3D%3D; Hm_lpvt_f1d3b035c559e31c390733e79e080736=1553505597',
}

# Send the request through a session, with the browser headers and Cookie attached
session = requests.Session()
response = session.get('https://biihu.cc/people/wistbean%E7%9C%9F%E7%89%B9%E4%B9%88%E5%B8%85', headers=headers)

# Print the HTML of the profile page
print(response.text)
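As a variation on the snippet above, the raw Cookie string can be split into name/value pairs and stored on the session itself, so every later request through that session carries it automatically. This is only a minimal sketch: `cookie_header_to_dict` and `build_session` are hypothetical helper names that do not appear in the original code, and fragments without a `name=value` shape are simply skipped, which is an assumption about how you want to treat them.

```python
def cookie_header_to_dict(raw_cookie):
    """Split a raw 'Cookie' header string into a {name: value} dict.

    Hypothetical helper: fragments that do not look like name=value
    are skipped (an assumption made for this sketch).
    """
    cookies = {}
    for part in raw_cookie.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue
        # Split on the first '=' only, since values may contain '=' themselves
        name, value = part.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


def build_session(raw_cookie, user_agent):
    """Return a requests.Session that carries the cookies and User-Agent.

    Hypothetical helper; requests is imported here so the parser above
    can be used without it.
    """
    import requests

    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    session.cookies.update(cookie_header_to_dict(raw_cookie))
    return session
```

With this in place, `build_session(raw_cookie, user_agent).get(url)` should behave much like the original request, without passing `headers=` on every call.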
After running it, you will find that you get your own profile information directly, without logging in:
<!DOCTYPE html>
<html>
<head>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type" />
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,Chrome=1" />
<meta name="renderer" content="webkit" />
<title>小帅b真特么帅 的个人主页 - 逼乎</title>
<meta name="keywords" content="逼乎,问答,装逼,逼乎网站" />
<meta name="description" content="逼乎 ,与世界分享你的装逼技巧与见解" />
<base href="https://biihu.cc/" /><!--[if IE]></base><![endif]-->
<link rel="stylesheet" type="text/css" href="https://biihu.cc/static/css/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="https://biihu.cc/static/css/icon.css" />
<link href="https://biihu.cc/static/css/default/common.css?v=20180831" rel="stylesheet" type="text/css" />
<link href="https://biihu.cc/static/css/default/link.css?v=20180831" rel="stylesheet" type="text/css" />
<link href="https://biihu.cc/static/js/plug_module/style.css?v=20180831" rel="stylesheet" type="text/css" />
<link href="https://biihu.cc/static/css/default/user.css?v=20180831" rel="stylesheet" type="text/css" />
<link href="https://biihu.cc/static/css/mood/mood.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
var _02AEC94D5CA08B39FC0E1F7CC220F9B4="a5359326797de302bfc9aa6302c001b8";
var G_POST_HASH=_02AEC94D5CA08B39FC0E1F7CC220F9B4;
var G_INDEX_SCRIPT = "";
var G_SITE_NAME = "逼乎";
var G_BASE_URL = "https://biihu.cc";
var G_STATIC_URL = "https://biihu.cc/static";
var G_UPLOAD_URL = "/uploads";
var G_USER_ID = "188";
var G_USER_NAME = "小帅b真特么帅";
var G_UPLOAD_ENABLE = "Y";
var G_UNREAD_NOTIFICATION = 0;
var G_NOTIFICATION_INTERVAL = 100000;
var G_CAN_CREATE_TOPIC = "1";
var G_ADVANCED_EDITOR_ENABLE = "Y";
var FILE_TYPES = "jpg,jpeg,png,gif,zip,doc,docx,rar,pdf,psd";
</script>
<script src="https://biihu.cc/static/js/jquery.2.js?v=20180831" type="text/javascript"></script>
....
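From the output we can see some of the information from the profile page, and the key point is that we never logged in with an account and password to get it. The site recognized us purely from the cookie, which says a lot about how powerful cookies are, doesn't it?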
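One way to confirm in code that the cookie was actually accepted is to look for the `G_USER_ID` and `G_USER_NAME` variables that appear in the returned HTML. The sketch below pulls the quoted `G_*` variables out of the page with a regular expression. The helper names are hypothetical, and the idea that a page served to an anonymous visitor has an empty or missing `G_USER_ID` is an assumption, not something the output above confirms.

```python
import re

# Matches lines such as: var G_USER_ID = "188";
# Only double-quoted values are captured; unquoted ones are ignored.
_JS_VAR = re.compile(r'var\s+(G_[A-Z_]+)\s*=\s*"([^"]*)"\s*;')


def extract_page_globals(html):
    """Return the quoted G_* JavaScript variables found in the page as a dict."""
    return dict(_JS_VAR.findall(html))


def looks_logged_in(html):
    """Guess whether the page was rendered for a logged-in user.

    Assumption: an anonymous page has an empty or missing G_USER_ID.
    """
    return bool(extract_page_globals(html).get('G_USER_ID'))
```

Calling `looks_logged_in(response.text)` after the request gives a quick signal that the cookie is still valid. Cookies expire, so a check like this is useful before crawling further pages.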