Scraping the Course Timetable from the Zhengfang Academic Affairs System


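A senior classmate assigned me this task. Some parts may be incomplete.
Approach: first, you need to log in.
(1) Inspecting the login page with the browser's developer tools reveals the URL of the captcha image: http://210.40.2.253:8888/(fw5xjvfovnf3f4zg1ikero2a)/CheckCode.aspx
Open it and you will find that the captcha keeps changing over time. So what do we do about that? (Time to search Baidu!)
The short version of what I found: the captcha check is tied to your cookies. So how do we get hold of those?
The third-party requests library provides a Session object, which acts like a browser inside your code: it stores the cookies from each HTTP response and sends them back on later requests, so the captcha you fetched and the captcha you submit can be matched.
In more detail:
First create a Session object.
Then send a GET request to the captcha page (at this point you have your cookies).
Then send a POST request to the login page (submitting the student ID, password, and captcha).
At this point you are logged in (provided that what you submitted is correct).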
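To see why the cookies matter, here is a toy sketch that runs entirely on localhost. It uses only the standard library, with a urllib cookie jar standing in for requests.Session. The server, the cookie name session_id, and the fixed captcha text 7K2P are all made up for illustration; the real system may behave differently.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CAPTCHAS = {}  # toy server state: session id -> expected captcha text


class ToyHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the demo quiet

    def do_GET(self):
        # "Captcha page": hand out a session cookie and remember its captcha.
        CAPTCHAS["abc123"] = "7K2P"
        self.send_response(200)
        self.send_header("Set-Cookie", "session_id=abc123; Path=/")
        self.end_headers()
        self.wfile.write(b"fake image bytes")

    def do_POST(self):
        # "Login page": accept the captcha only if it matches the session.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        session_id = self.headers.get("Cookie", "").partition("=")[2]
        ok = CAPTCHAS.get(session_id) == form.get("txtSecretCode", [""])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok" if ok else b"bad captcha")


def try_login(opener, base):
    opener.open(base + "/CheckCode.aspx").read()  # step 1: fetch the captcha
    body = urllib.parse.urlencode({"txtSecretCode": "7K2P"}).encode()
    return opener.open(base + "/default2.aspx", data=body).read().decode()


server = HTTPServer(("127.0.0.1", 0), ToyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
jar = http.cookiejar.CookieJar()
with_cookies = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(jar))
without_cookies = urllib.request.build_opener(no_proxy)

result_with = try_login(with_cookies, base)        # cookie is sent back
result_without = try_login(without_cookies, base)  # cookie is dropped
server.shutdown()
server.server_close()
print(result_with, "/", result_without)
```

The client that keeps cookies gets through, while the one that forgets them between the two requests is rejected even though it submits the same captcha text.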
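(2) Scraping the timetable
After logging in, we keep inspecting the page and find the address that the timetable page sends its POST request to:
http://210.40.2.253:8888/xskbcx.aspx?xh=1717000113&xm=%D6%A3%BC%CE%F3%DE&gnmkdm=N121603
Here xh is the student ID, xm is the URL-encoded student name, and gnmkdm is the code of the timetable page.
These parameters identify the student shown on the timetable page: which school you belong to, your student ID, your major, and so on.
We have to find a way to obtain them; only with this information can we conveniently send the POST request to the timetable page.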
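As a small illustration of the xm parameter, the sketch below percent-encodes a Chinese string from its GB2312 bytes, which appears to be the encoding this system uses (the login form's role value %D1%A7%C9%FA decodes to "学生" under GB2312). The helper timetable_url and the placeholder host and student ID are hypothetical.

```python
from urllib.parse import quote, unquote

# Percent-encode using GB2312 bytes rather than the default UTF-8.
role = quote("学生", encoding="gb2312")
decoded = unquote("%D1%A7%C9%FA", encoding="gb2312")
utf8_form = quote("学生")  # what you get if the encoding is left at UTF-8


def timetable_url(base, student_id, student_name, module_code="N121603"):
    """Build the timetable URL (hypothetical helper for illustration)."""
    xm = quote(student_name, encoding="gb2312")
    return f"{base}xskbcx.aspx?xh={student_id}&xm={xm}&gnmkdm={module_code}"


print(role)       # GB2312 form
print(utf8_form)  # UTF-8 form, which differs from the GB2312 one
print(timetable_url("http://host/", "0000000000", "学生"))
```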
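So how do we obtain it?
(1) First send a GET request to "http://210.40.2.253:8888/xskbcx.aspx?xh=" + user_name + "&gnmkdm=N121603"
(2) Then use the bs4 library to extract the student information from the returned HTML page (see the code).
(3) With that in hand, send a POST request to the timetable page (specifying the academic year and the semester).
(4) This returns the timetable page, and the information can then be extracted.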
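Steps (2) and (3) can be sketched offline as follows. To stay runnable without extra packages, this sketch swaps bs4 for the standard library's html.parser. The sample HTML, its placeholder student ID, and the __VIEWSTATE value are invented; the field names xnd and xqd are the ones used in the code further down.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the first form's action and every hidden input's name/value."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") == "hidden":
            self.hidden[attrs.get("name")] = attrs.get("value", "")


# Invented stand-in for the page returned by the GET request in step (1).
SAMPLE_HTML = """
<form method="post" action="xskbcx.aspx?xh=0000000000&amp;gnmkdm=N121603">
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" value="dDwxMjM0NTY3ODs7Pg==" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(SAMPLE_HTML)

# Step (3): reuse the hidden fields and add the year and semester to query.
form_data = dict(parser.hidden, __EVENTTARGET="xqd", xnd="2017-2018", xqd="1")
print(parser.action)
print(form_data)
```

Looking fields up by name rather than by position keeps the extraction working if the page adds or reorders inputs.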

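In fact, once you are logged in, everything inside (grades, timetables, course selection, and so on) can be handled the same way, by simulating what a person would do in the browser.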

import requests
from bs4 import BeautifulSoup

checkcodePath = './code.png'  # where the captcha image is saved
res = requests.Session()  # shared session: keeps cookies across requests
Origin_url = "http://210.40.2.253:8888/(fw5xjvfovnf3f4zg1ikero2a)/"  # base URL of the academic affairs system
checkcodeURL = Origin_url + 'CheckCode.aspx'  # captcha URL
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0",
    "Connection": "keep-alive",
    "Referer": "http://210.40.2.253:8888/default2.aspx",
}
post_data = {
    "__VIEWSTATE": "dDwtNTE2MjI4MTQ7Oz4I55DQ6KPcVdzTLmjGjlJPRWgYUQ==",
    "Textbox1": "",
    "txtSecretCode": "",
    # "学生" (student) as GB2312 bytes; requests percent-encodes them to
    # %D1%A7%C9%FA. A string that is already percent-encoded would be
    # encoded a second time.
    "RadioButtonList1": "学生".encode("gb2312"),
    "Button1": "",
    "lbLanguage": "",
    "hidPdrs": "",
    "hidsc": "",
}


def get_photo(URL):
    # Fetch the captcha through the shared session so its cookies are kept
    checkcode = res.get(URL, headers=head)
    with open(checkcodePath, 'wb') as fp:  # save the captcha image
        fp.write(checkcode.content)
    post_data['txtSecretCode'] = input("Enter the captcha shown in the image (saved in the current folder): ")


def login():
    user_name = input("Enter your student ID: ")
    user_password = input("Enter your password: ")
    post_data["txtUserName"] = user_name
    post_data["TextBox2"] = user_password
    login_page_url = Origin_url + "default2.aspx"
    head['Referer'] = login_page_url
    get_photo(checkcodeURL)
    # The response to this POST is the home page
    homePage = res.post(login_page_url, data=post_data, headers=head)
    with open('text1.html', 'w', encoding='gb2312', errors='ignore') as f:
        f.write(homePage.text)

    exit_sys = ""
    URL = "http://210.40.2.253:8888/xskbcx.aspx?xh=" + user_name + "&gnmkdm=N121603"
    # URL = "http://210.40.2.253:8888/xskbcx.aspx?xh=1717000113&xm=%D6%A3%BC%CE%F3%DE&gnmkdm=N121603"
    head["Referer"] = URL

    # GET the timetable page first to pick up a fresh __VIEWSTATE
    page_home = res.get(URL, headers=head)
    soup = BeautifulSoup(page_home.text, 'html.parser')
    post_data["__VIEWSTATE"] = soup.find('input', {'name': '__VIEWSTATE'}).get('value')
    post_data["__EVENTTARGET"] = "xqd"

    learning = page_home
    while exit_sys != "q":
        query_years = input("Enter the academic year to query (e.g. 2017-2018): ")
        query_how = input("Enter the semester to query: ")
        post_data["xnd"] = query_years
        post_data["xqd"] = query_how

        learning = res.post(URL, data=post_data, headers=head)
        print(URL)
        file_name = query_years + '-' + query_how + '.html'
        if query_years == "2018-2019" and query_how == "1":
            # Hard-coded special case from the original: for this term the
            # page from the initial GET is saved (presumably the default term)
            with open(file_name, 'w', encoding="gb2312", errors="ignore") as f:
                f.write(page_home.text)
        else:
            with open(file_name, 'w', encoding="gb2312", errors="ignore") as f:
                f.write(learning.text)
        print("Your timetable has been saved as an HTML file!")
        print("Press q to quit or c to continue")
        while True:
            exit_sys = input()
            if exit_sys in ("c", "q"):
                break
            print("Invalid input!")
    return learning


def class_table(table_html):
    # Not implemented yet: extract the timetable from the HTML
    pass


def main():
    page = login()
    class_table(page)


if __name__ == "__main__":
    main()
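The class_table function above is left as a stub. One possible starting point, offered only as a sketch, is to pull the cell text out of the timetable table. It uses the standard library's html.parser rather than bs4 so it runs without extra packages. The table id "Table1" and the sample HTML are assumptions, not taken from the real page, so check the saved HTML file for the actual structure.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell in the table with the given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")
        elif self.in_cell and tag == "br":
            self.rows[-1][-1] += " "  # line breaks inside a cell become spaces

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def extract_table(html, table_id="Table1"):
    parser = TableParser(table_id)
    parser.feed(html)
    return parser.rows


# Invented sample; the real page layout may differ.
SAMPLE = """
<table id="Table1">
<tr><td>Time</td><td>Monday</td><td>Tuesday</td></tr>
<tr><td>Period 1</td><td>Calculus<br>Mon 1-2 {weeks 1-16}<br>Room 101</td><td>&nbsp;</td></tr>
</table>
"""

rows = extract_table(SAMPLE)
for row in rows:
    print(row)
```

Inside class_table, the same function could be applied to the text of the response returned by login().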


Posted in: Python Web Scraping

2022-10-28
