Scraping face data from celebs


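In any face-related project, face data is the foundation, and it is also fairly tedious to collect. While gathering face data recently, I came across a very useful foreign website, celebs-place.com.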

So I analyzed the site and scraped its face data.

1. First, get the list of all listing pages

Analyzing the site shows that it already splits the people into pages by the first letter of their names, in the following form:

https://celebs-place.com/photos/people-A.html
https://celebs-place.com/photos/people-B.html
https://celebs-place.com/photos/people-C.html
…

A listing page looks like this:

[Screenshot: a letter-indexed listing page]
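Since the pattern only varies by the capital letter, the listing URLs can be generated rather than typed in. A minimal sketch, assuming there is exactly one page per letter A to Z in the form shown above (any other index pages the site may have are not covered); the function name `listing_urls` is hypothetical:

```python
import string

BASE = "https://celebs-place.com/photos/people-{}.html"


def listing_urls():
    """Return one listing URL per capital letter A-Z.

    Assumes the site uses a single page per letter with the pattern above.
    """
    return [BASE.format(letter) for letter in string.ascii_uppercase]
```

Each returned URL can then be passed to the parsing step below in place of a hand-written `class_number`.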

2. Parse the name list

With this many people, entering the names one by one is not feasible, so they have to be parsed automatically.

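Inspecting the content of https://celebs-place.com/photos/people-A.html shows that the names are stored in the following part of the page:

[Screenshot: the page markup containing the names]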

So we fetch that page and parse it:

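```python
import requests
from lxml import etree

headers = {'User-Agent': 'Mozilla/5.0'}  # assumed; the original headers are not shown
class_number = 'A'  # first letter of the name, A-Z

url = 'https://celebs-place.com/photos/people-' + class_number + '.html'
response = requests.get(url, headers=headers)
html_data = etree.HTML(response.text)
celebs_url_list = html_data.xpath('//div[@class="model_card"]/a/@href')  # sub-page links
name_list = html_data.xpath('//div[@class="model_card"]/a/div/span/text()')  # names
```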

This yields each person's name and the path of that person's sub-page.
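Because `celebs_url_list` and `name_list` come from the same `model_card` elements, they can be zipped into (name, URL) pairs. A minimal sketch, assuming every card yields exactly one link and one name in the same order (a card missing its `span` would put the lists out of step, which the length check only partly guards against); `build_person_index` is a hypothetical helper, and `urljoin` resolves each `href` against the site root:

```python
from urllib.parse import urljoin

SITE = "https://celebs-place.com"


def build_person_index(name_list, celebs_url_list):
    """Pair each person's name with the absolute URL of their sub-page.

    Assumes the two xpath results are parallel lists of equal length.
    """
    if len(name_list) != len(celebs_url_list):
        raise ValueError("name and URL lists differ in length")
    return [
        (name.strip(), urljoin(SITE, href))
        for name, href in zip(name_list, celebs_url_list)
    ]
```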

3. Parse the page count

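Different people have different numbers of photos, so the number of gallery pages varies as well. Before parsing a person's photos, we therefore need to find out how many pages that person's gallery spans.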

Inspecting a person's page shows that the page numbers are stored here:

[Screenshot: the pagination block on a person's page]

Parsing this HTML gives the number of pages for each person:

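```python
# person_sub_url is one entry of celebs_url_list from step 2
person_url = 'https://celebs-place.com' + person_sub_url
person_res = requests.get(person_url, headers=headers).text
person_data = etree.HTML(person_res)
# Text of the link in the last <li> of the first pagination block
page_number_info = person_data.xpath("//div[@class='pagination my-4'][1]/li[last()]/a/text()")
```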
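The xpath call returns a list of strings rather than a number, so it still has to be converted before it can drive the page loop. A minimal sketch, assuming the last pagination link's text is the number of the final page, and treating an empty result (no pagination block) or non-numeric text (for example a "next" arrow) as a single page; `parse_page_count` is hypothetical:

```python
def parse_page_count(page_number_info):
    """Convert the xpath result for the last pagination link into a page count.

    Assumes the link text is the last page number; falls back to 1 when
    there is no pagination block or the text is not a number.
    """
    if not page_number_info:
        return 1
    text = page_number_info[0].strip()
    return int(text) if text.isdigit() else 1
```

If the value used for `page_number` is the last page number itself, note that `range(1, page_number)` stops one page short; `range(1, page_number + 1)` covers the final page.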

4. Scrape each person's images

Inspecting the person's page again, find where the images are stored:

[Screenshot: the gallery markup on a person's page]

Then parse that HTML, download the images, and save them:

```python
count = 0  # running index for file names (its initialization is not shown in the original)
for page in range(1, page_number):
    url_detail = 'https://celebs-place.com' + person_sub_url + 'page{}/'
    print(url_detail.format(page))
    res = requests.get(url_detail.format(page), headers=headers).text
    data = etree.HTML(res)
    img_url_list = data.xpath("//div[@class='gallery-pics-list'][1]/div/a/img/@src")
    # An empty list means this page has no images, so stop
    if not img_url_list:
        break
    for image_info in img_url_list:
        image_complete_url = 'https://celebs-place.com/' + image_info
        image_get_result = requests.get(image_complete_url, headers=headers).content
        # Number files with count so each image gets its own file instead of overwriting one
        with open(path + '/' + person_name + '-' + str(count) + '.jpg', 'wb') as f:
            f.write(image_get_result)
        count += 1
```
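The URL and file-name handling in this loop can be factored into small helpers, which makes it easier to check that every page is visited and that no file is overwritten. A minimal sketch, assuming gallery pages follow the `person_sub_url + 'page{n}/'` pattern used above; the helper names and the zero-padded numbering are hypothetical choices, not the original code. `urljoin` also avoids the double slash that concatenating with `'https://celebs-place.com/'` would produce if an image `src` already starts with `/`:

```python
import os
from urllib.parse import urljoin

SITE = "https://celebs-place.com"


def gallery_page_urls(person_sub_url, page_count):
    """Return the URL of every gallery page, 1..page_count inclusive."""
    return [
        SITE + person_sub_url + "page{}/".format(page)
        for page in range(1, page_count + 1)
    ]


def absolute_image_url(src):
    """Resolve an img src (relative or absolute) against the site root."""
    return urljoin(SITE + "/", src)


def image_filename(path, person_name, index):
    """Build a unique file name for the index-th image of a person."""
    # Replace '/' so a name containing it does not create a sub-directory
    safe_name = person_name.strip().replace("/", "_")
    return os.path.join(path, "{}-{:04d}.jpg".format(safe_name, index))
```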

The saved files end up like this:

[Screenshot: the downloaded images]


Posted in: Python web scraping

2022-10-25
