Python Course Project: Crawler + Visualization + Data Analysis + Database (Data Analysis Part)


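Python Course Project: Crawler + Visualization + Data Analysis + Database (Introduction Part)

Python Course Project: Crawler + Visualization + Data Analysis + Database (Crawler Part)

Python Course Project: Crawler + Visualization + Data Analysis + Database (Visualization Part)

Python Course Project: Crawler + Visualization + Data Analysis + Database (Database Part)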

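1. Generating a lyrics word cloud

First we collect the lyrics of every song the crawler fetched and join them into a single string, lyric.

Then we extract only the Chinese characters from it and join the pieces into a string again.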

text = re.findall('[\u4e00-\u9fa5]+', lyric, re.S)  # extract runs of Chinese characters
text = " ".join(text)
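The two steps above (merging the lyrics, then keeping only the Chinese characters) can be tried on a small sample. This is a minimal sketch: the function name extract_chinese and the sample lyrics are made up for illustration, and the real lyrics come from the crawler described in the crawler part.

```python
import re


def extract_chinese(lyrics):
    """Join all lyrics into one string and keep only the runs of Chinese characters."""
    lyric = "".join(lyrics)
    # [\u4e00-\u9fa5] covers the common CJK ideographs; timestamps, Latin
    # letters, digits and punctuation are all dropped.
    chunks = re.findall(r"[\u4e00-\u9fa5]+", lyric)
    return " ".join(chunks)


# Hypothetical sample data standing in for the crawled lyrics.
sample_lyrics = ["[00:01.00]今天天气很好 we go\n", "Hello 我们一起唱歌 2022"]
text = extract_chinese(sample_lyrics)
print(text)
```

Each run of consecutive Chinese characters becomes one space-separated chunk, which is the form the segmentation step expects.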

Next we segment the text with jieba and keep only the resulting words whose length is at least 2.

word = jieba.cut(text, cut_all=True)  # segment the text (full mode)
new_word = []
for i in word:
    if len(i) >= 2:
        new_word.append(i)  # keep only words of length 2 or more
final_text = " ".join(new_word)
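The length filter can also be written as a list comprehension. The sketch below uses a hand-written token list in place of the jieba.cut output so that it runs without jieba installed; keep_long_words is a hypothetical helper name, not part of the original project.

```python
def keep_long_words(tokens, min_len=2):
    """Return the tokens that have at least min_len characters."""
    return [t for t in tokens if len(t) >= min_len]


# Hypothetical tokens, standing in for what jieba.cut might yield.
tokens = ["今天", "天", "天气", "很", "好"]
final_text = " ".join(keep_long_words(tokens))
print(final_text)
```

Dropping single-character tokens removes most particles and other filler words, which would otherwise dominate the word cloud.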

Next, pick a nice picture to use as the shape (mask) of the word cloud, and we are ready to generate it!

mask = np.array(Image.open("2.jpg"))  # the picture that gives the cloud its shape
word_cloud = WordCloud(
    background_color="white", width=800, height=600,
    max_words=100, max_font_size=80,
    contour_width=1, contour_color='lightblue',
    font_path="C:/Windows/Fonts/simfang.ttf", mask=mask,
).generate(final_text)
# plt.imshow(word_cloud, interpolation="bilinear")
# plt.axis("off")
# plt.show()
word_cloud.to_file(self.keyword + '词云.png')
os.startfile(self.keyword + '词云.png')  # os.startfile is Windows-only

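Among the WordCloud parameters, contour_width=1 and contour_color='lightblue' set the thickness and color of the outline drawn around the mask image; if they are not set, no outline is drawn. font_path specifies the font, and it must point to a font that contains Chinese glyphs for the words to render correctly. Note that when mask is given, WordCloud takes its size from the mask and ignores width and height.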

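Once generated, the word cloud can be displayed with show, or saved to disk and then opened. The final result looks like this:

[Figure: the generated lyrics word cloud]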
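The code above opens the saved image with os.startfile, which exists only on Windows. If the script has to run elsewhere, something like the following sketch could be used instead. The helper names opener_command and open_file are made up here, and the sketch assumes the open command on macOS and xdg-open on Linux desktops.

```python
import os
import subprocess
import sys


def opener_command(path, platform=sys.platform):
    """Return the external command that opens path, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows: use os.startfile instead of a command
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-utils


def open_file(path):
    """Open path with the default application for its file type."""
    command = opener_command(path)
    if command is None:
        os.startfile(path)
    else:
        subprocess.run(command, check=False)
```

Calling open_file("词云.png") after to_file would then behave like the original os.startfile call on Windows and fall back to the system opener on other platforms.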

2. Pie chart of song counts for popular singers

[Figure: pie chart of each popular singer's share of songs]

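First we get the list of popular singers and the number of songs each of them has.

Then we divide each singer's song count by the total song count of all ten singers, which gives each singer's share.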
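As a small illustration of the share calculation, here is a sketch with three made-up singers and counts; the real name list and counts come from the crawled data, and song_count is a hypothetical variable name.

```python
# Hypothetical data standing in for the crawled singer list and song counts.
name = ["Singer A", "Singer B", "Singer C"]
song_count = [50, 30, 20]

total = sum(song_count)
# Each singer's share of the total; the shares add up to 1.
proportion = [count / total for count in song_count]

for singer, share in zip(name, proportion):
    print(f"{singer}: {share:.1%}")
```

By default plt.pie also scales its input by the sum of the values, so passing the raw counts should produce the same chart; computing the shares explicitly is still useful if they are needed elsewhere.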

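Next we can choose which slice to pull out of the pie; in the figure, the slice for 周杰伦 (Jay Chou) is highlighted.

To do this, give the highlighted slice a nonzero offset and leave the others at 0, as below. The larger the value, the further the slice is pulled out.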

explode = [0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
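Rather than typing the list by hand, explode can be built from the singer names, so the highlight still lands on the right slice if the order changes. This is only a sketch; make_explode and the sample names are hypothetical.

```python
def make_explode(names, highlight, offset=0.1):
    """Return one offset per name: offset for the highlighted name, 0 for the rest."""
    return [offset if n == highlight else 0 for n in names]


# Hypothetical singer list; the real one comes from the crawled data.
name = ["周杰伦", "Singer B", "Singer C"]
explode = make_explode(name, "周杰伦")
print(explode)
```

The result has the same length and order as name, which is what plt.pie requires for explode.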

Now we can draw the pie chart.

plt.figure(figsize=(6, 9))  # figure size (width, height)
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render Chinese text
plt.axes(aspect=1)  # equal aspect ratio so the pie is a circle
plt.pie(x=proportion, labels=name, explode=explode, autopct='%3.1f %%',
        shadow=True, labeldistance=1.2, startangle=0, pctdistance=0.8)
plt.title("热门歌手歌曲量占比")
# plt.show()
plt.savefig("热门歌手歌曲量占比饼图.jpg")
os.startfile("热门歌手歌曲量占比饼图.jpg")  # os.startfile is Windows-only

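Here x is the list of song-count shares, labels is the list of labels (the singers' names in this chart), and explode is the highlight list described above; the entries of these three lists correspond one to one. autopct sets how the percentage on each slice is displayed: 3.1f means a floating-point number with a minimum width of 3 characters (longer values are printed in full) and 1 digit after the decimal point.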
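The effect of the format string can be checked without drawing anything, since a string autopct is applied to each slice's percentage with the ordinary % formatting operator. A short sketch with made-up percentages:

```python
fmt = '%3.1f %%'  # the autopct string used above; %% prints a literal percent sign

# Hypothetical percentages: a short value, one needing rounding, and a wide one.
labels = [fmt % pct for pct in (5.0, 12.345, 100.0)]
print(labels)  # ['5.0 %', '12.3 %', '100.0 %']

# A larger minimum width pads short values with leading spaces.
wide = '%6.1f %%' % 5.0
print(repr(wide))  # '   5.0 %'
```

Because the width of 3 is only a minimum, values such as 12.3 and 100.0 are printed in full rather than truncated.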

As before, the chart can either be displayed directly with show, or saved to disk and then opened.

3. Bar chart of song popularity

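Earlier we used the crawler to collect information about the top 500 songs (shown below). Now we want to analyze the popularity of those songs and draw a bar chart.

[Figure: the crawled top 500 song data]

The resulting chart looks like this:

[Figure: bar chart of the number of songs in each popularity range]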

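Originally I wanted to draw a bar chart of how many hit songs each singer has, but the site I scraped the hit songs from does not list the singer for each song. I would have had to look up every song's singer on another site, which was too much trouble, so I skipped it. Interested readers can try implementing it themselves.

First we need the number of songs in each popularity range.

The data list below holds the song counts for the ranges named in the x tuple.

We iterate over the list of song popularity values and, for each value, add 1 to the entry of data that corresponds to its range. At the end, data holds the number of songs in each popularity range.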

x = ('0-10', '10-20', '20-30', '30-40', '40-50', '>50')
data = [0, 0, 0, 0, 0, 0]
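The counting loop described above is not shown in the post; one possible way to write it is sketched below. The popularity values are made up, count_by_range is a hypothetical helper, and the boundary handling is an assumption: each range is treated as half-open (10 counts as '10-20'), values of 50 and above all go into the last bucket, and values are assumed to be non-negative.

```python
x = ('0-10', '10-20', '20-30', '30-40', '40-50', '>50')


def count_by_range(popularity, n_bins=6, bin_width=10):
    """Count how many values fall into each bin_width-wide range."""
    data = [0] * n_bins
    for value in popularity:
        # Integer division picks the bucket; min() sends every value of
        # 50 or more into the last bucket.
        index = min(int(value // bin_width), n_bins - 1)
        data[index] += 1
    return data


# Hypothetical popularity values standing in for the crawled top 500 data.
popularity = [3, 9.5, 10, 27, 48, 50, 73]
data = count_by_range(popularity)
print(dict(zip(x, data)))
```

The returned list has one entry per label in x, so it can be passed straight to plt.bar(x, data).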

Next we create the bar chart. First, fix the problem of Chinese text being rendered as garbled characters:

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

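The chart is then created with plt.bar: the first argument is the x-axis data and the second is the y-axis data (the bar heights), color sets the fill color of the bars, and alpha sets their transparency (0 is fully transparent, 1 is opaque).

title, xlabel and ylabel set the chart title and the names of the x-axis and y-axis.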

plt.bar(x, data, color='steelblue', alpha=0.8)
plt.title("pop500歌曲热度")
plt.xlabel("歌曲热度范围")
plt.ylabel("歌曲数量")
plt.show()


Copyright notice: posted by Python教程 on 2022-10-20, 2,437 characters in total.