My First Crawler · No Code, No Fun

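# Import the module used for crawling
import urllib.request
# Send a GET request to the given URL; urlopen returns a response object
response = urllib.request.urlopen('http://www.baidu.com/')
# Read the response body (read() returns raw bytes)
html = response.read()
print(html)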

That is all it takes to write a simple crawler.
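Because read() returns bytes, print shows a b'...' literal rather than readable text. Here is a minimal sketch of one way to decode the body. The helper names decode_body and fetch_text are hypothetical (not from the original post), and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode raw response bytes; assume UTF-8 when no charset is given."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # Charset declared in the Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Calling fetch_text('http://www.baidu.com/') would then return a str instead of bytes.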

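A packet-capture tool shows that this request is sent with the header User-Agent: Python-urllib/3.8 (the suffix matches the Python version in use).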
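The default value can also be checked without a packet-capture tool. The sketch below reads it from the addheaders list of a freshly built opener. The variable names are illustrative, and the printed version depends on the interpreter you run it with.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds
# the headers urllib attaches to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
default_ua = default_headers["User-agent"]
print(default_ua)  # Python-urllib/<major>.<minor> of the running interpreter
```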

We want the request to look like it comes from a browser, so let's wrap it in a Request object with custom headers.

import urllib.request
# Set a browser-style User-Agent
ug = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
# Build a request object that carries the custom headers
request = urllib.request.Request('http://www.baidu.com/', headers=ug)

# Send a GET request for the request object; urlopen returns a response object
response = urllib.request.urlopen(request)
# Read the response body (read() returns raw bytes)
html = response.read()
print(html)

With this change, the User-Agent sent to the server is the browser string instead of Python-urllib.
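If no packet-capture tool is at hand, the header can be inspected on the Request object itself before anything is sent. This is a small sketch under that assumption. It only shows what urllib intends to send, not what the server actually receives. Note that Request stores header names in capitalized form, so the lookup key is "User-agent".

```python
import urllib.request

ug = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
request = urllib.request.Request('http://www.baidu.com/', headers=ug)

# Header names are normalized with str.capitalize(), hence "User-agent"
sent_ua = request.get_header("User-agent")
print(sent_ua)

# With no data attached, the request method defaults to GET
print(request.get_method())
```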