Python Web Scraping for Beginners: 7 Small Python Scraper Examples (with Source Code)

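Web scraping with Python is an interesting and practical technique for collecting large amounts of data from the internet. This article walks through 7 simple Python scraper examples, each with basic code to help readers get started.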

Example 1: Scraping a page title

Let's start with a simple scraper that fetches the title of a web page.

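import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
# A timeout keeps the script from hanging forever on an unresponsive server
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx status codes
soup = BeautifulSoup(response.content, 'html.parser')

# soup.title is None when the page has no <title> element
title = soup.title.string if soup.title else None
print('Page title:', title)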
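
To see roughly what `soup.title.string` does without installing anything or making a network request, here is a minimal stdlib-only sketch. It swaps in the standard library's `html.parser.HTMLParser` (not BeautifulSoup) and runs on a literal HTML string; `TitleParser`, `extract_title` and the sample markup are hypothetical names for illustration only.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if no <title> is found

    def handle_starttag(self, tag, attrs):
        # Start capturing only at the first <title>
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


sample = "<html><head><title> Example Domain </title></head><body></body></html>"
print(extract_title(sample))  # Example Domain
```

Like the `if soup.title else None` guard, the helper returns None when the page has no title, and character references such as `&amp;` are decoded because `HTMLParser` converts them by default.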

Example 2: Scraping all links on a page

Next, we collect every link on a page.

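import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# href=True skips <a> tags that have no href attribute
links = [a['href'] for a in soup.find_all('a', href=True)]
print('All links on the page:', links)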
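
Raw `href` values are often relative (such as `/about` or `page.html#top`), can point to non-web schemes like `mailto:`, and the same link frequently appears several times. A minimal clean-up sketch, assuming you want absolute, de-duplicated http(s) URLs in page order, resolves each value with the standard library's `urllib.parse.urljoin` and strips fragments with `urldefrag`; the helper name `normalize_links` and the sample values are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def normalize_links(base_url, hrefs):
    """Hypothetical helper: resolve hrefs, strip #fragments, keep http(s) only, de-duplicate in order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        # Skip schemes that are not web pages, e.g. mailto: or javascript:
        if not absolute.startswith(("http://", "https://")):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


hrefs = ["/about", "page.html#top", "https://other.org/x", "mailto:a@b.c", "/about"]
print(normalize_links("http://example.com/docs/index.html", hrefs))
# ['http://example.com/about', 'http://example.com/docs/page.html', 'https://other.org/x']
```

Applied to the scraper, this would be `normalize_links(url, links)`.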

Example 3: Downloading images

We can write a scraper that downloads every image on a page.

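import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'http://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Create the output folder first; open() fails if 'images' does not exist
os.makedirs('images', exist_ok=True)

# src may be relative (e.g. '/logo.png'), so resolve it against the page URL
images = [urljoin(url, img['src']) for img in soup.find_all('img', src=True)]
for img_url in images:
    img_data = requests.get(img_url, timeout=10).content
    img_name = os.path.join('images', img_url.split('/')[-1])
    with open(img_name, 'wb') as img_file:
        img_file.write(img_data)
print('Images downloaded!')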
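
Using the last `/`-separated piece of the URL as the file name breaks for URLs such as `.../logo.png?v=2` (the query string ends up in the name) or `.../img/` (an empty name, so `open()` is handed a directory path). A minimal, more defensive sketch using only the standard library follows; `filename_from_url`, the `image_{index}` fallback and the `.jpg` default extension are hypothetical choices, and the actual image type is not checked.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url, index, default_ext=".jpg"):
    """Hypothetical helper: derive a local file name for an image URL.

    Drops the query string and fragment, decodes %-escapes, replaces path
    separators, and falls back to a numbered name when nothing usable is left.
    """
    last_segment = os.path.basename(urlparse(url).path)
    name = unquote(last_segment).replace("/", "_").replace("\\", "_")
    if name in ("", ".", ".."):
        name = f"image_{index}{default_ext}"
    return name


urls = [
    "http://example.com/img/logo.png?v=2",
    "http://example.com/img/",
    "http://example.com/a%20b.gif",
]
for i, u in enumerate(urls):
    print(filename_from_url(u, i))
```

In the download loop you would iterate with `for i, img_url in enumerate(images)` and build the path as `os.path.join('images', filename_from_url(img_url, i))`. Two different URLs can still map to the same name, in which case the later download overwrites the earlier one.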

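Example 4: Scraping weather information

We can pull current weather information from a weather site (here, Moji Weather's page for Beijing).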

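import requests
from bs4 import BeautifulSoup

url = 'https://tianqi.moji.com/weather/china/beijing'
# Some sites reject the default python-requests User-Agent, so a browser-like one is sent
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# 'wea' and 'tem' depend on the site's current markup; find() returns None if they change
weather_tag = soup.find('div', class_='wea')
temperature_tag = soup.find('span', class_='tem')

print('Current weather:', weather_tag.get_text(strip=True) if weather_tag else 'not found')
print('Current temperature:', temperature_tag.get_text(strip=True) if temperature_tag else 'not found')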
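
Selecting "the first `<tag>` whose class contains X" is the core of this example, and it silently finds nothing once the site changes its markup. As a rough stdlib-only illustration of that lookup (a swapped-in technique built on `html.parser.HTMLParser`, not BeautifulSoup), the sketch below returns None on a miss, just as `find()` does. The class `FirstByClass`, the function `text_by_class` and the sample markup are hypothetical; Moji Weather's real page structure is not reproduced here.

```python
from html.parser import HTMLParser


class FirstByClass(HTMLParser):
    """Hypothetical helper: collect all text inside the first <tag> whose class contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0  # nesting depth of self.tag inside the matched element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside the match: track nested tags of the same name
            if tag == self.tag:
                self.depth += 1
            return
        if self.found or tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_class(html, tag, css_class):
    """Return the stripped text of the first match, or None when nothing matches."""
    parser = FirstByClass(tag, css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


# Hypothetical markup; the real page's structure may differ
sample = '<div class="wea today"><span class="tem">26</span> Sunny</div>'
print(text_by_class(sample, "div", "wea"))      # 26 Sunny
print(text_by_class(sample, "span", "tem"))     # 26
print(text_by_class(sample, "div", "missing"))  # None
```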

Example 5: Scraping news headlines

The following example scrapes story titles from a news site, Hacker News.

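import requests
from bs4 import BeautifulSoup

url = 'https://news.ycombinator.com/'
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Hacker News currently wraps each story link as <span class="titleline"><a ...>;
# the older 'storylink' class no longer appears in its markup
titles = [a.get_text() for a in soup.select('span.titleline > a')]
print('News titles:')
for title in titles:
    print('-', title)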

Example 6: Simulating a login

We can use the requests library to simulate logging in to a website.

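import requests

login_url = 'http://example.com/login'
# Field names must match the name attributes of the site's login form inputs
data = {'username': 'your_username', 'password': 'your_password'}
# A Session stores cookies, so later session.get(...) calls reuse the login
session = requests.Session()

response = session.post(login_url, data=data, timeout=10)
# '欢迎' ("welcome") is only a marker expected on the post-login page; adjust it for your site
print('Login succeeded!' if '欢迎' in response.text else 'Login failed!')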
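
When a dict is passed as `data=`, requests sends it as an `application/x-www-form-urlencoded` body, the same format a plain HTML login form submits (requests builds it with `urllib.parse.urlencode`). The standard library can therefore preview what the server receives. The credentials below are placeholders, and the field names `username`/`password` are the example's assumptions, not necessarily what a real site uses.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder credentials; real field names come from the form's <input name="...">
data = {"username": "your_username", "password": "p@ss word&1"}

body = urlencode(data)
print(body)  # username=your_username&password=p%40ss+word%261

# Decoding the body recovers the original fields (each value comes back as a list)
print(parse_qs(body))  # {'username': ['your_username'], 'password': ['p@ss word&1']}
```

Special characters such as `@`, spaces and `&` are escaped, which is why you pass a dict rather than concatenating the body string by hand.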

Example 7: Fetching JSON data

The following example fetches data from an API.

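import requests

url = 'https://api.example.com/data'
response = requests.get(url, timeout=10)
response.raise_for_status()
data = response.json()  # raises a ValueError subclass if the body is not valid JSON

print('Data received:')
for item in data:
    print(item)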
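
`for item in data` assumes the API returns a top-level JSON array. Many APIs instead return an object, and iterating a dict yields only its keys. A minimal sketch that copes with the common shapes, tried on literal JSON strings via the standard `json` module; the function `iter_records` and the envelope key `"items"` are hypothetical, so check the real API's documentation for its actual structure.

```python
import json


def iter_records(payload, list_key="items"):
    """Hypothetical helper: yield records whether the JSON top level is a list or an object.

    `list_key` is an assumed envelope key, not part of any specific API.
    """
    if isinstance(payload, list):
        yield from payload
    elif isinstance(payload, dict):
        inner = payload.get(list_key)
        if isinstance(inner, list):
            yield from inner
        else:
            # A single object: treat it as one record instead of iterating its keys
            yield payload
    else:
        raise TypeError(f"unexpected JSON top-level type: {type(payload).__name__}")


as_list = json.loads('[{"id": 1}, {"id": 2}]')
as_envelope = json.loads('{"items": [{"id": 3}], "total": 1}')
as_object = json.loads('{"id": 4, "name": "x"}')

for payload in (as_list, as_envelope, as_object):
    print(list(iter_records(payload)))
```

In the example you would then write `for item in iter_records(data):` in place of `for item in data:`.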

Summary

The examples above show how to use Python for simple data-scraping tasks. To run them, install the requests and BeautifulSoup libraries:

pip install requests beautifulsoup4

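In practice, follow each site's robots.txt rules and keep your request rate and volume reasonable so you do not overload the site. Also handle exceptions and errors (timeouts, HTTP error status codes, markup that no longer matches your selectors) so that your scraper runs reliably.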
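
The robots.txt and rate-limit advice can be put into code with the standard library. This is a minimal sketch with two assumptions: the robots.txt content is hypothetical and parsed from a string (against a live site you would call `rp.set_url(...)` and `rp.read()` instead), and the bot name `my-tutorial-bot` is made up. `urllib.robotparser.RobotFileParser` answers whether a URL may be fetched and reports any `Crawl-delay`; the small `Throttle` class is a hypothetical helper that spaces out requests.

```python
import time
from urllib import robotparser

# Hypothetical robots.txt content for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

AGENT = "my-tutorial-bot"  # hypothetical User-Agent string
print(rp.can_fetch(AGENT, "http://example.com/index.html"))  # True
print(rp.can_fetch(AGENT, "http://example.com/private/a"))   # False
print(rp.crawl_delay(AGENT))                                  # 2


class Throttle:
    """Hypothetical helper: ensure at least `delay` seconds pass between calls to wait()."""

    def __init__(self, delay):
        self.delay = delay
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(rp.crawl_delay(AGENT) or 1)
```

In the examples above you would skip any URL for which `can_fetch` returns False and call `throttle.wait()` before each `requests.get(...)`.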

Category: Backend
Tags: web scraping, programming languages, Python
Published: 2024-09-21 04:49:09
Link: http://makehui.com/houduan/153.html
