7 Small Python Web Scraping Examples (with Source Code)


Web scraping is a widely used technique in Python for extracting the data you need from web pages. Below are 7 simple scraping examples for reference and study.

Example 1: A basic web scraper

This scraper fetches the HTML content of a single web page.

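import requests

url = 'http://example.com'
response = requests.get(url, timeout=10)  # time out instead of hanging forever
response.raise_for_status()  # raise an exception on 4xx/5xx responses
print(response.text)  # print the page's HTML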
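
Real requests fail for many reasons (timeouts, DNS errors, 4xx/5xx responses), so it can help to wrap the fetch in a small helper. The following is a minimal sketch, not a definitive implementation: the function name `fetch_html`, the 10-second default timeout, and the `User-Agent` value are all assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails."""
    # The User-Agent value is a made-up example, not a required string.
    headers = {"User-Agent": "example-crawler/0.1"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return response.text
```

Calling `fetch_html('http://example.com')` returns the HTML as a string on success; the caller only has to check for `None`.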

Example 2: Extracting the page title

We can use the BeautifulSoup library to parse the HTML and extract the page title.

from bs4 import BeautifulSoup
import requests

url = 'http://example.com'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')
# soup.title is None when the page has no <title> tag
title = soup.title.string if soup.title else None
print('Page title:', title)
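
Title parsing can be tried without any network access by feeding BeautifulSoup an HTML string directly. The sketch below assumes a hypothetical helper name, `extract_title`, and uses `get_text(strip=True)` so that surrounding whitespace is dropped and a missing title yields `None` instead of an `AttributeError`.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the stripped <title> text, or None if there is no usable title."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    # An empty <title></title> is reported as None as well.
    return soup.title.get_text(strip=True) or None


print(extract_title("<html><head><title> Demo page </title></head></html>"))
print(extract_title("<p>This fragment has no title</p>"))
```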

Example 3: Downloading images

This example finds the images on a page and saves them to a local directory.

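import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'http://example.com'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find_all('img')

# Create the output directory if it does not exist yet
os.makedirs('images', exist_ok=True)

for img in images:
    src = img.get('src')
    if not src:
        continue  # skip <img> tags without a src attribute
    img_url = urljoin(url, src)  # resolve relative paths against the page URL
    img_data = requests.get(img_url, timeout=10).content
    img_name = os.path.join('images', img_url.split('/')[-1])
    with open(img_name, 'wb') as handler:
        handler.write(img_data)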
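
Image `src` values are often relative paths or carry query strings, and both cases trip up a naive download loop. As a rough sketch using only the standard library, the two helpers below (hypothetical names `resolve_image_url` and `filename_from_url`) show one way to build an absolute URL and a clean file name; the sample paths are invented for illustration.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def filename_from_url(img_url, default="image"):
    """Use the last path segment as the file name, ignoring any query string."""
    name = os.path.basename(urlparse(img_url).path)
    return name or default


page_url = "http://example.com/gallery/index.html"
for src in ["photo.png", "/static/logo.jpg?v=2", "http://example.com/a/b.gif"]:
    absolute = resolve_image_url(page_url, src)
    print(absolute, "->", filename_from_url(absolute))
```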

Example 4: Extracting data from an HTML table

This example extracts the rows of an HTML table and prints them.

import requests
from bs4 import BeautifulSoup

url = 'http://example.com/table'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')  # first <table> on the page; None if there is none

if table is not None:
    for row in table.find_all('tr'):
        cols = row.find_all(['th', 'td'])  # include header cells as well
        data = [col.text.strip() for col in cols]
        print(data)
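
Lists of cell values are easier to work with once each row is keyed by its column header. The following sketch assumes the first table row holds the headers and uses an invented sample table; the helper name `table_to_dicts` is hypothetical. Rows whose cell count does not match the header count are skipped rather than guessed at.

```python
from bs4 import BeautifulSoup

# Invented sample data, standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>3.50</td></tr>
  <tr><td>Pear</td><td>2.00</td></tr>
</table>
"""


def table_to_dicts(html):
    """Convert the first table in html into a list of {header: value} dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumption: the first row contains the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(values) == len(headers):
            records.append(dict(zip(headers, values)))
    return records


records = table_to_dicts(SAMPLE_HTML)
print(records)
```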

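Example 5: Scraping through a proxy

Some websites may block your IP address while you crawl them; in that case you can send requests through a proxy.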

import requests

url = 'http://example.com'
# Placeholder proxy addresses; replace them with a proxy you are allowed to use
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get(url, proxies=proxies, timeout=10)
print(response.text)
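
When many requests should go through the same proxy, passing `proxies=` on every call gets repetitive. One option, sketched below, is to store the mapping on a `requests.Session` so every request made with that session uses it. The helper name `make_proxied_session` is an assumption, and the addresses are the same placeholders as in the example above, not working proxies.

```python
import requests

# Placeholder addresses; they are not real proxies.
PROXIES = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}


def make_proxied_session(proxies):
    """Create a session whose requests all use the given proxies."""
    session = requests.Session()
    session.proxies.update(proxies)
    return session


session = make_proxied_session(PROXIES)
print(session.proxies)
```

After this, `session.get(url, timeout=10)` is routed through the configured proxy without any extra arguments.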

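Example 6: Scraping behind a login

Some websites only show their data to logged-in users, so the scraper has to simulate a login first.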

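import requests

login_url = 'http://example.com/login'
# Form field names vary by site; check the login form's HTML for the real ones
data = {
    'username': 'your_username',
    'password': 'your_password'
}
session = requests.Session()  # the session keeps cookies across requests
login_response = session.post(login_url, data=data, timeout=10)
login_response.raise_for_status()  # stop here if the login request failed

protected_url = 'http://example.com/protected'
response = session.get(protected_url, timeout=10)
print(response.text)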

Example 7: Scraping dynamic data

Use the Selenium library to scrape page data that is loaded dynamically by JavaScript.

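from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # make sure ChromeDriver is installed
driver.get('http://example.com/dynamic')

# Let element lookups retry for up to 5 seconds before failing
driver.implicitly_wait(5)

# Selenium 4 locates elements with find_element and a By locator
data = driver.find_element(By.ID, 'dynamic_data').text
print(data)
driver.quit()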

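Those are 7 simple Python scraping examples. They cover the basics: fetching pages, parsing data, downloading images, using a proxy server, simulating a login, and scraping dynamic content. When you run a crawler, follow the target site's robots.txt rules and terms of use, and avoid putting unnecessary load on the site. You can extend and modify the examples to fit your own needs. I hope they help you learn web scraping with Python!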
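
The robots.txt advice can be checked in code with the standard library's `urllib.robotparser`. The sketch below parses an invented robots.txt string so that it runs offline; the rules, the `my-crawler` user agent, and the helper name `build_parser` are all assumptions for illustration. For a real site, `RobotFileParser.set_url()` followed by `read()` would download the file instead.

```python
from urllib.robotparser import RobotFileParser

# Invented rules, standing in for a site's real robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in ["http://example.com/index.html", "http://example.com/private/data.html"]:
    allowed = parser.can_fetch("my-crawler", url)
    print(url, "allowed" if allowed else "disallowed")

# A crawl delay, when present, suggests how long to sleep between requests.
print("Crawl delay:", parser.crawl_delay("my-crawler"))
```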

Published: 2024-09-30 04:18:26
Article link: http://makehui.com/houduan/2574.html
