Python is a language well suited to web scraping: its rich ecosystem of third-party libraries and simple syntax make crawlers quick and efficient to build. Below are seven small Python scraping examples, each with concrete code, that I hope you will find helpful.

Example 1: Scraping the Douban Movie Top 250

import requests
from bs4 import BeautifulSoup

# Douban rejects requests without a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://movie.douban.com/top250'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Each movie entry sits in a div with class "info"
movies = soup.find_all('div', class_='info')
for movie in movies:
    title = movie.find('span', class_='title').text
    rating = movie.find('span', class_='rating_num').text
    print(f'Title: {title}, Rating: {rating}')
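
The Top 250 list spans ten pages of 25 entries each, paginated through the start query parameter. A minimal sketch of crawling every page, reusing the markup assumptions above and pausing between requests:

import time

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
for start in range(0, 250, 25):
    url = f'https://movie.douban.com/top250?start={start}'
    soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')
    for movie in soup.find_all('div', class_='info'):
        print(movie.find('span', class_='title').text)
    time.sleep(1)  # be polite: pause between page requests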

Example 2: Fetching Weather Information

import requests

city = 'Shanghai'
# wttr.in returns plain-text weather; %C = condition, %t = temperature
url = f'https://wttr.in/{city}?format=%C+%t'
response = requests.get(url)
print(f'Weather in {city}: {response.text}')
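
If you need structured data rather than a one-line string, wttr.in also serves JSON via the j1 format. A sketch, assuming the usual j1 schema with a current_condition array:

import requests

city = 'Shanghai'
response = requests.get(f'https://wttr.in/{city}?format=j1')
# current_condition is a one-element list of the latest observation
current = response.json()['current_condition'][0]
print(f"{city}: {current['weatherDesc'][0]['value']}, "
      f"{current['temp_C']}°C, humidity {current['humidity']}%")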

Example 3: Scraping Answers to a Zhihu Question

import requests
from bs4 import BeautifulSoup

question_id = '123456'  # placeholder Zhihu question ID
url = f'https://www.zhihu.com/question/{question_id}'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Note: Zhihu renders most answers with JavaScript, so the static
# HTML may contain only the first answer, or none at all
answers = soup.find_all('div', class_='Answer')
for answer in answers:
    content = answer.find('div', class_='RichContent')
    if content:  # guard against entries without a content block
        print(content.text)
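
Zhihu rate-limits bare clients aggressively, so reusing a requests.Session with fuller browser-like headers and checking the status code tends to be more reliable. A sketch of that pattern (logged-in cookies may still be required):

import requests

session = requests.Session()
# A fuller browser-like header set; Zhihu may still demand login cookies
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'zh-CN,zh;q=0.9',
})

response = session.get('https://www.zhihu.com/question/123456')
if response.status_code != 200:
    print(f'Request blocked or failed: HTTP {response.status_code}')
else:
    print(f'Fetched {len(response.text)} bytes of HTML')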

Example 4: Downloading an Image

import requests

url = 'https://www.example.com/path/to/image.jpg'
response = requests.get(url)
response.raise_for_status()  # fail fast on HTTP errors

# response.content holds the raw bytes; write them in binary mode
with open('image.jpg', 'wb') as f:
    f.write(response.content)
print('Image downloaded!')
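
For large files, holding the whole response in memory is wasteful; requests can stream the body in chunks instead. A sketch using the same placeholder URL:

import requests

url = 'https://www.example.com/path/to/large_image.jpg'  # placeholder URL
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open('large_image.jpg', 'wb') as f:
        # iter_content yields the body in 8 KB chunks instead of all at once
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
print('Download complete!')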

Example 5: Scraping a Novel Chapter

import requests
from bs4 import BeautifulSoup

url = 'http://www.example.com/novel/chapter1'  # placeholder URL
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Selectors depend on the target site's markup; adjust as needed
chapter_title = soup.find('h1').text
content = soup.find('div', class_='content').text
print(f'Chapter: {chapter_title}\nContent: {content}')
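
To collect a whole book rather than one chapter, loop over the chapter URLs and append to a file. A sketch, assuming the hypothetical /novel/chapterN URL pattern above:

import time

import requests
from bs4 import BeautifulSoup

with open('novel.txt', 'w', encoding='utf-8') as f:
    for n in range(1, 4):  # hypothetical chapter numbering
        url = f'http://www.example.com/novel/chapter{n}'
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')
        f.write(soup.find('h1').text + '\n')
        f.write(soup.find('div', class_='content').text + '\n\n')
        time.sleep(1)  # polite delay between chapter requests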

Example 6: Scraping Zhihu User Info

import requests

username = 'zhihuzhiyang'  # placeholder Zhihu username
url = f'https://www.zhihu.com/people/{username}'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)

print(response.text)  # dumps the raw HTML of the user's profile page
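
Dumping raw HTML is rarely useful on its own; pairing the request with BeautifulSoup lets you pull out specific fields. A minimal sketch that extracts the page title, which usually embeds the display name:

import requests
from bs4 import BeautifulSoup

username = 'zhihuzhiyang'  # placeholder Zhihu username
url = f'https://www.zhihu.com/people/{username}'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

# soup.title may be None if the request was blocked
print(soup.title.string if soup.title else 'No title found')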

Example 7: Stepping Up: A Simple Scrapy Spider

# Create a Scrapy project and put this in the spiders folder as spider.py
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }

# Run with: scrapy crawl quotes -o quotes.json
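
quotes.toscrape.com is paginated, and Scrapy's response.follow makes it easy to walk the "Next" links. Extending the parse method above:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }
        # Follow the "Next" pagination link, if present
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)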

Those are the seven small Python scraper examples. As these snippets show, fetching web data with Python is fairly direct, and you can adapt the code to crawl other sites as your needs require. When scraping, always respect the target site's robots.txt and applicable laws and regulations, and use the data responsibly.
