A Universal Method for Reading Excel, docx, pdf, and txt Files in Java

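Reading files in formats such as Excel (.xlsx), Word (.docx), PDF (.pdf), and plain text (.txt) is a common requirement in Java development. This article shows how to read the contents of each format with popular Java libraries, then combines the readers into a single "universal" method.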

1. Reading Excel Files

For Excel files, we usually use the Apache POI library. Apache POI supports both the Excel 2003 format (.xls) and the format used by Excel 2007 and later (.xlsx).

Example code:

import org.apache.poi.ss.usermodel.*;

import java.io.FileInputStream;
import java.io.IOException;

public class ExcelReader {
    // Assumes Apache POI 4.0 or later, where getCellType() returns the CellType enum.
    public static void readExcel(String filePath) throws IOException {
        // WorkbookFactory detects the format, so both .xls and .xlsx are accepted
        // (reading .xlsx requires the poi-ooxml module on the classpath).
        // try-with-resources closes the workbook and the stream even if reading fails.
        try (FileInputStream fis = new FileInputStream(filePath);
             Workbook workbook = WorkbookFactory.create(fis)) {
            Sheet sheet = workbook.getSheetAt(0); // get the first sheet

            for (Row row : sheet) {
                for (Cell cell : row) {
                    switch (cell.getCellType()) {
                        case STRING:
                            System.out.print(cell.getStringCellValue() + "\t");
                            break;
                        case NUMERIC:
                            System.out.print(cell.getNumericCellValue() + "\t");
                            break;
                        case BOOLEAN:
                            System.out.print(cell.getBooleanCellValue() + "\t");
                            break;
                        default:
                            // Formula, blank, and error cells end up here.
                            System.out.print("Unsupported cell type\t");
                    }
                }
                System.out.println();
            }
        }
    }
}
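
The two Excel formats are different containers: .xls is an OLE2 compound document, while .xlsx is a ZIP package. When a file's extension cannot be trusted, its leading bytes can tell the two apart. Below is a minimal sketch of that check using only the standard library; the class ExcelFormatSniffer and its methods are hypothetical names introduced for illustration and are not part of Apache POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not an Apache POI class): guesses a file's container
// format from its first bytes.
final class ExcelFormatSniffer {
    // OLE2 compound-document signature, used by binary .xls files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature, used by .xlsx packages.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a header as "OLE2", "ZIP", or "UNKNOWN".
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) return "OLE2";
        if (startsWith(header, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniffFile(String filePath) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            byte[] header = new byte[8];
            int total = 0;
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) break; // file is shorter than 8 bytes
                total += n;
            }
            return sniff(Arrays.copyOf(header, total));
        }
    }
}
```

A ZIP signature only shows that the file is some ZIP archive, and the OLE2 signature is shared with legacy .doc files, so the result is a hint rather than proof of the format.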

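2. Reading Word Files

To read Word documents, we can use Apache POI's HWPF and XWPF components. HWPF handles the legacy .doc format and XWPF handles .docx; the example here uses XWPF, so it reads .docx files only.

Example code: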

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.FileInputStream;
import java.io.IOException;

public class WordReader {
    public static void readWord(String filePath) throws IOException {
        // try-with-resources closes the document and the stream even if reading fails.
        try (FileInputStream fis = new FileInputStream(filePath);
             XWPFDocument document = new XWPFDocument(fis)) {
            // Print the text of each body paragraph (tables, headers, and footers are not included).
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                System.out.println(paragraph.getText());
            }
        }
    }
}

3. Reading PDF Files

PDF files can be read with the Apache PDFBox library.

Example code:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class PDFReader {
    public static void readPDF(String filePath) throws IOException {
        // PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses
        // org.apache.pdfbox.Loader.loadPDF(new File(filePath)) instead.
        // try-with-resources closes the document even if extraction fails.
        try (PDDocument document = PDDocument.load(new File(filePath))) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String text = pdfStripper.getText(document); // text of all pages
            System.out.println(text);
        }
    }
}

4. Reading Text Files

For text files, Java's built-in BufferedReader is enough.

Example code:

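import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TextReader {
    public static void readText(String filePath) throws IOException {
        // Read as UTF-8 explicitly instead of relying on the platform default charset;
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}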
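
The reader above prints each line as it goes. When the caller needs the content itself, a variant could return the lines or process them as a stream. The following is a minimal sketch using only the standard library; the class name TextLines, its method names, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: returns file content instead of printing it.
final class TextLines {
    // Loads the whole file into memory; suitable for small files.
    static List<String> readLines(String filePath) throws IOException {
        return Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
    }

    // Streams the file lazily, so large files are not held in memory at once.
    static long countNonBlank(String filePath) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(filePath), StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }
}
```

Both methods throw an exception on bytes that are not valid UTF-8, so a file in another encoding needs a different Charset argument.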

5. A Universal Read Method

We can create a universal method that calls the appropriate reader based on the file's extension.

Example code:

import java.io.IOException;
import java.util.Locale;

public class UniversalFileReader {
    public static void readFile(String filePath) {
        // Compare extensions case-insensitively so "REPORT.XLSX" also matches.
        String name = filePath.toLowerCase(Locale.ROOT);
        try {
            if (name.endsWith(".xlsx") || name.endsWith(".xls")) {
                ExcelReader.readExcel(filePath);
            } else if (name.endsWith(".docx")) {
                // WordReader uses XWPF, which reads only .docx; legacy .doc files need HWPF.
                WordReader.readWord(filePath);
            } else if (name.endsWith(".pdf")) {
                PDFReader.readPDF(filePath);
            } else if (name.endsWith(".txt")) {
                TextReader.readText(filePath);
            } else {
                System.out.println("Unsupported file type");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String filePath = "your_file_path_here"; // replace with your file path
        readFile(filePath);
    }
}
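
As the number of supported formats grows, the if/else chain can be replaced by a lookup table keyed on the extension. The sketch below shows one possible shape using only the standard library; ExtensionDispatch, FileHandler, register, dispatch, and extensionOf are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical dispatcher: maps a lower-cased extension to a handler.
final class ExtensionDispatch {
    // A reader that may fail with an IOException, like the readers in this article.
    interface FileHandler {
        void read(String filePath) throws IOException;
    }

    private final Map<String, FileHandler> handlers = new HashMap<>();

    void register(String extension, FileHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Runs the matching handler; returns false when the extension is not registered.
    boolean dispatch(String filePath) throws IOException {
        FileHandler handler = handlers.get(extensionOf(filePath));
        if (handler == null) {
            return false;
        }
        handler.read(filePath);
        return true;
    }

    // Returns the lower-cased extension without the dot, or "" if there is none.
    static String extensionOf(String filePath) {
        int slash = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        int dot = filePath.lastIndexOf('.');
        // Ignore dots in directory names, leading dots (".gitignore"), and trailing dots.
        if (dot <= slash + 1 || dot == filePath.length() - 1) {
            return "";
        }
        return filePath.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

With the readers from this article on the classpath, registration could look like dispatcher.register("pdf", PDFReader::readPDF), and adding a new format then means one more register call rather than another branch.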

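Summary

With the code above, we can read Excel, Word, PDF, and text files in Java. The examples show how libraries such as Apache POI and PDFBox handle the different formats, which makes everyday file processing more convenient. Hopefully this article helps developers handle file operations in their projects.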

Category: Backend
Tags: java, programming languages
Published: 2024-10-10 07:31:35
Link: http://makehui.com/houduan/4921.html
