1. Overview
In this article, we’ll see how to edit the content of an existing PDF file in Java. First, we’ll just add new content. Then, we’ll focus on removing or replacing some pre-existing content.
2. Adding the iText7 Dependency
We’ll use the iText7 library to add content to the PDF file. Later on, we’ll use the pdfSweep add-on to remove or replace content.
Note that iText is licensed under AGPL, which might limit the distribution of a commercial application; see the iText License Model for details.
First, let’s add these dependencies to our pom.xml:
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.3</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>cleanup</artifactId>
    <version>3.0.1</version>
</dependency>
3. File Handling
Let’s understand the steps for handling our PDF with iText7:
- First, we open a PdfReader to read the content of the source file. This throws an IOException if an error occurs at any time while reading the file.
- Then, we open a PdfWriter to the destination file. If this file can’t be created or opened for writing, a FileNotFoundException is thrown.
- After that, we’ll open a PdfDocument which uses our PdfReader and PdfWriter.
- Finally, closing the PdfDocument closes both the underlying PdfReader and PdfWriter.
Let’s write a main() method that runs the whole process. For the sake of simplicity, we’ll just let any IOException propagate:
public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader("src/main/resources/baeldung.pdf");
    PdfWriter writer = new PdfWriter("src/main/resources/baeldung-modified.pdf");
    PdfDocument pdfDocument = new PdfDocument(reader, writer);
    addContentToDocument(pdfDocument);
    // closing the PdfDocument also closes the reader and the writer
    pdfDocument.close();
}
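The FileNotFoundException case mentioned above typically shows up when the destination's parent directory is missing. As a rough illustration, here is a plain-Java sketch that assumes PdfWriter opens its destination much like a standard FileOutputStream; the class name, the helper canOpenForWriting(), and the "missing-dir" path are hypothetical and not part of iText:

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DestinationCheckSketch {

    // Hypothetical helper: tries to open dest for writing, as a FileOutputStream would.
    // Side effect: creates an empty file at dest when it succeeds.
    static boolean canOpenForWriting(Path dest) throws IOException {
        try (FileOutputStream out = new FileOutputStream(dest.toFile())) {
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("pdf-sketch");
        Path dest = tempDir.resolve("missing-dir").resolve("baeldung-modified.pdf");

        // the parent directory does not exist yet, so opening fails
        System.out.println(canOpenForWriting(dest));

        // once the parent directory exists, the file can be created
        Files.createDirectories(dest.getParent());
        System.out.println(canOpenForWriting(dest));
    }
}
```

Creating the parent directory up front with Files.createDirectories() is one simple way to avoid this failure before handing the path to the writer.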
In the following section, we’ll complete the addContentToDocument() method step by step in order to fill our PDF with new content. The source document is a PDF file that only contains the text “Hello Baeldung” in the top left corner. The destination file will be created by the program.
4. Adding Content to the File
We’ll now add various types of content to the file.
4.1. Adding a Form
We’ll start by adding a form to the file. Our form will be very simple and contain a single field called name.
Furthermore, we need to tell iText where to place the field. In this case, we’ll put it at the following point: (35,400). The coordinates (0,0) refer to the bottom left of the document. Lastly, we’ll set the dimension of the field to 100×30:
PdfFormField personal = PdfFormField.createEmptyField(pdfDocument);
personal.setFieldName("information");
PdfTextFormField name = PdfFormField.createText(pdfDocument, new Rectangle(35, 400, 100, 30), "name", "");
personal.addKid(name);
PdfAcroForm.getAcroForm(pdfDocument, true)
    .addField(personal, pdfDocument.getFirstPage());
Additionally, we’ve explicitly told iText to add the form to the first page of the document.
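Since the Rectangle is given as (x, y, width, height) with the origin at the bottom left, it can help to work out where the field's edges end up. The following is a small plain-Java sketch of that arithmetic; the class and helper names are illustrative and not part of iText:

```java
class FieldBoundsSketch {

    // Hypothetical helpers (not iText API): edges of a box given as (x, y, width, height),
    // with the origin at the bottom left of the page.
    static float rightEdge(float x, float width) {
        return x + width;
    }

    static float topEdge(float y, float height) {
        return y + height;
    }

    public static void main(String[] args) {
        // same values as the text field above
        float x = 35, y = 400, width = 100, height = 30;

        System.out.println("bottom-left corner: (" + x + ", " + y + ")");
        System.out.println("top-right corner: (" + rightEdge(x, width) + ", " + topEdge(y, height) + ")");
    }
}
```

With these values, the field spans x from 35 to 135 and y from 400 to 430.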
4.2. Adding a New Page
Let’s now have a look at how we can add a new page to the document. We’ll use the addNewPage() method.
This method can accept the index of the added page if we want to specify it. For instance, we can add a new page at the beginning of the document:
pdfDocument.addNewPage(1);
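Inserting a page at position 1 shifts every existing page down by one, so content that was on the first page ends up on the second. As a rough mental model only, this plain-Java sketch mimics 1-based insertion on an ordinary list; it is a stand-in for illustration, not how iText stores pages, and all names in it are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class PageOrderSketch {

    // Models 1-based page insertion on a plain list (a stand-in, not iText's implementation)
    static List<String> insertPage(List<String> pages, int oneBasedIndex, String newPage) {
        List<String> result = new ArrayList<>(pages);
        result.add(oneBasedIndex - 1, newPage);
        return result;
    }

    // Returns the 1-based page number of the given page
    static int pageNumberOf(List<String> pages, String page) {
        return pages.indexOf(page) + 1;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("page with form");
        List<String> updated = insertPage(pages, 1, "new blank page");

        System.out.println(updated);
        System.out.println(pageNumberOf(updated, "page with form"));
    }
}
```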
4.3. Adding an Annotation
Next, we’ll add an annotation to the document. Visually, a text annotation looks like a square speech bubble.
We’ll add it just above the form, which is now located on the second page of the document. Consequently, we’ll place it at the coordinates (40,435). Additionally, we’ll give it a simple name and content. These will only show up when hovering over the annotation:
PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(40, 435, 0, 0))
    .setTitle(new PdfString("name"))
    .setContents("Your name");
pdfDocument.getPage(2)
    .addAnnotation(ann);
Here’s how the middle of our second page now looks:
4.4. Adding an Image
From now on, we’ll add layout elements to the page. To do this, we can’t manipulate the PdfDocument directly anymore. Instead, we’ll create a Document from it and work with that. Moreover, we’ll need to close the Document at the end. Closing a Document automatically closes the underlying PdfDocument, so we can remove the earlier call that closed the PdfDocument:
Document document = new Document(pdfDocument);
// add layout elements
document.close();
Now, to add the image, we’ll need to load it from its location. We’ll do this using the create() method of the ImageDataFactory class. This throws a MalformedURLException if the passed file URL can’t be parsed. In this example, we’ll use an image of Baeldung’s logo placed in the resources directory:
ImageData imageData = ImageDataFactory.create("src/main/resources/baeldung.png");
The next step will be to set the image’s properties in the file. We’ll set its size to 550×100. We’ll put it on the first page of our PDF, at the (10,50) coordinates. Let’s see the code to add the image:
Image image = new Image(imageData).scaleAbsolute(550, 100)
    .setFixedPosition(1, 10, 50);
document.add(image);
The image is automatically rescaled to the given size. So here’s how it looks in the document:
4.5. Adding a Paragraph
The iText library provides several tools for adding text to the file. The font can be configured on the individual Text pieces, or directly on the Paragraph element.
For instance, let’s add the following sentence on top of the first page: This is a demo from Baeldung tutorials. We’ll set the font size of the beginning of this sentence to 16 and the global font size of Paragraph to 8:
Text title = new Text("This is a demo").setFontSize(16);
Text author = new Text("Baeldung tutorials.");
Paragraph p = new Paragraph().setFontSize(8)
    .add(title)
    .add(" from ")
    .add(author);
document.add(p);
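In the snippet above, only the title carries its own font size, so the remaining pieces fall back to the size set on the Paragraph. The following plain-Java sketch models that fallback rule in a deliberately simplified way; it is an assumption-level illustration, not iText's actual property resolution, and the names are hypothetical:

```java
class FontSizeSketch {

    // Simplified model (an assumption, not iText's code): a piece of text uses its own
    // font size when one is set, and otherwise falls back to the enclosing paragraph's.
    static float effectiveFontSize(Float ownSize, float paragraphSize) {
        return ownSize != null ? ownSize : paragraphSize;
    }

    public static void main(String[] args) {
        float paragraphSize = 8f;

        // "This is a demo" sets its own size
        System.out.println(effectiveFontSize(16f, paragraphSize));
        // " from " and "Baeldung tutorials." set none
        System.out.println(effectiveFontSize(null, paragraphSize));
    }
}
```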
4.6. Adding a Table
Last but not least, we can also add a table to the file. For example, we’ll define a two-column table with two header cells and two cells below them. We won’t specify any position, so it’ll naturally be added at the top of the document, right after the Paragraph we just added:
Table table = new Table(UnitValue.createPercentArray(2));
table.addHeaderCell("#");
table.addHeaderCell("company");
table.addCell("name");
table.addCell("baeldung");
document.add(table);
Let’s see the beginning of the first page of the document now:
5. Removing Content From the File
Let’s now see how we can remove content from the PDF file. To keep things simple, we’ll write another main() method.
Our source PDF file will be the baeldung-modified.pdf file and the destination will be a new baeldung-cleaned.pdf file. We’ll work directly on the PdfDocument object. From now on, we’ll use iText’s pdfSweep add-on.
5.1. Removing Text From the File
To remove a given text from the file, we’ll need to define a cleanup strategy. In this example, the strategy will simply be to find all text matching Baeldung. The last step is to call the autoSweepCleanUp() static method of PdfCleaner. This method will create a custom PdfCleanUpTool which will throw an IOException if any error happens during file handling:
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy("Baeldung"));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);
As we can see, the occurrences of the word Baeldung in the source file are overlaid with a black rectangle in the result file. This behavior is suitable, for instance, for data anonymization:
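To get a feel for which strings a pattern such as Baeldung would select, we can try it on sample text with plain java.util.regex. This sketch assumes the cleanup strategy applies standard Java regex semantics to the extracted text; the class name and the countMatches() helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexMatchSketch {

    // Counts non-overlapping matches of the regex in the given text
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // texts taken from the document built in the previous sections
        System.out.println(countMatches("Baeldung", "Hello Baeldung"));
        System.out.println(countMatches("Baeldung", "This is a demo from Baeldung tutorials."));

        // matching is case-sensitive by default, so the lowercase table cell is not selected
        System.out.println(countMatches("Baeldung", "baeldung"));

        // the (?i) flag makes the pattern case-insensitive
        System.out.println(countMatches("(?i)baeldung", "baeldung"));
    }
}
```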
5.2. Removing Other Content From the File
Unfortunately, it’s very difficult to detect any non-text content in the file. However, pdfSweep offers the possibility to erase the content of a portion of the file. Thus, if we know where the content we want to remove is located, we’ll be able to take advantage of this possibility.
As an example, we’ll erase the content of the rectangle of size 100×35 located at (35,400) on the second page. This means we’ll get rid of all the content of the form and the annotation. Furthermore, we’ll erase the rectangle of size 90×70 located at (10,50) of the first page. This basically removes the B from Baeldung’s logo. Using the PdfCleanUpTool class, here’s the code to do all that:
List<PdfCleanUpLocation> cleanUpLocations = Arrays.asList(
    new PdfCleanUpLocation(1, new Rectangle(10, 50, 90, 70)),
    new PdfCleanUpLocation(2, new Rectangle(35, 400, 100, 35)));
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDocument, cleanUpLocations, new CleanUpProperties());
cleaner.cleanUp();
We can now see the following image in baeldung-cleaned.pdf:
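Because this approach depends on knowing where things are, it can be worth checking the geometry by hand. Here is a plain-Java sketch that tests whether a box lies inside a cleanup area, using the coordinates from this article. It is purely geometric: the contains() helper is hypothetical, and how pdfSweep itself treats content that only touches or partially overlaps an area is not something this sketch asserts.

```java
class CleanUpAreaSketch {

    // Hypothetical helper (not pdfSweep API): true if the box (x, y, w, h) lies entirely
    // inside the area (ax, ay, aw, ah); edges that touch count as inside.
    static boolean contains(float ax, float ay, float aw, float ah,
                            float x, float y, float w, float h) {
        return x >= ax && y >= ay && x + w <= ax + aw && y + h <= ay + ah;
    }

    public static void main(String[] args) {
        // page 2: cleanup area (35, 400, 100, 35)
        boolean coversField = contains(35, 400, 100, 35, 35, 400, 100, 30);
        boolean coversAnnotation = contains(35, 400, 100, 35, 40, 435, 0, 0);

        // page 1: cleanup area (10, 50, 90, 70) against the 550x100 image placed at (10, 50)
        boolean coversWholeImage = contains(10, 50, 90, 70, 10, 50, 550, 100);

        System.out.println(coversField);
        System.out.println(coversAnnotation);
        System.out.println(coversWholeImage);
    }
}
```

The area on the second page is 35 units high rather than 30, which is what lets it reach the annotation's anchor at y=435. On the first page, the area only covers the leftmost part of the image.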
6. Replacing Content in the File
In this section, we’ll do the same work as earlier, except that we’ll replace the former text with a new text instead of only erasing it.
For more clarity, we’ll use a new main() method again. Our source file will be the baeldung-modified.pdf file. Our destination file will be a new baeldung-fixed.pdf file.
Earlier we saw that the removed text was overlaid with a black rectangle. However, this color is configurable. Since we know the background of the text is white in our file, we’ll force the overlay to be white. The start of the process will be similar to what we did earlier, except that we’ll search for the text Baeldung tutorials.
However, after calling autoSweepCleanUp(), we’ll query the strategy to get the locations of the removed text. We’ll then instantiate a PdfCanvas on which we’ll write the replacement text HIDDEN. Additionally, we’ll remove the paragraph’s top margin so that it aligns a bit better with the original text, since the default alignment isn’t great. Let’s look at the resulting code:
CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
// search for the full text described above and redact it in white
strategy.add(new RegexBasedCleanupStrategy("Baeldung tutorials").setRedactionColor(ColorConstants.WHITE));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);
for (IPdfTextLocation location : strategy.getResultantLocations()) {
    PdfPage page = pdfDocument.getPage(location.getPageNumber() + 1);
    PdfCanvas pdfCanvas = new PdfCanvas(page.newContentStreamAfter(), page.getResources(), page.getDocument());
    Canvas canvas = new Canvas(pdfCanvas, location.getRectangle());
    canvas.add(new Paragraph("HIDDEN").setFontSize(8)
        .setMarginTop(0f));
}
And we can have a look at the file:
7. Conclusion
In this tutorial, we’ve seen how to edit the content of a PDF file. We’ve seen that we can add new content, remove existing content, and even replace text in the original file with a new one.
As always, the code for this article can be found over on GitHub.