Managing PDFs in Node.js with pdf-lib

Portable Document Format (PDF) files are a standard way to share contracts, reports, and presentations because they render consistently across platforms and devices. Being able to generate and manipulate them programmatically is a valuable skill for developers, and this is where the pdf-lib library comes into play. Node.js, with its asynchronous, event-driven architecture, is a popular choice for building server-side applications, and pdf-lib gives those applications a flexible way to create and modify PDFs. In this guide, we will explore pdf-lib's features, its limits, and how to integrate it into your Node.js projects.

In this article, we will cover the following:

Installing and setting up pdf-lib in a Node.js environment.
Creating a new PDF document from scratch or modifying an existing one.
Adding text, images, and other content to PDF pages.
Working with fonts and text styles to create visually appealing documents.
Extracting information from existing PDFs.
Merging multiple PDFs or splitting a single PDF into multiple files.
Adding interactive elements, such as hyperlinks and form fields.
Securing PDF documents and understanding pdf-lib's encryption support.
Optimizing and compressing PDFs for efficient storage and sharing.

Installing pdf-lib in a Node.js environment

Before we can start working with the pdf-lib library, we need to set up our Node.js environment and install the library.

Prerequisites

To get started, you'll need the following:

Node.js: Make sure you have Node.js installed on your machine. You can download it from the official Node.js website.
npm: This Node.js package manager allows you to install and manage libraries. It's usually included with a Node.js installation.

Installing pdf-lib

Once you have Node.js and npm set up, installing pdf-lib is a straightforward process. Open your terminal or command prompt and navigate to your project's directory. Then, execute the following command to install pdf-lib:

npm install pdf-lib

This command will download the pdf-lib package from the npm registry and add it to your project's node_modules directory. You can now start using pdf-lib in your Node.js application.

Importing pdf-lib

To use pdf-lib in your project, import it into your code files. Open the JavaScript file and add the following line at the beginning:

const { PDFDocument, rgb } = require('pdf-lib');

In this import statement, we're importing the PDFDocument class and the rgb function from the pdf-lib package. The PDFDocument class is the cornerstone of pdf-lib, allowing you to create, modify, and manipulate PDF documents. The rgb function helps you define colors in the Red-Green-Blue (RGB) format, which is commonly used for specifying colors in PDFs. With pdf-lib imported, you're now ready to start working with PDFs in your Node.js application.

Creating a new PDF document

Creating a new PDF document using pdf-lib is straightforward. You start by calling PDFDocument.create(), which returns a new, empty document. Here's a basic example of how to create a new PDF document, add a page, and draw a line of text on it:


const { PDFDocument, rgb } = require('pdf-lib');

async function createPDF() {
  const pdfDoc = await PDFDocument.create();
  const page = pdfDoc.addPage([600, 400]);

  page.drawText('Hello, pdf-lib!', { x: 50, y: 300 });

  const pdfBytes = await pdfDoc.save();
  return pdfBytes;
}

createPDF().then((pdfBytes) => {
  // `pdfBytes` contains the bytes of the generated PDF
});

In this example, we're using the PDFDocument.create() method to create a new PDF document. We then add a page with the addPage() method, passing the page dimensions as [width, height] in points, draw a line of text on it, and call save() to serialize the document to a Uint8Array.
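The numbers passed to addPage() and drawText() are PDF points (1/72 of an inch), and coordinates are measured from the bottom-left corner of the page. A minimal sketch of two conversion helpers follows; mmToPoints and yFromTop are hypothetical names used for illustration and are not part of pdf-lib:

```javascript
// Hypothetical helpers (not part of pdf-lib) for working with PDF units.

// PDF user space uses points: 72 points per inch, 25.4 mm per inch.
function mmToPoints(mm) {
  return (mm / 25.4) * 72;
}

// The y axis starts at the bottom edge of the page, so a distance
// measured from the top edge has to be flipped.
function yFromTop(pageHeight, distanceFromTop) {
  return pageHeight - distanceFromTop;
}

// An A4 page (210 mm x 297 mm) as a [width, height] pair in points.
const a4 = [mmToPoints(210), mmToPoints(297)];
```

Under these assumptions, pdfDoc.addPage(a4) would create an A4 page, and yFromTop(400, 100) gives the y value for text placed 100 points below the top edge of a 400-point-tall page. pdf-lib also exports a PageSizes object with common sizes such as PageSizes.A4, which avoids the manual conversion.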

Adding content to PDF pages

pdf-lib allows you to add various types of content to PDF pages, including text, images, shapes, and more. Let's take a look at how to add text and images to a PDF page:


const fs = require('fs');

async function addContentToPDF() {
  const pdfDoc = await PDFDocument.create();
  const page = pdfDoc.addPage([600, 400]);

  // Adding text
  const textOptions = { x: 50, y: 300, size: 24, color: rgb(0, 0, 0) };
  page.drawText('Hello, pdf-lib!', textOptions);

  // Adding an image (embedPng() expects PNG bytes; use embedJpg() for JPEG files)
  const imagePath = 'path/to/your/image.png';
  const image = await pdfDoc.embedPng(fs.readFileSync(imagePath));
  const imageDims = image.scale(0.5);
  page.drawImage(image, {
    x: 100,
    y: 200,
    width: imageDims.width,
    height: imageDims.height,
  });

  const pdfBytes = await pdfDoc.save();
  return pdfBytes;
}

addContentToPDF().then((pdfBytes) => {
  // `pdfBytes` contains the bytes of the PDF with added content
});

In this example, we're using the drawText() method to add text to the PDF page. We're also embedding a PNG image using the embedPng() method and then drawing the image on the page using the drawImage() method.
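pdf-lib embeds PNG and JPEG images through two separate methods, embedPng() and embedJpg(), so code that accepts arbitrary image files has to pick the right one. One possible approach, sketched below, is to look at the file signature; detectImageType and embedImage are hypothetical helper names, not pdf-lib functions:

```javascript
// Hypothetical helpers: choose embedPng() or embedJpg() from the file signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

function startsWith(bytes, signature) {
  return signature.every((value, i) => bytes[i] === value);
}

// Returns 'png', 'jpg', or null when the bytes match neither signature.
function detectImageType(bytes) {
  if (startsWith(bytes, PNG_SIGNATURE)) return 'png';
  if (startsWith(bytes, JPEG_SIGNATURE)) return 'jpg';
  return null;
}

// `pdfDoc` is a pdf-lib PDFDocument; `bytes` is a Buffer or Uint8Array.
async function embedImage(pdfDoc, bytes) {
  const type = detectImageType(bytes);
  if (type === 'png') return pdfDoc.embedPng(bytes);
  if (type === 'jpg') return pdfDoc.embedJpg(bytes);
  throw new Error('Unsupported image format: expected PNG or JPEG');
}
```

With this in place, the image step in the example above could become const image = await embedImage(pdfDoc, fs.readFileSync(imagePath)), which would work for either format.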

Modifying existing PDFs with pdf-lib

As your PDF manipulation needs evolve, you'll often find yourself needing to modify existing PDF documents. pdf-lib offers a versatile set of tools to help you efficiently make changes to PDFs. In this section, we'll dive into more detail about how to load, edit, and save modifications to existing PDFs using pdf-lib.

Loading an existing PDF

Before you can modify an existing PDF, you need to load it into a PDFDocument instance. To do this, you'll use the PDFDocument.load() method. This method asynchronously parses a PDF from its bytes (a Uint8Array, an ArrayBuffer, or a base64-encoded string) and returns a PDFDocument instance you can use. It does not accept a file path, so read the file into memory first.

const fs = require('fs');
const { PDFDocument, rgb } = require('pdf-lib');

async function modifyExistingPDF() {
  const existingPdfBytes = fs.readFileSync('path/to/existing.pdf');
  const pdfDoc = await PDFDocument.load(existingPdfBytes);

  // Now you can perform modifications on the pdfDoc
}
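Since PDFDocument.load() rejects input it cannot parse, it can help to fail early with a clearer message when a file is obviously not a PDF. The sketch below checks for the "%PDF-" marker that PDF files normally start with; looksLikePdf and readPdfBytes are hypothetical helper names, and the check is deliberately strict (some viewers tolerate a few bytes before the header, which this version rejects):

```javascript
const fs = require('fs');

// Hypothetical pre-check: a PDF file normally begins with "%PDF-".
function looksLikePdf(bytes) {
  return Buffer.from(bytes.subarray(0, 5)).toString('latin1') === '%PDF-';
}

// Reads a file and throws a descriptive error before pdf-lib sees it.
function readPdfBytes(filePath) {
  const bytes = fs.readFileSync(filePath);
  if (!looksLikePdf(bytes)) {
    throw new Error(`${filePath} does not start with a PDF header`);
  }
  return bytes;
}
```

The result of readPdfBytes('path/to/existing.pdf') can be passed to PDFDocument.load() in place of the fs.readFileSync() call shown above.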

Modifying pages

Once you've loaded an existing PDF, you can modify its pages by adding or editing content. For example, you can add text, images, shapes, or annotations to pages. Here's a simple example that adds a watermark text to each page:

async function addWatermark(pdfDoc, watermarkText) {
  const pages = pdfDoc.getPages();

  for (const page of pages) {
    const { width, height } = page.getSize();
    // Rough width estimate; the real width depends on the font and size
    const textWidth = watermarkText.length * 10;

    page.drawText(watermarkText, {
      x: (width - textWidth) / 2,
      y: height / 2,
      size: 30,
      color: rgb(0.7, 0.7, 0.7),
    });
  }
}
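The width estimate above is only approximate. For more precise centering, an embedded font can measure the text: pdf-lib fonts expose widthOfTextAtSize() and heightAtSize(). The following is a sketch under the assumption that the font is embedded once by the caller, for example with pdfDoc.embedFont(StandardFonts.Helvetica); centeredPosition and addCenteredWatermark are hypothetical names:

```javascript
// Computes the drawing position that centers `text` on `page`.
// `font` is an embedded pdf-lib font; `size` is the font size in points.
function centeredPosition(page, font, text, size) {
  const { width, height } = page.getSize();
  const textWidth = font.widthOfTextAtSize(text, size);
  const textHeight = font.heightAtSize(size);
  return {
    x: (width - textWidth) / 2,
    y: (height - textHeight) / 2, // roughly centered vertically
  };
}

// Draws a semi-transparent watermark in the middle of every page.
function addCenteredWatermark(pdfDoc, font, text, size = 30) {
  for (const page of pdfDoc.getPages()) {
    const { x, y } = centeredPosition(page, font, text, size);
    page.drawText(text, { x, y, size, font, opacity: 0.3 });
  }
}
```

Using the opacity option instead of a light gray color keeps the underlying page content readable through the watermark; which of the two looks better depends on the document.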

Saving a modified PDF

After making the desired modifications to the PDF document, you need to save the changes. The pdfDoc.save() method serializes the document, including your modifications, and resolves to a Uint8Array that you can write to a file.

async function saveModifiedPDF(pdfDoc, outputPath) {
  const modifiedPdfBytes = await pdfDoc.save();
  fs.writeFileSync(outputPath, modifiedPdfBytes);
}

Removing pages

pdf-lib also allows you to remove pages from an existing PDF with removePage(), which takes a zero-based page index. This can be useful if you want to keep only specific pages or simply remove unnecessary ones:

function removePage(pdfDoc, pageIndex) {
  const pages = pdfDoc.getPages();
  if (pageIndex >= 0 && pageIndex < pages.length) {
    pdfDoc.removePage(pageIndex);
  }
}
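When several pages have to go, the order of removal matters: every removePage() call shifts the indices of the pages after it. A minimal sketch that sidesteps this by removing the highest index first is shown below; removePages is a hypothetical helper built on pdf-lib's getPageCount() and removePage():

```javascript
// Removes several pages from a pdf-lib PDFDocument in one call.
// Indices are zero-based; duplicates and out-of-range values are ignored.
// Returns the number of pages that were removed.
function removePages(pdfDoc, pageIndices) {
  const pageCount = pdfDoc.getPageCount();
  const valid = [...new Set(pageIndices)]
    .filter((i) => Number.isInteger(i) && i >= 0 && i < pageCount)
    .sort((a, b) => b - a); // highest first, so lower indices stay valid

  for (const index of valid) {
    pdfDoc.removePage(index);
  }
  return valid.length;
}
```

For example, removePages(pdfDoc, [1, 3]) on a five-page document would leave the original first, third, and fifth pages.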

Rearranging pages

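pdf-lib does not provide a movePage() method, so rearranging pages takes a little more work. One approach is to compute the new page order and copy the pages into a new document in that order with copyPages():

async function movePage(pdfDoc, sourceIndex, targetIndex) {
  const order = pdfDoc.getPageIndices();
  const inRange = (i) => i >= 0 && i < order.length;
  if (!inRange(sourceIndex) || !inRange(targetIndex)) return pdfDoc;
  order.splice(targetIndex, 0, order.splice(sourceIndex, 1)[0]);

  // Copy the pages into a new document in the new order
  const reordered = await PDFDocument.create();
  const pages = await reordered.copyPages(pdfDoc, order);
  pages.forEach((page) => reordered.addPage(page));
  return reordered;
}

Because the function returns a new PDFDocument, save the returned document rather than the original. copyPages() copies pages only, so document-level settings such as the title may need to be set again on the new document.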

Advanced PDF manipulation with pdf-lib

Having established a solid foundation in creating and modifying PDFs, let's explore some advanced features of pdf-lib that enable you to take your PDF manipulation skills to the next level. From working with fonts to adding interactive elements and securing your documents, pdf-lib offers a comprehensive toolkit for all your PDF-related needs.

Working with fonts and text styles

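Fonts play a crucial role in the appearance of text within a PDF. pdf-lib allows you to embed custom fonts and change text colors and sizes, giving you control over the typography of your PDF documents. Embedding a custom font file requires the separate @pdf-lib/fontkit package (npm install @pdf-lib/fontkit), which must be registered on the document before embedFont() is called. The snippet below assumes it runs inside an async function with an existing pdfDoc.


// Loading a custom font
const fontkit = require('@pdf-lib/fontkit');
pdfDoc.registerFontkit(fontkit);
const fontBytes = fs.readFileSync('path/to/font.ttf');
const customFont = await pdfDoc.embedFont(fontBytes);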

// Adding styled text
const page = pdfDoc.addPage();
const textOptions = {
  x: 50,
  y: 300,
  size: 18,
  font: customFont,
  color: rgb(0, 0, 0),
};
page.drawText('Styled text with custom font', textOptions);
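Longer passages usually need line wrapping. drawText() accepts maxWidth and lineHeight options for this, but measuring the text yourself gives you the resulting lines, which is handy when you need to know how much vertical space a paragraph takes. The sketch below shows one possible greedy word-wrap built on the font's widthOfTextAtSize() method; wrapText and drawParagraph are hypothetical helpers, and a single word wider than maxWidth is left on its own line rather than broken:

```javascript
// Splits `text` into lines that fit within `maxWidth` points.
// `font` is an embedded pdf-lib font; `size` is the font size in points.
function wrapText(text, font, size, maxWidth) {
  const lines = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current && font.widthOfTextAtSize(candidate, size) > maxWidth) {
      lines.push(current); // the candidate overflows: close the current line
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Draws the wrapped lines top to bottom, starting at (x, y).
// Returns the number of lines drawn.
function drawParagraph(page, text, { x, y, size, font, maxWidth, lineGap = 4 }) {
  const lines = wrapText(text, font, size, maxWidth);
  lines.forEach((line, i) => {
    page.drawText(line, { x, y: y - i * (size + lineGap), size, font });
  });
  return lines.length;
}
```

A call such as drawParagraph(page, longText, { x: 50, y: 300, size: 18, font: customFont, maxWidth: 400 }) would reuse the font and position from the snippet above.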

Extracting content from PDFs

pdf-lib focuses on creating and modifying documents. It does not provide methods such as extractText() or getImages() for pulling text or images out of a page; for that, a parsing library such as pdf.js (published on npm as pdfjs-dist) is the usual choice. What pdf-lib can read from an existing PDF is document-level information such as metadata, the page count, and page sizes:

async function extractContentFromPDF() {
  const existingPdfBytes = fs.readFileSync('path/to/existing.pdf');
  const pdfDoc = await PDFDocument.load(existingPdfBytes);

  // Document metadata (each getter returns undefined when the entry is missing)
  console.log('Title:', pdfDoc.getTitle());
  console.log('Author:', pdfDoc.getAuthor());

  // Page count and the size of the first page, in points
  console.log('Pages:', pdfDoc.getPageCount());
  const firstPage = pdfDoc.getPages()[0];
  console.log('First page size:', firstPage.getSize());
}

extractContentFromPDF();

We start by loading an existing PDF document using the PDFDocument.load() method. Replace 'path/to/existing.pdf' with the path to the PDF file you want to inspect. Once the PDF is loaded, getTitle() and getAuthor() read entries from the document's metadata, getPageCount() returns the number of pages, and getSize() returns the width and height of the first page in points. The code logs these values to the console; you can process them further as needed.
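Form field values are one kind of content that pdf-lib exposes directly on a loaded document. The sketch below collects them into a plain object; readFormValues is a hypothetical helper, and it tells field types apart by the methods they expose (getText() on text fields, isChecked() on check boxes, getSelected() on dropdowns, option lists, and radio groups) rather than by importing the field classes:

```javascript
// Collects the values of form fields from a loaded pdf-lib PDFDocument.
// Returns an object keyed by field name; unsupported field types are skipped.
function readFormValues(pdfDoc) {
  const values = {};
  for (const field of pdfDoc.getForm().getFields()) {
    const name = field.getName();
    if (typeof field.getText === 'function') {
      values[name] = field.getText(); // text field
    } else if (typeof field.isChecked === 'function') {
      values[name] = field.isChecked(); // check box
    } else if (typeof field.getSelected === 'function') {
      values[name] = field.getSelected(); // dropdown, option list, radio group
    }
  }
  return values;
}
```

Calling readFormValues(pdfDoc) after PDFDocument.load() would return something like { fieldName: value, ... } for a filled-in form, and an empty object for a document without form fields.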

Merging and splitting PDFs

pdf-lib simplifies the process of merging multiple PDFs into a single document or splitting a single PDF into multiple files.

async function mergePDFs(pdfPaths) {
  const mergedPdfDoc = await PDFDocument.create();

  for (const pdfPath of pdfPaths) {
    const pdfBytes = fs.readFileSync(pdfPath);
    const pdf = await PDFDocument.load(pdfBytes);
    const copiedPages = await mergedPdfDoc.copyPages(pdf, pdf.getPageIndices());
    copiedPages.forEach((page) => mergedPdfDoc.addPage(page));
  }

  const mergedPdfBytes = await mergedPdfDoc.save();
  fs.writeFileSync('merged.pdf', mergedPdfBytes);
}

We begin by defining an asynchronous function named mergePDFs. This function takes an array called pdfPaths as its parameter. This array should contain the file paths of the PDFs that need to be merged.

Inside the function, we create an instance of a PDF document named mergedPdfDoc using the PDFDocument.create() method. This document will ultimately hold the merged content of the PDFs.

We then initiate a loop that iterates through each PDF file path in the pdfPaths array. For each path, we perform the following steps.

We read the content of the PDF file using the fs.readFileSync(pdfPath) method, where pdfPath is the current PDF file path in the loop. The read PDF content is stored as bytes in the pdfBytes variable.

Using the PDFDocument.load(pdfBytes) method, we load the PDF document from the pdfBytes. This loaded PDF is assigned to the pdf variable.

We use the copyPages(pdf, pdf.getPageIndices()) method on the mergedPdfDoc to copy all the pages from the pdf into the mergedPdfDoc. This method returns an array of copied page references, which we store in the copiedPages variable.

We iterate through the copiedPages array using the forEach loop. For each page reference in the array, we add that page to the mergedPdfDoc using the mergedPdfDoc.addPage(page) method. This effectively combines the pages from the current PDF into the merged PDF document.

After processing all PDFs in the loop, we proceed with the following steps.

We use the mergedPdfDoc.save() method to generate a byte array containing the merged PDF document's content. This byte array is stored in the mergedPdfBytes variable.

The fs.writeFileSync('merged.pdf', mergedPdfBytes) line writes the mergedPdfBytes byte array to a new PDF file named 'merged.pdf'. This file contains the merged content of all the input PDFs.
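Splitting works the same way in reverse: create one new document per group of pages and copy the relevant pages into it. The following is a sketch under stated assumptions, not a built-in pdf-lib feature. chunkPageIndices and splitPdf are hypothetical names, and the PDFDocument class is passed in as a parameter so the helper does not depend on how pdf-lib is imported:

```javascript
// Splits the page indices 0..pageCount-1 into consecutive groups
// of at most `size` indices each.
function chunkPageIndices(pageCount, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let start = 0; start < pageCount; start += size) {
    const end = Math.min(start + size, pageCount);
    chunks.push(Array.from({ length: end - start }, (_, i) => start + i));
  }
  return chunks;
}

// `PDFDocument` is the class exported by pdf-lib; `sourceDoc` is a loaded document.
// Resolves to an array with one Uint8Array of PDF bytes per chunk.
async function splitPdf(PDFDocument, sourceDoc, pagesPerFile) {
  const outputs = [];
  for (const indices of chunkPageIndices(sourceDoc.getPageCount(), pagesPerFile)) {
    const part = await PDFDocument.create();
    const pages = await part.copyPages(sourceDoc, indices);
    pages.forEach((page) => part.addPage(page));
    outputs.push(await part.save());
  }
  return outputs;
}
```

Each entry of the returned array can be written to its own file with fs.writeFileSync(), for example as part-1.pdf, part-2.pdf, and so on.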

Adding interactive elements

pdf-lib empowers you to create interactive PDFs by adding hyperlinks, form fields, and other interactive elements.

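const targetPage = pdfDoc.getPages()[0];
const link = pdfDoc.context.register(
  pdfDoc.context.obj({
    Type: 'Annot', Subtype: 'Link', Border: [0, 0, 0],
    Rect: [50, 45, 110, 65], // [x1, y1, x2, y2] around the text
    Dest: [targetPage.ref, 'Fit'],
  }),
);
page.node.addAnnot(link);
page.drawText('Click here', { x: 50, y: 50, size: 12 });

const form = pdfDoc.getForm();
const textField = form.createTextField('myTextField');
textField.setText('User input here');
page.drawText('Enter text:', { x: 50, y: 250, size: 12 });
textField.addToPage(page, { x: 150, y: 240 });

pdf-lib has no high-level helper for links, so we build a link annotation with its low-level object API. pdfDoc.context.obj() turns a plain JavaScript object into a PDF dictionary, and pdfDoc.context.register() stores it in the document and returns a reference. Rect defines the clickable rectangle as [x1, y1, x2, y2] (two opposite corners, not a width and height), Border: [0, 0, 0] hides the default outline, and Dest points the link at the first page of the document, scaled to fit the window.

Next, we attach the annotation to the page with page.node.addAnnot() and draw the text "Click here" at coordinates { x: 50, y: 50 } with a font size of 12, inside the clickable rectangle. drawText() itself has no link option; the annotation is what makes the area clickable.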

We proceed by obtaining the PDF's form using pdfDoc.getForm(). Within the form, we create a text field named 'myTextField'.

The text field's default text is set to 'User input here' using the setText() method.

Further, we add the label "Enter text:" to the page at coordinates { x: 50, y: 250 } using the drawText() method. The textField.addToPage() method places the previously created text field on the page at coordinates { x: 150, y: 240 }.

Securing PDF documents

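The pdf-lib package on npm does not include a method for encrypting or password-protecting documents: PDFDocument has no encrypt() method, and PDFDocument.load() throws an error for encrypted input unless you pass { ignoreEncryption: true }. To password-protect a PDF, you need a community fork of pdf-lib that adds encryption or a separate tool such as qpdf. The snippet below only illustrates the kind of options such an API typically takes (a user password, an owner password, and a set of permissions); it will not run against the upstream library.

// Illustrative only: encrypt() is NOT part of upstream pdf-lib.
// Check the documentation of the fork or tool you use for the real option names.
pdfDoc.encrypt({
  userPassword: 'user123',
  ownerPassword: 'owner456',
  permissions: {
    print: 'lowResolution',
    copy: false,
    modify: true,
    annotating: true,
  },
});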
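What the upstream library does let you do is detect encryption on input. PDFDocument.load() accepts an ignoreEncryption option, and a loaded document exposes an isEncrypted property. A minimal sketch follows; loadAndCheckEncryption is a hypothetical helper, and the PDFDocument class is passed in as a parameter so the function does not depend on how pdf-lib is imported:

```javascript
// `PDFDocument` is the class exported by pdf-lib; `pdfBytes` holds the file contents.
// Loads a PDF without failing on encrypted input and reports whether it is encrypted.
// Note: this does not decrypt anything.
async function loadAndCheckEncryption(PDFDocument, pdfBytes) {
  const pdfDoc = await PDFDocument.load(pdfBytes, { ignoreEncryption: true });
  return { pdfDoc, encrypted: pdfDoc.isEncrypted };
}
```

A caller might use the encrypted flag to skip or reject protected files, since the contents of an encrypted document generally cannot be modified in a meaningful way without decrypting it first.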

Conclusion

In this guide, we've worked through PDF manipulation in Node.js with pdf-lib, from installation and setup to creating, modifying, and enhancing PDF documents programmatically. Whether you're automating document generation or adding interactivity, pdf-lib covers a wide range of PDF-related tasks in your Node.js applications. Thanks for reading!