How do I handle infinite scroll with pagination in Puppeteer?
Davide S

Handling infinite scroll with pagination in Puppeteer means automating both the scrolling and the page-to-page navigation so that you can load and extract data from multiple pages. The steps are as follows:

1. Launching a new browser instance and creating a new page:

   const puppeteer = require('puppeteer');

   (async () => {
     const browser = await puppeteer.launch();
     const page = await browser.newPage();

     // Perform actions with the page here

     // Close the browser
     await browser.close();
   })();
   

This code sets up a basic Puppeteer script. It launches a new headless browser instance and opens a new page to work with.

2. Handling infinite scroll with pagination:

To handle infinite scroll with pagination, follow these steps:

- Identify the scrollable container: find the element that loads more content when it is scrolled to the bottom. This is the element that receives the scroll events; it is often a container with its own scroll bar, but on many sites it is the document itself.
- Automate scrolling: use Puppeteer's page.evaluate() to run JavaScript in the page's context and scroll the container, either to the bottom or by a fixed distance, to trigger the loading of more content.
- Wait for content to load: after each scroll, wait for the new content to appear. page.waitForSelector() works when a specific element only exists once loading has finished. For a growing list, however, it resolves immediately because earlier items already match, so waiting with page.waitForFunction() until the number of items increases is more reliable. Older Puppeteer versions also offer page.waitForXPath(); newer releases deprecated and later removed it in favor of XPath selectors passed to page.waitForSelector().
- Extract data from the loaded content: once the content has loaded, extract the data with Puppeteer's query methods (page.$() and page.$$()) or run code in the page's context with page.$$eval() or page.evaluate().
- Repeat the process for each page: if the site also paginates, locate the next-page control and click it to move to the next page. Repeat the scrolling and extraction on each page until no next page remains.
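The scrolling and waiting steps above can be combined into a reusable helper. The sketch below is illustrative rather than a Puppeteer API: scrollUntilNoNewItems() and its containerSelector, waitTimeoutMs, and maxRounds options are hypothetical names. It assumes every loaded item matches one selector and that a batch which has not appeared within the timeout is not coming. When no container selector is given, it scrolls the document itself.

```javascript
// Hypothetical helper (not part of Puppeteer): scrolls until the number of
// elements matching `itemSelector` stops growing, then returns the final count.
// `containerSelector` is optional; when omitted, the whole document is scrolled.
async function scrollUntilNoNewItems(page, itemSelector, {
  containerSelector = null,
  waitTimeoutMs = 3000,
  maxRounds = 50,
} = {}) {
  const countItems = () =>
    page.$$eval(itemSelector, (elements) => elements.length);

  let count = await countItems();
  for (let round = 0; round < maxRounds; round++) {
    // Runs in the page: scroll the container (or the document) to its bottom
    await page.evaluate((selector) => {
      const target = selector
        ? document.querySelector(selector)
        : document.scrollingElement;
      target.scrollTo(0, target.scrollHeight);
    }, containerSelector);

    try {
      // Resolves as soon as more items exist than before this scroll
      await page.waitForFunction(
        (selector, previous) => document.querySelectorAll(selector).length > previous,
        { timeout: waitTimeoutMs },
        itemSelector,
        count
      );
    } catch (error) {
      // Puppeteer's TimeoutError means nothing new loaded; rethrow anything else
      if (error.name !== 'TimeoutError') throw error;
      break;
    }
    count = await countItems();
  }
  return count;
}
```

For example, inside the async function from step 1: const total = await scrollUntilNoNewItems(page, '.contentElement', { containerSelector: '#scrollableContainer' }); Counting items rather than comparing scrollHeight means a loading spinner or other layout change is not mistaken for new content, and maxRounds stops a feed that never ends from looping forever.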

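   // Scrolls each page's infinite-scroll container until no new items load,
   // extracts the items, then follows the next page button until none is left.
   async function handleInfiniteScrollWithPagination(page) {
     const scrollableContainerSelector = '#scrollableContainer';
     const contentSelector = '.contentElement';
     const nextPageSelector = '.nextPageButton';
     const countItems = () =>
       page.$$eval(contentSelector, (elements) => elements.length);

     while (true) {
       // Scroll repeatedly until a scroll no longer adds items to this page
       let itemCount = await countItems();
       while (true) {
         await page.evaluate((containerSelector) => {
           const container = document.querySelector(containerSelector);
           container.scrollTo(0, container.scrollHeight);
         }, scrollableContainerSelector);

         try {
           // waitForSelector() would resolve at once because earlier items
           // already match, so wait for the item count to grow instead
           await page.waitForFunction(
             (selector, previousCount) =>
               document.querySelectorAll(selector).length > previousCount,
             { timeout: 5000 },
             contentSelector,
             itemCount
           );
         } catch (error) {
           // A timeout means nothing new loaded, so this page is exhausted
           if (error.name !== 'TimeoutError') throw error;
           break;
         }
         itemCount = await countItems();
       }

       // Extract and process the data from the loaded content
       const data = await page.$$eval(contentSelector, (elements) =>
         elements.map((element) => element.textContent.trim())
       );
       console.log('Extracted data:', data);

       const nextPageButton = await page.$(nextPageSelector);
       if (!nextPageButton) {
         // No more pages to load, exit the loop
         break;
       }

       // Start waiting for the navigation before clicking so it cannot be missed
       await Promise.all([
         page.waitForNavigation(),
         nextPageButton.click(),
       ]);
     }
   }

   // Call from inside the async function above, where `page` is defined
   await handleInfiniteScrollWithPagination(page);

In this example, the handleInfiniteScrollWithPagination() function implements the steps above:

- It scrolls the container with page.evaluate() and repeats the scroll until no new items appear on the current page.
- It waits for new content with page.waitForFunction(), which resolves once more elements match contentSelector than before the scroll. page.waitForSelector() alone would resolve immediately, because items from earlier batches already match.
- It extracts the data from the loaded items with page.$$eval().
- It looks for the next page button with page.$() and, if one exists, clicks it while waiting for the resulting navigation with page.waitForNavigation(). Starting both together with Promise.all() avoids missing a navigation that completes before the wait begins.

The function receives page as a parameter, so call it inside the async function from step 1, after the page has been created and page.goto() has loaded the target URL. Two assumptions are worth checking against the target site: if the next page button updates the list without a full navigation, page.waitForNavigation() will time out, so wait for the new content instead; and if the button stays in the DOM but becomes disabled on the last page, check its disabled state too, or the loop will never exit.

Together, these steps automate scrolling, loading, and extracting data across multiple pages that combine infinite scroll with pagination.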
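Some sites swap the list in place when the next page button is clicked, with no full navigation, so page.waitForNavigation() never resolves. A minimal sketch for that case follows. goToNextPageInPlace() is a hypothetical helper, not part of Puppeteer, and it assumes the first item's text differs between consecutive pages. It treats a missing or disabled button as the last page, clicks the button otherwise, and waits for the first item's text to change.

```javascript
// Hypothetical helper (not a Puppeteer API) for "next page" buttons that
// replace the list in place instead of navigating. Returns false when there
// is no usable next button, true once the first item's text has changed.
async function goToNextPageInPlace(page, nextPageSelector, itemSelector, timeoutMs = 10000) {
  const nextButton = await page.$(nextPageSelector);
  if (!nextButton) return false;

  // Assumption: a disabled button (or aria-disabled="true") marks the last page
  const disabled = await nextButton.evaluate(
    (button) => button.disabled === true || button.getAttribute('aria-disabled') === 'true'
  );
  if (disabled) return false;

  // Remember the first item's text (null if the list is empty) before clicking
  const before = await page.$$eval(itemSelector, (elements) =>
    elements.length ? elements[0].textContent.trim() : null
  );

  await nextButton.click();

  // Runs in the page: resolve once a first item exists with different text
  await page.waitForFunction(
    (selector, previousText) => {
      const first = document.querySelector(selector);
      return first !== null && first.textContent.trim() !== previousText;
    },
    { timeout: timeoutMs },
    itemSelector,
    before
  );
  return true;
}
```

In the pagination loop, a call such as if (!(await goToNextPageInPlace(page, nextPageSelector, contentSelector))) break; could take the place of the next-button check and the navigation wait. If two consecutive pages can start with the same item, a different signal, such as a page-number element, is a safer thing to wait on.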