How can I extract data from a dynamically generated table using Puppeteer?
Benjamin C

To extract data from a dynamically generated table using Puppeteer, you can follow these steps:

1. Launching a new browser instance and creating a new page:

   const puppeteer = require('puppeteer');

   (async () => {
     const browser = await puppeteer.launch();
     const page = await browser.newPage();

     // Perform actions with the page here

     // Close the browser
     await browser.close();
   })();
   

This code sets up a basic Puppeteer script. It launches a new headless browser instance and creates a new page to work with.

2. Navigating to the page with the dynamically generated table: Use page.goto() to navigate to the web page that contains the table.

   await page.goto('https://example.com');
   

Replace 'https://example.com' with the URL of the page that contains the table.

3. Waiting for the table to be dynamically generated: If the table takes some time to load, use page.waitForSelector() to wait for the table element to become available. The older page.waitFor() is deprecated and has been removed from recent Puppeteer releases, so avoid it in new scripts.

   const tableSelector = '#tableId';
   await page.waitForSelector(tableSelector);
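
If the table never appears (for example, because a request fails), waitForSelector() rejects once its timeout elapses. The following is a minimal sketch of handling that case. The helper name waitForTable and the 10-second default are illustrative assumptions, not part of Puppeteer, and the check on err.name relies on Puppeteer naming its timeout error class TimeoutError.

```javascript
// Hypothetical helper: resolves to true if the table appears in time, false on timeout.
// `page` is a Puppeteer Page; `timeoutMs` is an assumed value to tune for your site.
async function waitForTable(page, selector, timeoutMs = 10000) {
  try {
    // waitForSelector accepts an options object with a `timeout` in milliseconds
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return true;
  } catch (err) {
    // Treat a timeout as "table not found"; the name check is an assumption (see above)
    if (err.name === 'TimeoutError') return false;
    throw err; // anything else is unexpected, so let it propagate
  }
}
```

Inside the async function from step 1, this could be called as `const found = await waitForTable(page, '#tableId');` so the script can skip extraction or log a message when `found` is false.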
   

Replace '#tableId' with the actual selector of the table element.

4. Extracting data from the table: Use page.$$eval() or page.evaluate() to execute custom JavaScript code within the page's context and extract data from the table.

- Using page.$$eval():

     const tableRows = await page.$$eval(`${tableSelector} tr`, (rows) => {
       return Array.from(rows, (row) => {
         const columns = row.querySelectorAll('td');
         return Array.from(columns, (column) => column.textContent.trim());
       });
     });
     

This code uses page.$$eval() to select all table rows (tr elements) within the table and extract the text content of each cell (td elements). The data is stored in the tableRows variable as a two-dimensional array, where each inner array represents a row of the table. Rows that contain only th cells, such as a header row, come back as empty arrays because only td cells are read.

- Using page.evaluate():

     const tableRows = await page.evaluate((tableSelector) => {
       const table = document.querySelector(tableSelector);
       const rows = Array.from(table.querySelectorAll('tr'));

       return rows.map((row) => {
         const columns = Array.from(row.querySelectorAll('td'));
         return columns.map((column) => column.textContent.trim());
       });
     }, tableSelector);
     

This code uses page.evaluate() to execute custom JavaScript code within the page's context. It finds the table using the provided selector, selects all rows (tr elements) within the table, and extracts the text content of each cell (td element). The data is returned as a two-dimensional array, as in the previous approach.

With either page.$$eval() or page.evaluate(), you can extract data from a dynamically generated table in Puppeteer. The extracted data can then be processed further, saved to a file, or used for analysis and automation, depending on your requirements.
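
As one possible follow-up, the two-dimensional array can be turned into an array of objects and written to disk. This is only a sketch: the helper name rowsToRecords, the column names, the sample values, and the output file name are all hypothetical, and it assumes you supply the column names yourself and that every row lists its cells in that order.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: pairs each data row with the given column names.
function rowsToRecords(columnNames, dataRows) {
  return dataRows
    .filter((row) => row.length > 0) // skip rows without td cells (e.g. a th-only header row)
    .map((row) =>
      // Missing trailing cells become empty strings
      Object.fromEntries(columnNames.map((name, i) => [name, row[i] ?? '']))
    );
}

// Made-up sample shaped like the tableRows array returned by the extraction step
const tableRows = [[], ['Alice', '30'], ['Bob', '25']];
const records = rowsToRecords(['name', 'age'], tableRows);

// Save the result as pretty-printed JSON (the temp-directory path is an assumption)
const outPath = path.join(os.tmpdir(), 'table.json');
fs.writeFileSync(outPath, JSON.stringify(records, null, 2));
```

Objects keyed by column name tend to be easier to filter or export than positional arrays, at the cost of having to know the column order up front.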