A detailed Puppeteer beginner's tutorial

1, Introduction

Puppeteer is a Node library that provides a set of APIs for controlling Chrome. In effect it is a headless Chrome browser (you can also configure it to show a UI; by default it does not). Since it is a browser, anything we can do by hand in a browser, Puppeteer can do as well. The name means "puppet", which already tells you that it is easy to manipulate. You can use it to:

1) Generate screenshots or PDFs of web pages

2) Build advanced crawlers that can scrape large amounts of asynchronously rendered content

3) Simulate keyboard input, automatic form submission, logging in and so on, to implement automated UI tests

4) Capture a timeline trace of a site to help analyze its performance problems

If you have used PhantomJS, you will find that the two are somewhat similar. However, Puppeteer is maintained by the official Chrome team. As the saying goes, it is "a child with a mother to look after it", so its prospects are better.

2, Operating Environment


If you browse the official Puppeteer API you will see async and await everywhere. These keywords were standardized in ES2017 (they are often loosely called ES7 features), so you need:

Node.js v7.6.0 or later, because async/await support is required.

A recent Chromium build, which is downloaded automatically when you install Puppeteer through npm.

3, Basic usage

First, let's look at the official getting-started demo:

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.screenshot({path: 'example.png'});
      await browser.close();
    })();

The code above takes a screenshot of a web page. Let's walk through it line by line:

First, create a Browser instance with puppeteer.launch()
Then create a page with the Browser object
Jump to the specified page with page.goto()
Call page.screenshot() to take the screenshot
Close the browser

Simple, isn't it? Personally I find it simpler than PhantomJS, to say nothing of selenium-webdriver. Next, let's introduce some commonly used Puppeteer APIs.
3.1 puppeteer.launch(options)

puppeteer.launch() starts Puppeteer and returns a Promise; you can use its then method to obtain the browser instance. Newer versions of Node.js support await, which is why the example above uses the await keyword instead. One point deserves special mention: almost every Puppeteer operation is asynchronous. To avoid long chains of then calls that hurt readability, all demo code in this article is written with async/await, which is also the style that Puppeteer officially recommends. If async/await is new to you, it is worth reading up on it before continuing.
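To make the difference between the two styles concrete, here is a minimal sketch of the same screenshot flow written once with then chains and once with async/await. The function names screenshotWithThen and screenshotWithAwait are invented for this illustration, and the puppeteer module is passed in as a parameter, so the sketch assumes nothing about how it was loaded.

```javascript
// Hypothetical helpers (not part of Puppeteer): the same flow in two styles.
// `puppeteer` is whatever require('puppeteer') returned.

// Style 1: a promise chain built with then().
function screenshotWithThen(puppeteer, url, path) {
  return puppeteer.launch().then(browser =>
    browser.newPage()
      .then(page => page.goto(url).then(() => page.screenshot({path})))
      .then(() => browser.close())
  );
}

// Style 2: async/await; reads from top to bottom like synchronous code.
async function screenshotWithAwait(puppeteer, url, path) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  await page.screenshot({path});
  await browser.close();
}
```

Both functions perform the same steps in the same order; the second version simply avoids the nesting that then chains tend to accumulate.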
Options parameters in detail

Parameter name (parameter type): description

ignoreHTTPSErrors (Boolean): whether to ignore HTTPS errors during requests. The default is false.
headless (Boolean): whether to run Chrome in headless mode, that is, without displaying a UI. The default is true.
executablePath (String): path of the browser executable. By default Puppeteer uses the Chromium that is bundled with it; set this parameter if you want to point it at a different executable.
slowMo (Number): slows down every Puppeteer operation by the given number of milliseconds. Very useful if you want to watch the whole workflow as it happens.
args (Array(String)): additional arguments passed to the Chrome instance. For example, "--ash-host-window-bounds=1024x768" sets the browser window size.
handleSIGINT (Boolean): whether the Chrome process may be controlled by process signals, that is, whether Ctrl+C closes and exits the browser.
timeout (Number): maximum time to wait for the Chrome instance to start, in milliseconds. The default is 30000 (30 seconds). Passing 0 disables the timeout.
dumpio (Boolean): whether to pipe the browser process's stdout and stderr into process.stdout and process.stderr. The default is false.
userDataDir (String): user data directory. On Linux the default is under ~/.config; on Windows it is C:\Users\{USER}\AppData\Local\Google\Chrome\User Data, where {USER} is the name of the currently logged-in user.
env (Object): environment variables visible to Chromium. The default is process.env.
devtools (Boolean): whether to open the DevTools panel automatically for each tab. This option only takes effect when headless is set to false.

3.2 Browser object

A Browser object is created when Puppeteer connects to a Chrome instance, which can happen in two ways: puppeteer.launch and puppeteer.connect.

The following example disconnects from the browser instance and then reconnects to it:

    const puppeteer = require('puppeteer');

    puppeteer.launch().then(async browser => {
      // Save the endpoint so that we can reconnect to Chromium
      const browserWSEndpoint = browser.wsEndpoint();
      // Disconnect from Chromium
      browser.disconnect();

      // Use the endpoint to re-establish a connection with Chromium
      const browser2 = await puppeteer.connect({browserWSEndpoint});
      // Close Chromium
      await browser2.close();
    });

Method name (return value): description

browser.close() (Promise): close the browser
browser.disconnect(): disconnect from the browser
browser.newPage() (Promise(Page)): create a Page instance
browser.pages() (Promise(Array(Page))): get all open Page instances
browser.targets() (Array(Target)): get all active targets
browser.version() (Promise(String)): get the browser version
browser.wsEndpoint() (String): return the WebSocket URL of the browser instance; you can use this URL to reconnect to the Chrome instance

That is as much of the Puppeteer API as we will introduce here; for the details, see the official API documentation. Now we can move on to some practice. Before that, let's first understand Puppeteer's design principle. Put simply, the biggest difference between Puppeteer and WebDriver or PhantomJS is that Puppeteer is designed from the point of view of a user browsing, whereas WebDriver and PhantomJS were originally built for automated testing and are therefore designed from the point of view of a machine browsing. They follow different design philosophies.

For example, suppose we open the JD.com home page and search for a product. Compare how the process looks with Puppeteer and with WebDriver.

Puppeteer's process:

Open the JD home page
Focus the cursor on the search input box
Type the text with the keyboard
Click the search button

WebDriver's process:

Open the JD home page
Find the search input box
Set the value of the input box to the search text
Trigger the click event of the search button

Personally I feel that Puppeteer's design philosophy is closer to how people actually operate a browser, and therefore more natural. Let's learn the basics of Puppeteer through a simple requirement. The requirement is:

Take screenshots of the detail pages of the first 10 mobile phone products on JD.com.

First, let's sort out the operating process:

Open the JD home page

Enter the keyword "mobile phone" and search

Get the A tags of the first 10 products and read their href attribute values to obtain the product detail links

Open the 10 product detail pages one by one and take a screenshot of each

To implement this we need to look up elements, read attributes, use keyboard events and so on. Let's go through them one by one.

4.1 Getting elements

The Page object provides two APIs for getting page elements.

(1). page.$(selector) gets a single element. Under the hood it calls document.querySelector(), so the selector follows the CSS selector specification.

    let inputElement = await page.$('#search', input => input);
    // The following is equivalent:
    inputElement = await page.$('#search');

(2). page.$$(selector) gets a group of elements. Under the hood it calls document.querySelectorAll(). It returns an array of ElementHandle objects (Array(ElementHandle)).

    const links = await page.$$('a');
    // The following is equivalent:
    // const links = await page.$$('a', links => links);

In both cases what you finally get back are ElementHandle objects.

4.2 Getting element attributes

Puppeteer's logic for reading element attributes differs from the front-end JS we normally write. By the usual logic you would first get the element and then read its attributes. But as we saw above, what comes back is an ElementHandle object, and if you look through the ElementHandle API you will find that it has no method for reading attributes.

In fact, Puppeteer provides dedicated APIs for reading attributes: page.$eval() and page.$$eval().

(1). page.$eval(selector, pageFunction[, ...args]) reads the attributes of a single element. The selector works the same way as in page.$(selector) above.

    const value = await page.$eval('input[name=search]', input => input.value);
    const href = await page.$eval('#a', ele => ele.href);
    const content = await page.$eval('.content', ele => ( …
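page.$$eval() is the multi-element counterpart: it runs the function against the array of all elements that match the selector. As a rough sketch, the helper below (collectLinks is a made-up name, not a Puppeteer API) reads the href and title of the first few matching links and passes the limit into the page as an extra argument.

```javascript
// Hypothetical helper: read href and title from elements matching `selector`.
// `page` is a Puppeteer Page. The callback runs inside the browser, so it can
// only use DOM APIs and the arguments that are explicitly passed to it.
async function collectLinks(page, selector, limit) {
  return page.$$eval(
    selector,
    (elements, max) =>
      elements.slice(0, max).map(a => ({href: a.href.trim(), title: a.title})),
    limit // forwarded to the callback as `max`
  );
}
```

With a real page this could be called as collectLinks(page, 'a', 10); the result is a plain array of objects, not ElementHandle instances.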
4.3 Executing custom JS scripts

Puppeteer's Page object provides a family of evaluate methods through which you can run custom JS code. The three main APIs are as follows.

(1). page.evaluate(pageFunction, ...args) returns a serializable plain object. pageFunction is the function to be executed in the page, and args are the arguments passed to pageFunction. pageFunction and args have the same meaning in the APIs below.

    const result = await page.evaluate(() => {
      return Promise.resolve(8 * 7);
    });
    console.log(result); // prints "56"

This method is very useful. For example, when we take a screenshot of a page, by default only the current browser window is captured, and the default window size is 800x600. What if we need a screenshot of the entire page? The page.screenshot() method accepts parameters that set the size of the screenshot area, so the problem is solved as soon as we know the width and height of the page after it has loaded.

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({headless: true});
      const page = await browser.newPage();
      await page.goto('https://jr.dayi35.com');
      await page.setViewport({width: 1920, height: 1080});
      // Measure the rendered document inside the page
      const documentSize = await page.evaluate(() => {
        return {
          width: document.documentElement.clientWidth,
          height: document.body.clientHeight,
        };
      });
      await page.screenshot({
        path: 'example.png',
        clip: {x: 0, y: 0, width: 1920, height: documentSize.height}
      });
      await browser.close();
    })();
(2). page.evaluateHandle(pageFunction, ...args) executes pageFunction in the page context and returns a JSHandle.

    const aWindowHandle = await page.evaluateHandle(() => Promise.resolve(window));
    aWindowHandle; // Handle for the window object.

    const aHandle = await page.evaluateHandle('document'); // Handle for the 'document'.

As the code above shows, page.evaluateHandle() also returns the final result of the Promise (here resolved through Promise.resolve), except that the returned object is wrapped in a JSHandle. In essence it is no different from evaluate.

The following code gets the dynamic HTML of the page, including elements inserted dynamically by JS.

    const aHandle = await page.evaluateHandle(() => document.body);
    const resultHandle = await page.evaluateHandle(body => body.innerHTML, aHandle);
    console.log(await resultHandle.jsonValue());
    await resultHandle.dispose();

(3). page.evaluateOnNewDocument(pageFunction, ...args) calls pageFunction before the document is loaded. If the page contains an iframe or frame, the function is also invoked in the context of the child page, that is, the iframe or frame. Because it runs before the page loads, this function is generally used to initialize the JavaScript environment, for example to reset or initialize some global variables.
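As an illustration of the "initialize some global variables" use case, the sketch below registers an init script that publishes a settings object on window before any page script runs. The helper name installFlags and the global name __APP_FLAGS__ are assumptions made up for this example.

```javascript
// Hypothetical helper: make `window.__APP_FLAGS__` available before any page
// script runs. `page` is a Puppeteer Page; `flags` must be serializable
// because it is sent from Node.js into the browser.
async function installFlags(page, flags) {
  await page.evaluateOnNewDocument(injected => {
    // Runs inside the page (and its frames) for every new document.
    window.__APP_FLAGS__ = injected;
  }, flags);
}
```

The call has to happen before page.goto(); documents that were already loaded when the script was registered are not affected.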

4.4 page.exposeFunction

In addition to the three APIs above, there is another very useful one: page.exposeFunction. This API registers a global function on the page.

Sometimes you need helper functions while performing operations in the page. You can define such functions inside page.evaluate(), for example:

    const docSize = await page.evaluate(() => {
      function getPageSize() {
        return {
          width: document.documentElement.clientWidth,
          height: document.body.clientHeight,
        };
      }
      return getPageSize();
    });

But such a function is not global: you have to redefine it in every evaluate call, so the code cannot be reused. Moreover, Node.js has many toolkits that make complex functionality easy to implement. Computing an MD5 hash, for example, is inconvenient in pure browser JS, but takes only a few lines in Node.js.

The following code adds an md5 function to the window object in the page context:

    const puppeteer = require('puppeteer');
    const crypto = require('crypto');

    puppeteer.launch().then(async browser => {
      const page = await browser.newPage();
      // msg.text is a property in v0.13; in later releases it is a method, msg.text()
      page.on('console', msg => console.log(msg.text));
      await page.exposeFunction('md5', text =>
        crypto.createHash('md5').update(text).digest('hex')
      );
      await page.evaluate(async () => {
        // use window.md5 to compute hashes
        const myString = 'PUPPETEER';
        const myHash = await window.md5(myString);
        console.log(`md5 of ${myString} is ${myHash}`);
      });
      await browser.close();
    });


As you can see, the page.exposeFunction API is very convenient and very useful. As another example, the following code registers a readfile function on the window object:

    const puppeteer = require('puppeteer');
    const fs = require('fs');

    puppeteer.launch().then(async browser => {
      const page = await browser.newPage();
      page.on('console', msg => console.log(msg.text));
      await page.exposeFunction('readfile', async filePath => {
        return new Promise((resolve, reject) => {
          fs.readFile(filePath, 'utf8', (err, text) => {
            if (err) reject(err);
            else resolve(text);
          });
        });
      });
      await page.evaluate(async () => {
        // use window.readfile to read contents of a file
        const content = await window.readfile('/etc/hosts');
        console.log(content);
      });
      await browser.close();
    });

5, page.emulate: modifying the emulator (client) configuration

Puppeteer provides several APIs for modifying the configuration of the browser client:

page.setViewport() changes the browser window size

page.setUserAgent() sets the browser's User-Agent information

page.emulateMedia() changes the CSS media type of the page, for media emulation. Possible values are 'screen', 'print' and null; null disables media emulation.

page.emulate() emulates a device. Its argument is a device object, such as iPhone, Mac, Android and so on.

    page.setViewport({width: 1920, height: 1080}); // set the window size to 1920x1080
    page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36');
    page.emulateMedia('print'); // use the print media style

In addition, we can emulate non-PC devices. For example, the following code emulates an iPhone 6 visiting Google:

    const puppeteer = require('puppeteer');
    const devices = require('puppeteer/DeviceDescriptors');
    const iPhone = devices['iPhone 6'];

    puppeteer.launch().then(async browser => {
      const page = await browser.newPage();
      await page.emulate(iPhone);
      await page.goto('https://www.google.com');
      // other actions...
      await browser.close();
    });

Puppeteer supports emulating many devices, such as Galaxy, iPhone and iPad. For the full list of supported devices, see DeviceDescriptors.js.
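A device descriptor is just an object with a userAgent string and a viewport, so a device that is not in the built-in list can be described by hand. The sketch below shows one way to do that; every value in smallPhone, as well as the helper name openAsDevice, is an illustrative assumption rather than data taken from Puppeteer.

```javascript
// A hand-written device descriptor; all values are illustrative assumptions.
const smallPhone = {
  name: 'Hypothetical small phone',
  userAgent:
    'Mozilla/5.0 (Linux; Android 8.0.0) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/60.0.3112.90 Mobile Safari/537.36',
  viewport: {
    width: 360,
    height: 640,
    deviceScaleFactor: 2,
    isMobile: true,
    hasTouch: true,
    isLandscape: false,
  },
};

// Apply the descriptor first, then navigate. `page` is a Puppeteer Page.
async function openAsDevice(page, device, url) {
  await page.emulate(device); // sets both the user agent and the viewport
  await page.goto(url);
}
```

Emulating before navigating matters: many sites decide which layout to serve from the user agent of the very first request.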
6, Keyboard and mouse

The keyboard and mouse APIs are relatively simple. The keyboard APIs are as follows:

keyboard.down(key[, options]) triggers a keydown event

keyboard.press(key[, options]) presses a key. key is the name of the key, for example 'ArrowLeft' for the left arrow key.

keyboard.sendCharacter(char) inputs a single character

keyboard.type(text, options) inputs a string

keyboard.up(key) triggers a keyup event

    page.keyboard.press('Shift'); // press the Shift key
    page.keyboard.sendCharacter('');
    page.keyboard.type('Hello'); // typed in one go
    page.keyboard.type('World', {delay: 100}); // typed slowly, like a user
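keyboard.down and keyboard.up are mostly useful for holding a modifier key while other keys are pressed. As a small sketch, the helper below (deleteLastChars is a made-up name) selects the last few characters of the focused input with Shift+ArrowLeft and deletes them. It assumes the cursor is at the end of the text.

```javascript
// Hypothetical helper: delete the last `count` characters of the focused input
// by selecting them with Shift+ArrowLeft and then pressing Backspace.
async function deleteLastChars(page, count) {
  if (count <= 0) return;                   // nothing to select
  await page.keyboard.down('Shift');        // hold Shift
  for (let i = 0; i < count; i++) {
    await page.keyboard.press('ArrowLeft'); // extend the selection by one character
  }
  await page.keyboard.up('Shift');          // release Shift
  await page.keyboard.press('Backspace');   // remove the selection
}
```

Each call is awaited on purpose: the keyboard methods return promises, and firing them without waiting can let key events arrive out of order.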

The mouse APIs are as follows:

mouse.click(x, y, [options]) moves the mouse pointer to the specified location and then clicks. It is really a shortcut for mouse.move, mouse.down and mouse.up.

mouse.down([options]) triggers a mousedown event. The options are:

options.button: which button is pressed, one of left, right or middle. The default is left, the left mouse button.
options.clickCount: the number of presses: single click, double click or some other count
options.delay: delay time of the button press

mouse.move(x, y, [options]) moves the mouse to the specified location. options.steps is the number of steps the movement is split into.

mouse.up([options]) triggers a mouseup event
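Combining these primitives gives gestures that mouse.click cannot express, such as dragging. The following is only a sketch: the helper name drag and the default of 10 steps are assumptions, and whether a page reacts to the gesture depends on how its own drag handling is implemented.

```javascript
// Hypothetical helper: drag with the left button from one point to another.
// `from` and `to` are {x, y} objects in page coordinates.
async function drag(page, from, to, steps = 10) {
  await page.mouse.move(from.x, from.y);       // go to the start position
  await page.mouse.down();                     // press (left button by default)
  await page.mouse.move(to.x, to.y, {steps});  // move in several intermediate steps
  await page.mouse.up();                       // release at the destination
}
```

Using several steps produces a series of mousemove events instead of a single jump, which is closer to what a real pointer does.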

7, Several useful APIs

Puppeteer also provides several other very useful APIs, such as:

7.1 page.waitFor series APIs

page.waitFor(selectorOrFunctionOrTimeout[, options[, ...args]]) is a combination of the following three APIs
page.waitForFunction(pageFunction[, options[, ...args]]) waits for pageFunction to finish executing
page.waitForNavigation(options) waits for the basic elements of the page to load, such as synchronous HTML, CSS and JS code
page.waitForSelector(selector[, options]) waits until the element matching a selector has loaded. The element may be loaded asynchronously, which makes this API very useful.

For example, suppose I want to get an element that is rendered asynchronously by JS. Fetching it directly will certainly fail. In that case page.waitForSelector solves the problem:

    await page.waitForSelector('.gl-item'); // wait for the element to load; asynchronously loaded elements cannot be read before that
    const links = await page.$$eval('.gl-item > .gl-i-wrap > .p-img > a', links => {
      return links.map(a => {
        return {
          href: a.href.trim(),
          name: a.title
        };
      });
    });

In fact, the code above already covers the requirement we set out at the beginning, scraping JD products: because they are loaded asynchronously, this is the approach to use.
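When the element never appears, page.waitForSelector rejects once its timeout expires. If a missing element is an expected outcome rather than an error, the wait can be wrapped as in the sketch below; the helper name appeared and the 5000 ms default are assumptions chosen for this example.

```javascript
// Hypothetical helper: wait for `selector`, but report false instead of
// throwing when it does not show up within `timeout` milliseconds.
async function appeared(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, {timeout});
    return true;
  } catch (err) {
    // Note: any rejection lands here, not only timeouts.
    return false;
  }
}
```

A caller can then branch on the result, for example skipping a product page whose listing never rendered instead of aborting the whole crawl.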
7.2 page.getMetrics()

This API returns page performance data, which helps with capturing a timeline trace of the site and diagnosing performance issues. The fields are:

Timestamp: the time at which the metrics sample was taken
Documents: number of documents in the page
Frames: number of frames in the page
JSEventListeners: number of event listeners in the page
Nodes: number of DOM nodes in the page
LayoutCount: total number of page layouts
RecalcStyleCount: number of style recalculations
LayoutDuration: combined duration of all page layouts
RecalcStyleDuration: combined duration of all style recalculations
ScriptDuration: combined duration of all script execution
TaskDuration: combined duration of all browser tasks
JSHeapUsedSize: heap size used by JavaScript
JSHeapTotalSize: total JavaScript heap size
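Most of these numbers are cumulative, so a single sample says little; comparing a sample taken before an action with one taken after it is more informative. The pure function below is a minimal sketch of such a comparison. The name diffMetrics and the choice of fields are assumptions for this example, and it assumes the heap sizes are reported in bytes.

```javascript
// Hypothetical helper: compare two metrics snapshots taken before and after
// some action. Both arguments are plain objects with the fields listed above.
function diffMetrics(before, after) {
  const toMB = bytes => bytes / (1024 * 1024); // assumes sizes are in bytes
  return {
    elapsed: after.Timestamp - before.Timestamp,
    addedNodes: after.Nodes - before.Nodes,
    addedListeners: after.JSEventListeners - before.JSEventListeners,
    scriptTime: after.ScriptDuration - before.ScriptDuration,
    heapGrowthMB: toMB(after.JSHeapUsedSize - before.JSHeapUsedSize),
  };
}
```

A steadily growing addedListeners or heapGrowthMB across repeated runs of the same action is a typical hint of a leak in the page.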
8, Summary and source code

This article has walked through some of Puppeteer's basic, commonly used APIs by way of a practical requirement. The API version used is v0.13.0-alpha. For the latest state of the API, please refer to the official Puppeteer API documentation.

All in all, Puppeteer is a really good headless tool: easy to operate and powerful. It works very well for UI automation testing and for building small tools.

Here is the source code that implements the requirement we started with, for reference only:
    // Delay helper
    function sleep(delay) {
      return new Promise((resolve, reject) => {
        setTimeout(() => {
          try {
            resolve(1);
          } catch (e) {
            reject(0);
          }
        }, delay);
      });
    }

    const puppeteer = require('puppeteer');

    puppeteer.launch({
      ignoreHTTPSErrors: true,
      headless: false,
      slowMo: 250,
      timeout: 0
    }).then(async browser => {
      let page = await browser.newPage();
      await page.setJavaScriptEnabled(true);
      await page.goto('https://www.jd.com/');
      const searchInput = await page.$('#key');
      await searchInput.focus(); // focus the search box
      await page.keyboard.type('mobile phone');
      const searchBtn = await page.$('.button');
      await searchBtn.click();
      await page.waitForSelector('.gl-item'); // wait for the elements to load; asynchronously loaded elements cannot be read before that
      const links = await page.$$eval('.gl-item > .gl-i-wrap > .p-img > a', links => {
        return links.map(a => {
          return {
            href: a.href.trim(),
            title: a.title
          };
        });
      });
      page.close();

      const aTags = links.splice(0, 10);
      for (var i = 1; i …
          … {
            let scrollTop = document.scrollingElement.scrollTop;
            document.scrollingElement.scrollTop = scrollTop + scrollStep;
            return document.body.clientHeight > scrollTop + 1080 ? true : false;
          }, scrollStep);
          await sleep(100);
        }
        await page.waitForSelector('#footer-2014', {timeout: 0}); // check whether the bottom of the page has been reached
        let filename = 'images/items-' + i + '.png';
        // There is a Puppeteer bug here that has not been fixed: the height of a
        // screenshot is capped at 16384px, and anything beyond that is cut off.
        await page.screenshot({path: filename});
        page.close();
      }

      browser.close();
    });


That is all for this article. I hope it helps with your learning.
