Controlling the number of concurrent requests with async and EventProxy

Concurrency and parallelism

In an operating system, concurrency means that, within a period of time, several programs have been started and none has yet finished. All of them run on the same processor, but at any given instant only one of them is actually executing.

The meaning is the same whether we are talking about a web server, an app or an operating system. It matters in practice because many websites limit the number of concurrent connections: if requests are sent too quickly, the responses may come back empty or fail. Worse, some websites treat too many concurrent connections as malicious requests and block your IP.

Compared with concurrency, parallelism may be less familiar. In parallel execution, tasks truly run at the same instant, each proceeding independently on its own CPU core, instead of merely overlapping in time by taking turns. Adding CPU cores is what lets multiple programs (tasks) execute simultaneously. In short, parallelism means multiple tasks running at the same time.
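To make concurrency without parallelism concrete, here is a minimal sketch (the `task` and `main` functions are hypothetical names, not from the original). Two tasks share Node.js's single JavaScript thread and yield to the event loop after each step, so their steps interleave even though only one step ever runs at a time. Nothing here runs in parallel.

```javascript
// Two tasks share one thread: each step runs alone, but the tasks
// make progress in an interleaved (concurrent) fashion.
async function task(name, log) {
  for (let step = 1; step <= 2; step++) {
    log.push(name + step);
    // Yield to the event loop so the other task can run its next step.
    await new Promise((resolve) => setImmediate(resolve));
  }
}

async function main() {
  const log = [];
  await Promise.all([task('A', log), task('B', log)]);
  return log;
}

main().then((log) => console.log(log.join(' '))); // A1 B1 A2 B2
```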

Using EventProxy to control concurrency

EventProxy is a tool written mainly by Pu Ling (JacksonTian). It brings an event-driven style of programming: an event mechanism decouples complex business logic, which cures the tight coupling of nested callbacks, turns serial waiting into parallel waiting, and improves efficiency when several asynchronous operations have to cooperate.

How do we use EventProxy to control concurrency? Suppose we need to fetch three resources. Without EventProxy or a hand-made counter, the code usually looks like this:

This is the deeply nested, serial style:

    var render = function (template, data) {
      _.template(template, data);
    };
    $.get("template", function (template) {
      // something
      $.get("data", function (data) {
        // something
        $.get("l10n", function (l10n) {
          // something
          render(template, data, l10n);
        });
      });
    });
To avoid this deep nesting, the usual approach in the past was to maintain a counter ourselves:

    (function () {
      var count = 0;
      var result = {};

      $.get('template', function (data) {
        result.data1 = data;
        count++;
        handle();
      });
      $.get('data', function (data) {
        result.data2 = data;
        count++;
        handle();
      });
      $.get('l10n', function (data) {
        result.data3 = data;
        count++;
        handle();
      });

      // Runs after every response, but only renders once all three have arrived
      function handle() {
        if (count === 3) {
          var html = fuck(result.data1, result.data2, result.data3);
          render(html);
        }
      }
    })();
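Because `$.get` and `render` need a browser and jQuery, here is a minimal Node.js sketch of the same hand-rolled counter. It assumes a hypothetical `fakeGet(name, callback)` that simulates a request with a random delay, and a hypothetical `fetchAll(done)` wrapper. All three requests start at once, finish in any order, and `handle` only does its work when the counter reaches 3.

```javascript
// Hypothetical stand-in for $.get: answers after a random delay (0-99 ms).
function fakeGet(name, callback) {
  setTimeout(() => callback(name + ' payload'), Math.floor(Math.random() * 100));
}

function fetchAll(done) {
  let count = 0;
  const result = {};

  // All three requests are started immediately; they complete in any order.
  fakeGet('template', (data) => { result.data1 = data; count++; handle(); });
  fakeGet('data', (data) => { result.data2 = data; count++; handle(); });
  fakeGet('l10n', (data) => { result.data3 = data; count++; handle(); });

  // Called after every response; only proceeds once all three have arrived.
  function handle() {
    if (count === 3) {
      done([result.data1, result.data2, result.data3]);
    }
  }
}

fetchAll((parts) => console.log(parts.join(' | ')));
```

The counter is what removes the nesting: the requests no longer wait for each other, and the combined result is built exactly once.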
Here, EventProxy plays the role of this counter. It tracks whether all of the asynchronous operations have completed; once they have, it automatically calls the handler you supplied and passes the captured data to it as arguments.

    var ep = new EventProxy();
    ep.all('data_event1', 'data_event2', 'data_event3', function (data1, data2, data3) {
      var html = fuck(data1, data2, data3);
      render(html);
    });

    $.get('http://example1', function (data) {
      ep.emit('data_event1', data);
    });
    $.get('http://example2', function (data) {
      ep.emit('data_event2', data);
    });
    $.get('http://example3', function (data) {
      ep.emit('data_event3', data);
    });
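To show what EventProxy is doing in its role as a counter, here is a deliberately tiny sketch of an `all`-style helper. `MiniProxy` is a hypothetical illustration. It is not part of eventproxy and is far simpler than the real library. It remembers the data for each event name and calls the handler once, as soon as every listed event has fired.

```javascript
// Hypothetical, simplified illustration of an all()-style event aggregator.
class MiniProxy {
  constructor() {
    this.waiters = [];
  }

  // Register a handler that fires once every named event has been emitted.
  all(...args) {
    const handler = args.pop();
    this.waiters.push({ names: args, data: new Map(), fired: false, handler });
  }

  emit(name, data) {
    for (const w of this.waiters) {
      if (w.fired || !w.names.includes(name)) continue;
      w.data.set(name, data);
      if (w.names.every((n) => w.data.has(n))) {
        w.fired = true;
        // Arguments are passed in the order the events were listed.
        w.handler(...w.names.map((n) => w.data.get(n)));
      }
    }
  }
}

// Usage: events may arrive in any order.
const ep = new MiniProxy();
ep.all('data_event1', 'data_event2', 'data_event3', (d1, d2, d3) => {
  console.log([d1, d2, d3].join(','));
});
ep.emit('data_event3', 'c');
ep.emit('data_event1', 'a');
ep.emit('data_event2', 'b'); // prints "a,b,c"
```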
EventProxy also provides APIs for many other scenarios; see the EventProxy API documentation for details.
Using async to control concurrency

Suppose we have 40 requests to send. Many websites treat too many concurrent connections as malicious requests and block your IP, so we need to control concurrency and fetch these 40 links gradually.

Use async's mapLimit to cap concurrency at 5, so that at most 5 links are being fetched at any one time.


    async.mapLimit(arr, 5, function (url, callback) {
      // something
    }, function (err, result) {
      console.log('result:');
      console.log(result);
    });
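To see what a concurrency limit means mechanically, here is a simplified `mapLimit`-style function written from scratch. This is not async's implementation. `mapLimitSketch` is a hypothetical name, and error handling is reduced to "report the first error". It keeps at most `limit` iteratee calls in flight and stores each result at its input index, so `results` stays in input order.

```javascript
// Hypothetical, simplified mapLimit: at most `limit` iteratee calls run at once.
function mapLimitSketch(items, limit, iteratee, done) {
  const results = new Array(items.length);
  let next = 0;      // index of the next item to start
  let running = 0;   // iteratee calls currently in flight
  let finished = 0;  // iteratee calls that have succeeded
  let failed = false;

  if (items.length === 0) return done(null, results);

  function launch() {
    // Start new work while we are under the limit and items remain.
    while (!failed && running < limit && next < items.length) {
      const index = next++;
      running++;
      iteratee(items[index], (err, value) => {
        running--;
        if (failed) return;
        if (err) {
          failed = true;
          return done(err);
        }
        results[index] = value;
        finished++;
        if (finished === items.length) return done(null, results);
        launch(); // a slot was freed: start the next item
      });
    }
  }
  launch();
}

// Usage: double numbers with a random delay, never more than 2 at a time.
mapLimitSketch([1, 2, 3, 4, 5], 2, (n, cb) => {
  setTimeout(() => cb(null, n * 2), Math.random() * 50);
}, (err, results) => console.log(err, results)); // null [ 2, 4, 6, 8, 10 ]
```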

First we should understand what concurrency is and why the number of concurrent requests needs to be limited. Then we can turn to the documentation to see how the API is used; the async documentation is a good place to learn this syntax.
First, simulate a data source. The data returned here is fake, and each response arrives after a random delay.
    var async = require('async');

    var concurrencyCount = 0;
    var fetchUrl = function (url, callback) {
      // Simulated latency: a random integer delay below 2000 ms
      var delay = parseInt((Math.random() * 10000000) % 2000, 10);
      concurrencyCount++;
      console.log('Current concurrency is', concurrencyCount, ', fetching', url, ', taking ' + delay + ' ms');
      setTimeout(function () {
        concurrencyCount--;
        callback(null, url + ' html content');
      }, delay);
    };

    var urls = [];
    for (var i = 0; i < 30; i++) {
      urls.push('http://datasource_' + i);
    }

Then we use async.mapLimit to fetch concurrently and collect the results:

    async.mapLimit(urls, 5, function (url, callback) {
      fetchUrl(url, callback);
    }, function (err, result) {
      console.log('result:');
      console.log(result);
    });
(This simulation is adapted from alsotang.)

Running it, we find that the concurrency count starts at 1 and grows, but stops increasing once it reaches 5. From then on, whenever a task finishes the next one is fetched, so the number of concurrent connections never exceeds 5.
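To check the "never more than 5" behaviour without installing async, here is a sketch that swaps in a different technique: a promise-based pool of workers built with async/await. `fetchUrlFake` and `runPool` are hypothetical names. As in the simulation above, the fetch is fake and the delay random, though the delay is shortened so the sketch finishes quickly. The pool records the peak number of in-flight requests.

```javascript
// Fake fetch mirroring the simulation; delays are below 200 ms instead of 2000 ms.
let concurrencyCount = 0;
let peak = 0;

function fetchUrlFake(url) {
  const delay = Math.floor(Math.random() * 200);
  concurrencyCount++;
  peak = Math.max(peak, concurrencyCount);
  return new Promise((resolve) => {
    setTimeout(() => {
      concurrencyCount--;
      resolve(url + ' html content');
    }, delay);
  });
}

// Start `limit` workers; each pulls the next unclaimed index until none remain.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function loop() {
    while (next < items.length) {
      const index = next++; // safe: no await between the check and the increment
      results[index] = await worker(items[index]);
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) workers.push(loop());
  await Promise.all(workers);
  return results;
}

const urls = [];
for (let i = 0; i < 30; i++) urls.push('http://datasource_' + i);

const run = runPool(urls, 5, fetchUrlFake).then((results) => {
  console.log('results:', results.length, 'peak concurrency:', peak); // results: 30 peak concurrency: 5
  return results;
});
```

Unlike mapLimit's callback style, each worker here simply awaits its current request before claiming the next URL, which is why the in-flight count can never exceed the number of workers.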
Building a simple Node crawler

Since the example in alsotang's Node.js tutorial 包教不包会 (node-lessons) uses EventProxy to control concurrency, here we use async instead to build a simple Node crawler.

The crawl target is the home page of this site.

First, the modules we need:

url: used for URL parsing; here url.resolve() turns each link into a valid absolute URL.
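For reference, this is how `url.resolve()` turns the relative `href` values found on a page into absolute URLs. The paths below are made-up examples. `url.resolve` is Node's legacy API, and the WHATWG `URL` constructor gives the same result in these cases.

```javascript
const url = require('url');

const baseUrl = 'http://www.chenqaq.com';

// Relative hrefs (hypothetical examples) resolved against a base URL.
console.log(url.resolve(baseUrl, '/posts/hello-world/')); // http://www.chenqaq.com/posts/hello-world/
console.log(url.resolve(baseUrl + '/archives/', '../about/')); // http://www.chenqaq.com/about/

// The WHATWG URL API, available as a global in modern Node.js.
console.log(new URL('/posts/hello-world/', baseUrl).href); // http://www.chenqaq.com/posts/hello-world/
```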

async: a utility module that provides powerful functions for working with asynchronous JavaScript.

cheerio: a fast, flexible, lean implementation of core jQuery designed specifically for the server.

superagent: a very convenient client-side HTTP request module for Node.js.

Step 1: install the dependencies (npm install async cheerio superagent).

Step 2: load the modules with require and set the URL to crawl:

    var url = require('url');
    var async = require('async');
    var cheerio = require('cheerio');
    var superagent = require('superagent');
    var baseUrl = 'http://www.chenqaq.com';

Step 3: use superagent to request the target URL, use cheerio to parse the returned page and extract the article URLs, and save them in the array arr:

    superagent.get(baseUrl).end(function (err, res) {
      if (err) {
        return console.error(err);
      }
      var arr = [];
      var $ = cheerio.load(res.text);
      // From here on, the API works just like jQuery
      $('.post-list .post-title-link').each(function (idx, element) {
        var $element = $(element);
        var _url = url.resolve(baseUrl, $element.attr('href'));
        arr.push(_url);
      });
      // Verify the collected article links
      output(arr);
      // Step 4: next, traverse arr and parse the information needed from each page
    });


We need a function to verify the collected URLs. It is very simple: it only has to traverse arr and print each entry:

    function output(arr) {
      for (var i = 0; i < arr.length; i++) {
        console.log(arr[i]);
      }
    }

Step 4: traverse the URLs and parse the information needed from each page.

This is where async controls concurrency. The previous step may produce a huge arr with many URLs to request; if you fire all of those requests at once, some websites may treat this as malicious traffic and block your IP. The code below belongs where the Step 4 comment sits in the Step 3 snippet, since arr only exists inside that callback:

    async.mapLimit(arr, 3, function (url, callback) {
      superagent.get(url).end(function (err, mes) {
        if (err) {
          console.error(err);
          console.log('message info ' + JSON.stringify(mes));
          // Report the failure; mapLimit then jumps to the final callback
          return callback(err);
        }
        console.log('fetch ' + url + ' successful!');
        var $ = cheerio.load(mes.text);
        var jsonData = {
          title: $('.post-card-title').text().trim(),
          href: url
        };
        callback(null, jsonData);
      });
    }, function (error, results) {
      console.log('results:');
      console.log(results);
    });

This takes the array of URLs from the previous step, limits concurrency to at most 3, and processes each URL in the iteratee. One point needs special attention: the iteratee must call its callback, and there are three ways to do so:

callback(null)

The call succeeded.

callback(null, data)

The call succeeded, and data is stored in results (at the same position as its URL in arr).

callback(err)

The call failed; no new items are started, and control goes straight to the final callback with err.
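The three callback forms can be observed with a small stand-in. The sketch below is a hypothetical `mapSeriesSketch` that handles one item at a time and is not async's code, but it follows the same iteratee contract: `callback(null)` leaves `undefined` in that slot, `callback(null, data)` stores `data` at the item's position, and `callback(err)` stops the loop and goes straight to the final callback.

```javascript
// Hypothetical sequential map that follows the iteratee callback contract.
function mapSeriesSketch(items, iteratee, done) {
  const results = [];
  function step(index) {
    if (index === items.length) return done(null, results);
    iteratee(items[index], (err, data) => {
      if (err) return done(err, results); // failure: skip remaining items
      results[index] = data;              // undefined when called as callback(null)
      step(index + 1);
    });
  }
  step(0);
}

const visited = [];
mapSeriesSketch(['ok', 'empty', 'bad', 'never'], (item, callback) => {
  visited.push(item);
  if (item === 'ok') return callback(null, 'data for ok'); // success with data
  if (item === 'empty') return callback(null);             // success, no data
  return callback(new Error('failed on ' + item));         // failure
}, (err, results) => {
  console.log(err.message); // failed on bad
  console.log(visited);     // [ 'ok', 'empty', 'bad' ]
  console.log(results);     // [ 'data for ok', undefined ]
});
```

Note that 'never' is not visited: once an iteratee reports an error, the loop stops.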

OK, our simple Node crawler is now complete. Let's look at the result.

The home page only has a few posts, so the output is short, but the crawl succeeds.
   
References

  • alsotang, Node.js 包教不包会 (node-lessons)
  • EventProxy
  • async