Web Scraper Cloud can notify your server when a scraping job has finished. Configure a URL on your server that will receive the notifications; Web Scraper Cloud will submit a POST FORM request with the scraping job metadata to that URL. To configure and test the notification endpoint, visit the Web Scraper Cloud API page.
Web Scraper sends the notification once the job has finished, stopped, or failed. After receiving the notification you can start or queue the data import.
Note that a webhook notification may be retried, and a fresh webhook notification for the same scraping job may also be sent. Design the handler to tolerate duplicate notifications (see the deduplication sketch after the form data example below).
Example notification FORM data:
"scrapingjob_id": 1234
"status": "finished"
"sitemap_id": 12
"sitemap_name": "my-sitemap"
"custom_id": "custom-scraping-job-12"
When your server receives the notification, it has to respond with a 2xx HTTP status code within 10 seconds. In case of an error status code or a timeout, the notification is resent after a 5 second delay for the first retry and a 10 second delay for the second retry.
We recommend using a queue system to defer the data import and keep the webhook handler fast; Laravel's queues (https://laravel.com/docs/10.x/queues) are a good example of such a system. If the data import is handled on the fly (in the webhook handler itself), send a success response immediately after receiving the request; otherwise the notification sender may time out and resend the notification, which can lead to duplicate imports.
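With a queue, the webhook handler only records the job id and returns immediately, while a worker performs the import later. A minimal sketch, assuming a Laravel application and a hypothetical ImportScrapingJobData queued job class that wraps the import logic shown below:
<?php
// routes/web.php of a Laravel application; ImportScrapingJobData is a
// hypothetical queued job class, not part of the Web Scraper API client.
use App\Jobs\ImportScrapingJobData;
use Illuminate\Support\Facades\Route;

Route::post('/webhooks/web-scraper', function () {
    // acknowledge within the 10 second window; the queue worker does the rest
    ImportScrapingJobData::dispatch((int) request('scrapingjob_id'));
    return response()->noContent();
});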
The following example handles the notification and imports the data in the same request:
<?php
require "../vendor/autoload.php";

use WebScraper\ApiClient\Client;
use WebScraper\ApiClient\Reader\JsonReader;

// NOTE! Validate that the request came from Web Scraper, e.g. by adding a
// secret token to the configured notification URL and checking it here.
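// A minimal token check sketch: pass ?token=... in the configured
// notification URL; the WEBHOOK_SECRET environment variable is an
// assumption for this example.
$secret = getenv('WEBHOOK_SECRET') ?: '';
if (!hash_equals($secret, $_GET['token'] ?? '')) {
    http_response_code(403);
    exit('invalid token');
}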
$scrapingJobId = (int) $_POST['scrapingjob_id'];
$status = $_POST['status'];
$sitemapId = (int) $_POST['sitemap_id'];
$sitemapName = $_POST['sitemap_name'];
$customId = $_POST['custom_id'];
// Send Web Scraper a success response as soon as the notification is
// received and continue processing afterwards. This speeds up notification
// delivery on Web Scraper's side, and the script won't be stopped when
// Web Scraper closes the connection. More information here:
// http://stackoverflow.com/questions/15273570/continue-processing-php-after-sending-http-response
ignore_user_abort(true);
header('Connection: close');
header('Content-Length: '.ob_get_length());
ob_end_flush();
ob_flush();
flush();
// NOTE! The data import in this example runs in the same request, but it
// is better done in a queued job; queued jobs can be restarted and rerun
// if something fails.
$client = new Client([
    'token' => 'YOUR API TOKEN',
]);
// download JSON file locally
$outputFile = "/tmp/scrapingjob-data{$scrapingJobId}.json";
try {
    $client->downloadScrapingJobJSON($scrapingJobId, $outputFile);
    // read data from file with the built-in JSON reader
    $reader = new JsonReader($outputFile);
    $rows = $reader->fetchRows();
    foreach ($rows as $row) {
        // Import records into the database. Importing records in bulk will
        // speed up the process.
    }
} finally {
    // make sure the output file is always deleted
    unlink($outputFile);
}
// delete the scraping job once the data has been imported, since it is
// no longer needed on Web Scraper Cloud
$client->deleteScrapingJob($scrapingJobId);
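The import loop above is a placeholder. A minimal sketch of what it could look like as a bulk import, assuming a PDO connection and a hypothetical records table with name and price columns (adjust the columns to your sitemap's selectors):
// Bulk import sketch; $pdo, the records table and its columns are
// assumptions. A single transaction avoids committing after every row.
$pdo = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'password');
$insert = $pdo->prepare('INSERT INTO records (name, price) VALUES (?, ?)');
$pdo->beginTransaction();
foreach ($rows as $row) {
    $insert->execute([$row['name'], $row['price']]);
}
$pdo->commit();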
The same download-and-import flow can also run outside the webhook handler, for example from a queued job or a CLI script:
<?php
require "../vendor/autoload.php";
use WebScraper\ApiClient\Client;
use WebScraper\ApiClient\Reader\JsonReader;
$apiToken = "API token here";
$scrapingJobId = 500; // scraping job id here
// initialize API client
$client = new Client([
    'token' => $apiToken,
]);
// download file locally
$outputFile = "/tmp/scrapingjob-data{$scrapingJobId}.json";
try {
    $client->downloadScrapingJobJSON($scrapingJobId, $outputFile);
    // read data from file with the built-in JSON reader
    $reader = new JsonReader($outputFile);
    $rows = $reader->fetchRows();
    foreach ($rows as $row) {
        // Import records into the database. Importing records in bulk will
        // speed up the process.
    }
} finally {
    // make sure the output file is always deleted
    unlink($outputFile);
}
// delete the scraping job once the data has been imported, since it is
// no longer needed on Web Scraper Cloud
$client->deleteScrapingJob($scrapingJobId);