The Web Scraper browser extension supports data export in CSV and XLSX formats, while Web Scraper Cloud supports data export in CSV, XLSX and JSON formats. JSON export will be added to the Web Scraper extension in a future release.
Download scraped data via the Export data as CSV menu selection under the Sitemap menu. Data can also be downloaded while the scraper is running.
Download scraped data on the website from the Jobs or Sitemaps sections. Data can also be downloaded while the scraper is running.
Set up automated data export to Dropbox, Google Sheets or S3 via the Data Export section. Currently, exported data will be in CSV format. Data will be exported to Apps/Web Scraper in your Dropbox, Google Drive/Web Scraper in Google Sheets and bucket/web-scraper in S3.
Additionally, you can download data via the Web Scraper Cloud API in CSV or JSON format.
In XLSX files, data in a single cell is limited to 32767 characters. Additional characters will be cut off. Use another export format if large text contents are expected in a single cell. The row count is limited to 1 million rows. If the data set contains more than 1 million rows, the data will be split into multiple sub sheets.
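To decide whether a data set is at risk of being cut off, the cell limit can be checked before choosing a format. The following is a minimal sketch, assuming the records are already loaded as Python dictionaries. The function name, constant name and sample field names are hypothetical, and Python's character count may differ slightly from how the export counts characters.

```python
# Per-cell character limit stated above (constant name is hypothetical).
CELL_CHAR_LIMIT = 32767


def oversized_fields(records, limit=CELL_CHAR_LIMIT):
    """Return the names of fields that hold at least one value longer than limit."""
    too_long = set()
    for record in records:
        for name, value in record.items():
            if isinstance(value, str) and len(value) > limit:
                too_long.add(name)
    return too_long


# Hypothetical sample records; field names are made up for illustration.
sample = [
    {"title": "ok", "body": "x" * 40000},
    {"title": "fine", "body": "short"},
]

print(oversized_fields(sample))  # {'body'}
```

If the returned set is not empty, CSV or JSON is likely the safer choice for that data set.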
The JSON file format contains one JSON record per line. Newline characters found in data will be escaped as "\n", so the newline character (\n) can be safely used as a record separator.
Note! Parsing the entire file as a single JSON string will not work because the records are not wrapped in a JSON array. This was a design decision to make it easier to parse large files.
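Because each line is a complete JSON record, the file can be read line by line without loading it into memory at once. The following is a minimal sketch in Python. The function name and the sample field names are hypothetical, and an in-memory stream stands in for a real export file.

```python
import io
import json


def iter_records(lines):
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as one after the last record
            yield json.loads(line)


# Hypothetical sample with two records. The first value contains an
# escaped newline (backslash + n), which stays on the same line.
sample = (
    '{"title": "First", "text": "line one\\nline two"}\n'
    '{"title": "Second", "text": "single line"}\n'
)

records = list(iter_records(io.StringIO(sample)))
print(len(records))              # 2
print(records[0]["text"])        # prints "line one" and "line two" on two lines
```

For a real file, pass an open file object instead of the sample stream, for example `open("export.json", encoding="utf-8")`, where the file name is only a placeholder.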
Comma Separated Values files are formatted as described in the RFC 4180 standard. Values are quoted in double quotes ("), and when a double-quote character appears in the text, it is escaped with another double-quote character. Lines are separated with CR+LF (\r\n) characters. Additionally, CSV files include a byte order mark (BOM) U+FEFF character at the beginning of the file to hint that the file is in UTF-8 format. Newline characters inside values are not escaped, which means that using \r\n as a record separator can result in errors. We recommend using a CSV reader library when reading CSV files programmatically.
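As an illustration of the recommendation above, here is a minimal sketch that uses the csv module from the Python standard library. The sample bytes and column names are hypothetical and only mimic the format described: a UTF-8 BOM, CR+LF line endings, a quoted value containing a newline, and a doubled double quote.

```python
import csv
import io

# Hypothetical sample mimicking an exported file.
sample_bytes = (
    b'\xef\xbb\xbf"title","text"\r\n'
    b'"First","line one\r\nline two"\r\n'
    b'"Second","a ""quoted"" word"\r\n'
)


def read_rows(binary_stream):
    """Parse a binary CSV stream into a list of dictionaries."""
    # "utf-8-sig" removes the leading BOM; newline="" passes line endings
    # through unchanged so the csv module can handle newlines inside quotes.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8-sig", newline="")
    return list(csv.DictReader(text))


rows = read_rows(io.BytesIO(sample_bytes))
print(len(rows))             # 2
print(rows[1]["text"])       # a "quoted" word

# Splitting on \r\n instead breaks the first record in two:
naive_chunks = [c for c in sample_bytes.decode("utf-8-sig").split("\r\n") if c]
print(len(naive_chunks))     # 4 chunks for a header plus 2 records
```

For a file on disk, the same settings apply to `open(path, encoding="utf-8-sig", newline="")`, where `path` is whatever location the export was saved to.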
We recommend using LibreOffice Calc when opening CSV files. Microsoft Office often interprets CSV files formatted according to the RFC 4180 standard incorrectly. Mostly this is related to text that includes newline characters.
If a CSV file is opened incorrectly by Microsoft Excel, try using the data import feature:
Choose From Text/CSV
Set up import settings: UTF-8 encoding, Comma delimiter, Do not detect data types