Bulk CSV Upload via AWS S3

For high-volume users, we offer a bulk CSV upload option using AWS S3.

Note: bulk upload is only available on the Delayed queue and is a whitelisted feature (you will need to request access first).

Prerequisites

First, you will need to create a new IAM user in your own AWS account. Once the user exists, attach the AWS managed policy AmazonS3FullAccess to it so that it can interact with S3 buckets.

We strongly recommend you create a unique user with no other permissions for use solely with this feature.

Make a note of the user ARN; it will look something like the following:

arn:aws:iam::[your-AWS-id]:user/user_name
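Before opening the ticket, you can sanity-check the ARN you copied. This is a minimal sketch: the regular expression and the function name looks_like_iam_user_arn are our own illustration, and the pattern only approximates the shape above (a 12-digit account ID, then user/ and a user name, optionally preceded by a path).

```python
import re

# Loose approximation of an IAM user ARN in the standard "aws" partition:
# arn:aws:iam::<12-digit account ID>:user/<optional path/><user name>
_IAM_USER_ARN = re.compile(r"arn:aws:iam::\d{12}:user/[A-Za-z0-9+=,.@_/-]+")


def looks_like_iam_user_arn(arn: str) -> bool:
    """Return True if `arn` has the general shape of an IAM user ARN."""
    return _IAM_USER_ARN.fullmatch(arn.strip()) is not None
```

For example, arn:aws:iam::123456789012:user/bulk-upload passes, while a role ARN such as arn:aws:iam::123456789012:role/bulk-upload does not.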

Open a support ticket with your User ARN and you will be notified when access has been granted. The upload bucket to use in your requests is delayed-bulk.

Note: the upload bucket is a Requester Pays bucket, so your AWS account is charged for the request and data transfer costs, and you must include the x-amz-request-payer header (set to requester) with every request.
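If you use the AWS SDK for Python (boto3), the x-amz-request-payer header is sent by passing RequestPayer="requester" to the S3 client's put_object call (get_object accepts the same argument when you download results). The sketch below only assembles those arguments, so it runs without boto3 installed; the helper name put_object_params is hypothetical.

```python
BUCKET = "delayed-bulk"  # upload bucket named in this documentation


def put_object_params(key: str, body: bytes) -> dict:
    """Build keyword arguments for boto3's S3 client put_object call.

    RequestPayer="requester" makes boto3 send the x-amz-request-payer header,
    which a Requester Pays bucket requires.
    """
    return {
        "Bucket": BUCKET,
        "Key": key,
        "Body": body,
        "RequestPayer": "requester",
    }


# Usage (requires boto3 and the credentials of the IAM user registered above;
# `key` and `csv_bytes` are placeholders for your own values):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(**put_object_params(key, csv_bytes))
```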

File Naming Structure

Uploaded files must follow a strict naming format in order to be processed (note the "in/" folder prefix):

in/[your-API-key]_YYYYMMDD_[your-unique-identifier].csv

We recommend you use something unpredictable as your unique identifier for each upload. YYYYMMDD must correspond to today's date or the file will go unprocessed.
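A minimal sketch of building the key, assuming Python's secrets module is an acceptable source of an unpredictable identifier. The function name upload_key is hypothetical, and this page does not say which time zone defines "today"; date.today() uses the machine's local date, so check this if you upload near midnight.

```python
import secrets
from datetime import date
from typing import Optional


def upload_key(api_key: str, today: Optional[date] = None) -> str:
    """Build an S3 key of the form in/[api-key]_YYYYMMDD_[unique-id].csv."""
    today = today or date.today()      # must be the current date at upload time
    unique_id = secrets.token_hex(16)  # unpredictable, 32 hex characters
    return f"in/{api_key}_{today:%Y%m%d}_{unique_id}.csv"
```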

CSV Input Format

Each row of the upload should contain comma-separated values in the following order: keyword, engine_code, location, device. Only keyword and engine_code are required; leave the optional fields empty, as in the first row below. For example:

"first query",google_en-us,,
"pizza delivery",google_en-us,90210,mobile

Note: keywords should be UTF-8 encoded and each CSV upload can contain a maximum of 100,000 rows.
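The rules above (column order, required fields, UTF-8, 100,000-row cap) can be checked while writing the file. This is a minimal sketch with a hypothetical function name, build_upload_csv; it assumes the service accepts standard CSV quoting, where a field is quoted only when it contains a comma, quote or line break (the example above quotes every keyword, which is equally valid CSV), and it keeps the csv module's default \r\n line endings.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

MAX_ROWS = 100_000  # documented per-upload limit

Row = Tuple[str, str, Optional[str], Optional[str]]


def build_upload_csv(rows: Iterable[Row]) -> bytes:
    """Serialise (keyword, engine_code, location, device) rows as UTF-8 CSV.

    keyword and engine_code are required; location and device may be None or "".
    """
    rows = list(rows)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")
    buf = io.StringIO()
    writer = csv.writer(buf)  # standard quoting: fields are quoted only when needed
    for keyword, engine_code, location, device in rows:
        if not keyword or not engine_code:
            raise ValueError(f"keyword and engine_code are required: {keyword!r}")
        writer.writerow([keyword, engine_code, location or "", device or ""])
    return buf.getvalue().encode("utf-8")
```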

CSV Output Format

Once processed, a corresponding CSV file will be created in the "out/" folder of the same bucket:

out/[your-API-key]_YYYYMMDD_[your-unique-identifier].csv
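Because the output file keeps the same name and only the folder changes, the key to look for can be derived from the key you uploaded. A minimal sketch; output_key is a hypothetical helper name.

```python
def output_key(input_key: str) -> str:
    """Map in/[api-key]_YYYYMMDD_[id].csv to the matching out/ key."""
    prefix = "in/"
    if not input_key.startswith(prefix):
        raise ValueError(f"expected a key under {prefix!r}, got {input_key!r}")
    return "out/" + input_key[len(prefix):]
```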

Each row will contain the following columns:

keyword, engine_code, location, device, status, delayed_id, modified_keyword

modified_keyword will be empty unless the keyword was modified before the SERP was fetched.

status will contain either ok for success or error for a failure.

delayed_id will contain a unique delayed queue ID on success, or an error message on failure.
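Since the delayed_id column holds either a queue ID or an error message depending on status, it helps to split it into separate fields when reading results. A minimal sketch, assuming the output file has no header row (as in the example output) and exactly the seven columns listed above; Result and parse_results are hypothetical names.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    keyword: str
    engine_code: str
    location: str
    device: str
    status: str                      # "ok" or "error"
    delayed_id: Optional[str]        # set when status is "ok"
    error: Optional[str]             # set when status is "error"
    modified_keyword: Optional[str]  # None unless the keyword was modified


def parse_results(data: bytes) -> List[Result]:
    """Parse an out/ CSV file (assumed to have no header row) into Results."""
    results = []
    for row in csv.reader(io.StringIO(data.decode("utf-8"))):
        if not row:
            continue  # ignore blank lines
        keyword, engine_code, location, device, status, id_or_error, modified = row
        ok = status == "ok"
        results.append(Result(
            keyword, engine_code, location, device, status,
            delayed_id=id_or_error if ok else None,
            error=None if ok else id_or_error,
            modified_keyword=modified or None,
        ))
    return results
```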

Example output

"cheap   holidays",google_en-us,,,ok,50185500ebe44205ca000004,"cheap holidays"
"flights to nyc",,,,error,"Required field 'engine_code' missing",

In this example, the superfluous whitespace in row 1 was removed before the query was processed, so the cleaned keyword appears in modified_keyword. Row 2 failed because no engine_code was supplied.

Note: due to the way we split and parallelize processing of uploads, the output will not necessarily be in the same row order as the upload.
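Because row order is not preserved, results have to be matched back to the upload by content rather than by position. A minimal sketch, assuming (as the example output suggests, where "cheap   holidays" keeps its original spacing) that the first four output columns echo the uploaded keyword, engine_code, location and device verbatim; match_inputs is a hypothetical name, and both arguments are rows as produced by Python's csv.reader.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def match_inputs(
    input_rows: Sequence[Sequence[str]],
    output_rows: Sequence[Sequence[str]],
) -> List[Optional[Sequence[str]]]:
    """Pair each uploaded row with its output row, whatever order the output is in.

    Returns a list aligned with input_rows, holding None where no output row
    matched; duplicate input rows take matching output rows in turn.
    """
    index: Dict[Tuple[str, ...], List[Sequence[str]]] = defaultdict(list)
    for row in output_rows:
        index[tuple(row[:4])].append(row)  # keyed on the echoed input columns

    matched: List[Optional[Sequence[str]]] = []
    for row in input_rows:
        key = tuple((list(row) + [""] * 4)[:4])  # pad missing optional columns
        candidates = index.get(key)
        matched.append(candidates.pop(0) if candidates else None)
    return matched
```

Keying on all four input columns, rather than on the keyword alone, keeps the same keyword submitted for different engines, locations or devices distinct.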
