Submitting Data via S3

Factual accepts data submissions through AWS S3. This document describes the data format that Factual expects and the process for submitting the data.

Transfer

Bucket & Permissioning

To integrate with Factual, host a bucket on AWS S3 and upload the data to that bucket. The bucket needs permissions that allow Factual to read the data. The following example bucket policy can be applied, with $BUCKET replaced by the name of your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowFactualFileOperations",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::315898705177:root"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$BUCKET/*"
    },
    {
      "Sid": "AllowFactualListOperations",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::315898705177:root"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::$BUCKET"
    }
  ]
}
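
As a minimal sketch of filling in the $BUCKET placeholder, the Python below substitutes a bucket name into the policy text and checks that the result is still valid JSON. The function name render_bucket_policy is hypothetical and not part of any Factual tooling; the sketch assumes the policy above has been saved as text with $BUCKET left in place.

```python
import json
from string import Template


def render_bucket_policy(template_text: str, bucket: str) -> str:
    """Fill in the $BUCKET placeholder and return the policy as JSON text.

    Hypothetical helper: assumes $BUCKET is the only placeholder
    in the policy template.
    """
    rendered = Template(template_text).substitute(BUCKET=bucket)
    # Parsing the result catches a template damaged by copy and paste.
    policy = json.loads(rendered)
    return json.dumps(policy, indent=2)
```

The returned text can then be pasted into the bucket's policy editor or applied with whatever AWS tooling you already use.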

File Format

Data should adhere to one of the formats described in our Accepted Formats document.

Data must be submitted as gzipped text, and input file names must end in “.gz”.
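
One possible way to produce such a file is sketched below in Python, using only the standard library. The function name gzip_text_file is hypothetical, and the sketch assumes the input is a text file already in one of the accepted formats.

```python
import gzip
import shutil
from pathlib import Path


def gzip_text_file(source: Path) -> Path:
    """Compress source into a sibling file whose name ends in .gz."""
    target = source.with_name(source.name + ".gz")
    # Copy in binary mode so the text is compressed byte for byte.
    with open(source, "rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    return target
```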

Schedule and File Path

Data may be submitted either hourly or daily. If hourly, upload the files to paths of the following form:

s3://$BUCKET/$YYYY/$MM/$DD/$HH/$FILE_1
s3://$BUCKET/$YYYY/$MM/$DD/$HH/$FILE_2

If daily, upload the files to paths of the following form:

s3://$BUCKET/$YYYY/$MM/$DD/$FILE_1
s3://$BUCKET/$YYYY/$MM/$DD/$FILE_2
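
The two layouts can be sketched as a small Python helper that builds the part of the path below the bucket name. The name object_key is hypothetical. Which time zone the date and hour refer to is not stated in this document, so the sketch simply formats whatever timestamp it is given.

```python
from datetime import datetime


def object_key(ts: datetime, file_name: str, hourly: bool) -> str:
    """Return the path below the bucket for one data file."""
    # Hourly submissions add a zero-padded hour directory after the day.
    pattern = "%Y/%m/%d/%H" if hourly else "%Y/%m/%d"
    return f"{ts.strftime(pattern)}/{file_name}"
```

For example, object_key(datetime(2015, 10, 7, 19), "part-00000.gz", hourly=True) gives 2015/10/07/19/part-00000.gz.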

In either case, write an empty file named _SUCCESS next to the data files to indicate that all files have been uploaded to that path. The following is an example of the expected S3 directory structure for hourly submissions:

s3://example-intake-bucket/2015/10/07/19/part-00000.gz
s3://example-intake-bucket/2015/10/07/19/part-00001.gz
s3://example-intake-bucket/2015/10/07/19/part-00002.gz
s3://example-intake-bucket/2015/10/07/19/part-00003.gz
s3://example-intake-bucket/2015/10/07/19/part-00004.gz
s3://example-intake-bucket/2015/10/07/19/_SUCCESS
s3://example-intake-bucket/2015/10/07/20/part-00000.gz
s3://example-intake-bucket/2015/10/07/20/part-00001.gz
s3://example-intake-bucket/2015/10/07/20/part-00002.gz
s3://example-intake-bucket/2015/10/07/20/part-00003.gz
s3://example-intake-bucket/2015/10/07/20/part-00004.gz
s3://example-intake-bucket/2015/10/07/20/_SUCCESS
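
Because _SUCCESS signals that a path is complete, it is reasonable to write it only after every data file has been uploaded. The Python sketch below, with the hypothetical name upload_order, returns the paths for one directory in that order. It does not perform the uploads, and sorting the data file names is a choice made for readability rather than a stated requirement.

```python
from typing import List


def upload_order(prefix: str, data_files: List[str]) -> List[str]:
    """List paths in upload order: every data file first, then the marker."""
    keys = [f"{prefix}/{name}" for name in sorted(data_files)]
    # The empty _SUCCESS marker goes last so it never precedes the data.
    keys.append(f"{prefix}/_SUCCESS")
    return keys
```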

Please note that individual data files should not exceed 1 GB in size.
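
A pre-upload check for this limit might look like the Python sketch below. The names MAX_BYTES and oversized_files are hypothetical. The sketch assumes that 1 GB means 10^9 bytes and that the limit applies to the gzipped file on disk; neither is stated here, so adjust both to match what Factual confirms.

```python
import os
from typing import Iterable, List

# Assumption: 1 GB is read as 10**9 bytes, measured on the .gz file.
MAX_BYTES = 10**9


def oversized_files(paths: Iterable[str], limit: int = MAX_BYTES) -> List[str]:
    """Return the files whose size on disk exceeds the limit."""
    return [path for path in paths if os.path.getsize(path) > limit]
```

Any file the check returns would need to be split into smaller parts before it is uploaded.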