How to download ALL versions of all files within a Bucket

Some S3 applications and tools recognize the versioning feature and let you either download one versioned object at a time or download the current version of every object in your bucket.

1. Using the AWS CLI

You can list all objects along with their version IDs using the following CLI command:

aws s3api list-object-versions --bucket <bucket-name> --endpoint-url https://s3.us-east-1.wasabisys.com

Example Output:

aws s3api list-object-versions --bucket download-versions-bucket --endpoint-url https://s3.us-east-1.wasabisys.com
{
"VersionId": "001595000443450881443-klYCL1RDCV",
"IsLatest": true,
"ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
"LastModified": "2020-07-17T15:40:43.000Z",
"Owner": {
"DisplayName": "sahani.p",
"ID": "EE5775E47FC856DD908EFBCE0E69EBD91CA400F4A86B6445F2F74A7A389C7840"
},
"StorageClass": "STANDARD",
"Size": 3855,
"Key": "Wasabi-read-write-1MB.xml"
},

...

Then use the get-object command with the version ID to download that particular version of the object:

aws s3api get-object --bucket <bucket-name> --key <name> --version-id <version-id> <outfile-name> --endpoint-url=https://s3.us-east-1.wasabisys.com

Example Output:

aws s3api get-object --bucket download-versions-bucket --key Wasabi-read-write-1MB.xml --version-id 001595000443450881443-klYCL1RDCV new-name.xml --endpoint-url=https://s3.us-east-1.wasabisys.com
{
"Metadata": {},
"VersionId": "001595000443450881443-klYCL1RDCV",
"ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
"ContentType": "text/xml",
"ContentLength": 3855,
"AcceptRanges": "bytes",
"LastModified": "Fri, 17 Jul 2020 15:40:43 GMT"
}
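
The two CLI steps above can be chained for a whole bucket. The following is a minimal Python sketch of one way to do that: it takes the parsed JSON printed by list-object-versions and builds one get-object command per version. The helper name build_get_object_commands and the output file naming (version ID placed before the base file name) are assumptions made for illustration; they are not part of the AWS CLI.

```python
import json

ENDPOINT = "https://s3.us-east-1.wasabisys.com"


def build_get_object_commands(listing, bucket, endpoint=ENDPOINT):
    """Return one `aws s3api get-object` argument list per object version.

    `listing` is the parsed JSON from `aws s3api list-object-versions`.
    Entries under "DeleteMarkers" have no data to download, so only
    "Versions" is read.
    """
    commands = []
    for version in listing.get("Versions", []):
        key = version["Key"]
        version_id = version["VersionId"]
        # Hypothetical naming scheme: <version-id>-<base file name>.
        outfile = "{}-{}".format(version_id, key.split("/")[-1])
        commands.append([
            "aws", "s3api", "get-object",
            "--bucket", bucket,
            "--key", key,
            "--version-id", version_id,
            outfile,
            "--endpoint-url", endpoint,
        ])
    return commands


if __name__ == "__main__":
    # Abridged sample in the shape of the listing output shown above.
    sample = json.loads("""
    {"Versions": [
        {"Key": "Wasabi-read-write-1MB.xml",
         "VersionId": "001595000443450881443-klYCL1RDCV",
         "IsLatest": true, "Size": 3855}
    ]}
    """)
    for command in build_get_object_commands(sample, "download-versions-bucket"):
        print(" ".join(command))
```

Each returned list can be handed to subprocess.run(command, check=True) to perform the download, assuming the AWS CLI is installed and configured with your Wasabi keys.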

2. With an application such as S3 Browser, you can download the current versions of all objects, as shown in the screenshots below.

Select the bucket --> right click on Versions menu --> Download

Screen_Shot_2020-07-17_at_3.26.56_PM.png

Screen_Shot_2020-07-17_at_3.27.03_PM.png

 

If you need to download ALL versions (current and old) of ALL objects in your bucket, you can use a scripted approach.

The following script has been tested with Wasabi for this use case:

  • Make sure the AWS SDK for Python (boto3) and the click package are installed before running the script.
  • Note that this code example uses Wasabi's us-east-1 storage region. To use another Wasabi storage region, use the appropriate Wasabi service URL for that region.
#!/usr/bin/env python3

import shutil

import boto3
import click


@click.command(help='List S3 versions, optionally download all versions as well')
@click.option('--bucket', required=True, help='The s3 bucket to scan')
@click.option('--prefix', default='', help='Prefix of files to scan')
@click.option('--download', default=False, is_flag=True, help='Download all versions (prefix filenames with ISO datetime of edit and version), paths not preserved')
def s3versions(bucket, prefix, download):
    '''List all versions of files in s3.'''

    # Replace the placeholder keys with your own Wasabi credentials.
    s3 = boto3.resource(
        's3',
        endpoint_url='https://s3.us-east-1.wasabisys.com',
        aws_access_key_id='Wasabi-Access-Key',
        aws_secret_access_key='Wasabi-Secret-Access-Key')
    bucket = s3.Bucket(bucket)
    versions = bucket.object_versions.filter(Prefix=prefix)

    for version in versions:
        # Fetch this specific version; the response carries its metadata and body.
        response = version.get()

        path = version.object_key
        last_modified = response.get('LastModified')
        version_id = response.get('VersionId')
        print(path, last_modified, version_id, sep='\t')

        if download:
            # Keep only the base file name; the key's path is not preserved.
            filename = path.rsplit('/')[-1]
            with open('{last_modified}-{version_id}-{filename}'.format(last_modified=last_modified, version_id=version_id, filename=filename), 'wb') as fout:
                shutil.copyfileobj(response.get('Body'), fout)


if __name__ == '__main__':
    s3versions()

Execution syntax for the above program:

python s3versions.py --bucket <bucket-name> --prefix <prefix-name> --download 
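
The script above sends a GET request for every version, even when it only prints the listing. As a hedged alternative sketch, the same listing can be read from the ListObjectVersions response pages alone, using the boto3 client paginator. The function name iter_versions is an assumption made for illustration. The client is passed in as a parameter so the function does not handle credentials itself, and delete markers are skipped because they hold no data.

```python
def iter_versions(client, bucket, prefix=""):
    """Yield (key, last_modified, version_id, is_latest) per stored version.

    `client` is expected to behave like a boto3 S3 client, for example
    boto3.client("s3", endpoint_url="https://s3.us-east-1.wasabisys.com").
    """
    paginator = client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Only "Versions" entries hold data; "DeleteMarkers" are skipped.
        for entry in page.get("Versions", []):
            yield (
                entry["Key"],
                entry["LastModified"],
                entry["VersionId"],
                entry["IsLatest"],
            )
```

With boto3 installed, create the client with the same endpoint URL and keys used in the script and loop over iter_versions(client, "download-versions-bucket", "Wasabi") to print or download each version.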

Here are example runs of the script:

1. Listing all versions:

The "download-versions-bucket" bucket holds multiple versions of several files. The command below lists all of them along with their version IDs.

syntax:

python s3versions.py --bucket <bucket-name>

Example output:

$ python s3versions.py --bucket download-versions-bucket

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null
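
The script prints one tab-separated line per version (path, last-modified time, version ID), so the listing is easy to post-process. The sketch below, with the hypothetical helper name group_versions, groups the version IDs by object key. A version ID of null typically marks an object that was stored before versioning was enabled on the bucket.

```python
from collections import defaultdict


def group_versions(lines):
    """Map each object key to its version IDs, in the order listed."""
    versions = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The script prints: path <TAB> last_modified <TAB> version_id.
        # Splitting from the right keeps keys that contain tabs intact.
        path, _last_modified, version_id = line.rsplit("\t", 2)
        versions[path].append(version_id)
    return dict(versions)


if __name__ == "__main__":
    sample = [
        "Wasabi Pings.rtf\t2020-07-17 19:01:44+00:00\t001595012503828293579-Iey7J7ecdX",
        "Wasabi Pings.rtf\t2020-07-17 15:40:43+00:00\t001595000443240234132-qUmZ0PZcBs",
        "Wasabi Pings.rtf\t2020-07-17 15:39:58+00:00\tnull",
    ]
    for key, ids in group_versions(sample).items():
        print(key, len(ids), "versions")
```

To use it on real output, redirect the script's listing to a file and pass the open file object to group_versions.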

 

2. Listing based on prefixes:

You can restrict the listing to objects whose keys start with a given prefix. In this example, only objects whose keys start with "Wasabi" are listed.

syntax:

python s3versions.py --bucket <bucket-name> --prefix <prefix-name>

Example output:

$ python s3versions.py --bucket download-versions-bucket --prefix Wasabi

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
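
The prefix filter is a plain, case-sensitive match on the start of the object key; it is not a wildcard or a folder lookup. The sketch below imitates that rule locally on the key names from the example bucket. It is an illustration only: the function name keys_matching_prefix is hypothetical and no request is sent to Wasabi.

```python
def keys_matching_prefix(keys, prefix=""):
    """Return the keys that start with `prefix`, mirroring a Prefix filter."""
    return [key for key in keys if key.startswith(prefix)]


if __name__ == "__main__":
    keys = [
        "Wasabi Pings.rtf",
        "Wasabi-read-write-1MB.xml",
        "Wasabi_Invoice_373103.pdf",
        "bucket-utilization-for-invoice.csv",
    ]
    # Matches the three keys that begin with "Wasabi".
    print(keys_matching_prefix(keys, "Wasabi"))
    # Matches nothing, because the comparison is case-sensitive.
    print(keys_matching_prefix(keys, "wasabi"))
```

An empty prefix, which is the script's default, matches every key in the bucket.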

 

3. Downloading ALL objects (Current + Old versions):

syntax:

python s3versions.py --bucket <bucket-name> --download

Example output:

$ python s3versions.py --bucket download-versions-bucket --download

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null
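
With --download, each version is saved in the current directory under a name built from its last-modified time, version ID, and base file name. The sketch below reproduces that naming rule on its own so the resulting file names can be predicted; the function name download_name is an assumption made for illustration.

```python
from datetime import datetime, timezone


def download_name(path, last_modified, version_id):
    """Build the local file name the same way the script does."""
    filename = path.rsplit("/")[-1]  # paths are not preserved
    return "{last_modified}-{version_id}-{filename}".format(
        last_modified=last_modified, version_id=version_id, filename=filename
    )


if __name__ == "__main__":
    modified = datetime(2020, 7, 17, 19, 1, 44, tzinfo=timezone.utc)
    print(download_name("Wasabi Pings.rtf", modified,
                        "001595012503828293579-Iey7J7ecdX"))
```

Note that these names contain spaces and colons. Some file systems, such as those used by Windows, do not accept colons in file names, so the format string may need adjusting there.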

 

4. Downloading ALL objects (Current + Old versions) based on prefixes:

syntax:

python s3versions.py --bucket <bucket-name> --prefix <prefix-name> --download

Example output:

$ python s3versions.py --bucket download-versions-bucket --prefix Wasabi --download

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null

 

 
