How do I mass delete current and non-current objects inside a bucket?

There are some S3 applications/tools that recognize the versioning feature and allow you to delete one versioned object at a time, as shown in the How do I delete old object versions document.
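For reference, each versioned object (or delete marker) can also be removed individually with one API call per version ID. The snippet below is a minimal sketch using boto3; the profile name, service URL, bucket, key, and version ID are placeholders you would replace with your own values.

from boto3 import Session

# All values below are placeholders -- substitute your own profile, service URL, bucket, key, and version ID.
s3 = Session(profile_name='wasabi').client('s3', endpoint_url='https://s3.wasabisys.com')

# Permanently removes exactly one version (or delete marker) of a single object.
s3.delete_object(Bucket='my-bucket', Key='folder/file.txt', VersionId='ReplaceWithVersionId')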

That one-at-a-time approach is helpful when you have only a few objects to delete. For large object sets (current objects + non-current object versions + delete markers), you can use the scripted method shown in the following Python code.

  • Make sure you have installed the AWS SDK for Python (boto3) on the machine where you will run the script, and that you have turned off (suspended) the versioning feature on your bucket before running it. A minimal sketch of both steps follows this list.
  • Python 3 or later is required to run this script.
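As a rough sketch of those prerequisites (the service URL, keys, and bucket name below are placeholders), boto3 can be installed with pip and versioning can be suspended with a single put_bucket_versioning call:

# Prerequisite: install boto3 first, e.g. with 'pip3 install boto3'.
from boto3 import client

# Placeholders -- substitute your own service URL, keys, and bucket name.
s3 = client('s3',
            endpoint_url='https://s3.wasabisys.com',
            aws_access_key_id='YOUR-ACCESS-KEY',
            aws_secret_access_key='YOUR-SECRET-KEY')

# Suspend versioning so no new versions or delete markers accumulate while the cleanup runs.
s3.put_bucket_versioning(Bucket='my-bucket',
                         VersioningConfiguration={'Status': 'Suspended'})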

Execution and Details of the Script (output screenshot attached below):

1. When you execute the script, it will prompt you to select a profile or enter the API keys of the admin who is running the script.

(i) If you already have a profile configured on your CLI, you may press 1.

You can configure the AWS CLI profile for your Wasabi account with your Wasabi keys ahead of time (for example, with the 'aws configure --profile' command).

NOTE: Using a credentials file is optional, but it is best practice to keep your credential keys in a file stored on your local machine rather than hard-coding them in your script or entering them at a runtime prompt. An example credentials file entry is shown below.
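For example, a Wasabi profile entry in the ~/.aws/credentials file could look like this (the profile name and key values are placeholders):

[wasabi]
aws_access_key_id = YOUR-WASABI-ACCESS-KEY
aws_secret_access_key = YOUR-WASABI-SECRET-KEY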

 

(ii) If you do not wish to use an existing profile, you may press 2 and enter your API keys.

 

2. Enter your bucket name and, optionally, a prefix, and then enter the appropriate service URL for the bucket's region. An example set of inputs is shown after the notes below.

Note that this example discusses the use of Wasabi's us-east-1 storage region. To use other Wasabi storage regions, please use the appropriate Wasabi service URL as described in this article.

NOTE: If you are specifying a prefix, please be sure to enter the FULL PREFIX PATH (bucket name NOT included).
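As an illustration only (the bucket name, prefix, and service URL below are placeholders; confirm the correct URL for your region in the service URL article), the step 2 prompts could be answered like this:

$ Please enter the name of the bucket: my-bucket
$ Please enter a prefix (leave blank if you don't need one): logs/2022/
$ Please enter the endpoint for the bucket: https://s3.us-east-1.wasabisys.com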

3. Before the deletion starts, the script will perform a complete pagination of your bucket and show you:

  • Total Number of Delete-Markers which will be deleted (if present)
  • Total Number of Current Objects which will be deleted (if present)
  • Total Number of Non-Current Objects which will be deleted from your bucket (if present)

4. After presenting the bucket statistics above, the script will ask whether you need to enable the Governance Mode override. This option is for Object Locked immutable buckets with Governance Mode enabled, not Compliance Mode. When a bucket is locked with Governance Mode, you may remove objects only if your IAM entity has the 's3:BypassGovernanceRetention' permission and you include the BypassGovernanceRetention header in the request. If you require this header and your IAM entity has the proper permissions, enable this option with the 'y' input. Otherwise, input 'n' to skip it. A sketch of a policy statement granting the required permission is shown below.
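If you do need the override, the IAM entity running the script must be granted the bypass permission. The following is a minimal policy statement sketch; the bucket name in the Resource ARN is a placeholder, and you should adapt the actions and resources to your own policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:BypassGovernanceRetention",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}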

 

Each log entry represents the deletion of up to 1,000 objects, because the S3 DeleteObjects API accepts a maximum of 1,000 keys per request.

NOTE: After executing the script and successfully completing the deletion process, you may run the same script again to confirm the updated totals for:

  • Total Number of Delete-Markers 
  • Total Number of Current Objects
  • Total Number of Non-Current Objects

 

# Copyright (c) 2022. This script is available as fair use for users. This script can be used freely with Wasabi
# Technologies, LLC. Distributed by the support team at Wasabi Technologies, LLC.

"""
Overview
This script takes the following inputs:
1. Profile name / Access key and Secret key
2. Bucket name
3. Prefix (optional)
4. Region (service URL)
It then calculates the total size and count of delete markers, current, and non-current objects, and prompts for
confirmation before deleting them.
"""
import sys
from boto3 import client, Session
from botocore.exceptions import ProfileNotFound, ClientError


def calculate_size(size, _size_table):
    """
    This function dynamically calculates the right unit symbol for the size of the object.
    :param size: size in bytes to be dynamically converted.
    :param _size_table: dictionary of size unit symbols keyed by power of 1024.
    :return: string of converted size.
    """
    count = 0
    while size // 1024 > 0:
        size = size / 1024
        count += 1
    return str(round(size, 2)) + ' ' + _size_table[count]


def get_credentials():
    """
    This function gets the access key and secret key by 2 methods.
    1. Select a profile from the AWS credentials file.
       Make sure that you have run 'aws configure' and set up your keys in the ~/.aws/credentials file.
    2. Enter the keys directly as a string.
    :return: access key and secret key
    """
    credentials_verified = False
    aws_access_key_id = None
    aws_secret_access_key = None
    while not credentials_verified:
        ch = input("$ Press 1 and enter to select existing profile\n"
                   "$ Press 2 and enter to enter Access Key and Secret Key\n"
                   "$ Press 3 to exit: ")
        if ch.strip() == "1":
            aws_access_key_id, aws_secret_access_key = select_profile()
            if aws_access_key_id is not None and aws_secret_access_key is not None:
                credentials_verified = True
        elif ch.strip() == "2":
            aws_access_key_id = input("$ AWS access key: ").strip()
            aws_secret_access_key = input("$ AWS secret access key: ").strip()
            credentials_verified = True
        elif ch.strip() == "3":
            sys.exit(0)
        else:
            print("Invalid choice, please try again.")
    return aws_access_key_id, aws_secret_access_key


def select_profile():
    """
    Sub-function under get_credentials that selects a profile from the ~/.aws/credentials file.
    :return: access key and secret key
    """
    profile_selected = False
    while not profile_selected:
        try:
            profiles = Session().available_profiles
            if len(profiles) == 0:
                return None, None
            print("$ Available Profiles: ", profiles)
        except Exception as e:
            print(e)
            return None, None
        # Profile names are case-sensitive, so only strip surrounding whitespace.
        profile_name = input("$ Profile name: ").strip()
        try:
            session = Session(profile_name=profile_name)
            credentials = session.get_credentials()
            aws_access_key_id = credentials.access_key
            aws_secret_access_key = credentials.secret_key
            profile_selected = True
            return aws_access_key_id, aws_secret_access_key
        except ProfileNotFound:
            print("$ Invalid profile. Please try again.")
        except Exception as e:
            raise e


def region_selection():
    """
    This function prompts for the service URL (endpoint) of the bucket's storage region.
    :return: region endpoint URL string
    """
    region_selected = False
    _region = ""
    while not region_selected:
        _choice = input("$ Please enter the endpoint for the bucket: ").strip().lower()
        if len(_choice) > 0:
            _region = _choice
            region_selected = True
    return _region


def create_connection_and_test(aws_access_key_id: str, aws_secret_access_key: str, _region, _bucket):
    """
    Creates a connection to the Wasabi endpoint based on the selected region and checks that the access keys are valid.
    NOTE: creating the connection is not enough to test it. We need to make a method call to check that it works.
    :param aws_access_key_id: access key string
    :param aws_secret_access_key: secret key string
    :param _region: region endpoint URL string
    :param _bucket: bucket name string
    :return: reference to the connection client
    """
    try:
        _s3_client = client('s3',
                            endpoint_url=_region,
                            aws_access_key_id=aws_access_key_id,
                            aws_secret_access_key=aws_secret_access_key)

        # Test that the credentials are working.
        _s3_client.list_buckets()

        try:
            _s3_client.head_bucket(Bucket=_bucket)
        except ClientError:
            # The bucket does not exist or you have no access.
            raise Exception("$ Bucket does not exist in the account. Please re-check the name and try again.")

        return _s3_client

    except ClientError:
        print("Invalid Access and Secret keys")
        sys.exit(1)
    except Exception as e:
        raise e


if __name__ == '__main__':
    # Lookup table of unit symbols used when formatting sizes.
    size_table = {0: 'Bs', 1: 'KBs', 2: 'MBs', 3: 'GBs', 4: 'TBs', 5: 'PBs', 6: 'EBs'}

    print("\n")
    print("\n")
    print("$ starting script...")

    # generate access keys
    access_key_id, secret_access_key = get_credentials()

    # get bucket name
    bucket = input("$ Please enter the name of the bucket: ").strip()

    # prefix
    prefix = input("$ Please enter a prefix (leave blank if you don't need one): ").strip()

    # get region
    region = region_selection()

    # test the connection and access keys. Also checks if the bucket is valid.
    s3_client = create_connection_and_test(access_key_id, secret_access_key, region, bucket)

    # create a paginator with default settings.
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket,
                                'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # initialize basic counters for in-memory statistics.
    delete_marker_count = 0
    delete_marker_size = 0
    versioned_object_count = 0
    versioned_object_size = 0
    current_object_count = 0
    current_object_size = 0
    delete_list = []

print("$ Calculating, please wait... this may take a while")
for object_response_itr in object_response_paginator.paginate(**operation_parameters):
if 'DeleteMarkers' in object_response_itr:
for delete_marker in object_response_itr['DeleteMarkers']:
delete_list.append({'Key': delete_marker['Key'], 'VersionId': delete_marker['VersionId']})
delete_marker_count += 1

if 'Versions' in object_response_itr:
for version in object_response_itr['Versions']:
# add any key to delete list
delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})
if version['IsLatest'] is False:
versioned_object_count += 1
versioned_object_size += version['Size']

elif version['IsLatest'] is True:
current_object_count += 1
current_object_size += version['Size']

print("\n")
print("-" * 10)
print("$ Total Delete markers: " + str(delete_marker_count))
print("$ Number of Current objects: " + str(current_object_count))
print("$ Current Objects size: ", calculate_size(current_object_size, size_table))
print("$ Number of Non-current objects: " + str(versioned_object_count))
print("$ Non-current Objects size: ", calculate_size(versioned_object_size, size_table))
print("$ Total size of current + non current objects: ",
calculate_size(versioned_object_size + current_object_size, size_table))
print("-" * 10)
print("\n")

    delete_flag = False
    while not delete_flag:
        choice = input("$ Do you wish to delete the delete markers, current, and non-current objects? [y/n] ").strip().lower()
        lock = input("$ Do you need to add Governance Mode override (requires root/admin or s3:BypassGovernanceRetention privileges)? [y/n] ").strip().lower()
        if choice == 'y' and lock == 'y':
            delete_flag = True
            print("$ starting deletes now...")
            print("$ removing delete markers, current and non-current objects 1000 at a time")
            for i in range(0, len(delete_list), 1000):
                response = s3_client.delete_objects(
                    Bucket=bucket,
                    BypassGovernanceRetention=True,
                    Delete={
                        'Objects': delete_list[i:i + 1000],
                        'Quiet': True
                    }
                )
                print(response)

        elif choice == 'y' and lock == 'n':
            delete_flag = True
            print("$ starting deletes now...")
            print("$ removing delete markers, current and non-current objects 1000 at a time")
            for i in range(0, len(delete_list), 1000):
                response = s3_client.delete_objects(
                    Bucket=bucket,
                    Delete={
                        'Objects': delete_list[i:i + 1000],
                        'Quiet': True
                    }
                )
                print(response)

        elif choice == 'n':
            # User declined deletion; exit the prompt loop without deleting anything.
            delete_flag = True

        else:
            print("$ invalid choice, please try again.")

    print("$ process completed successfully")
    print("\n")
    print("\n")

 

This is what the output should look like:

script_output.png

The script is also attached to this KB document.

 

 
