Use rClone to Copy Your Data to Rumble Cloud Object Storage¶
rClone is an easy way to get your data into Rumble Cloud Object Storage bucket(s) if:
- your data is elsewhere and you want to migrate to Rumble Cloud
- you want to keep an additional copy of your data on Rumble Cloud Object Storage for cross-cloud data resilience or backup
What is rClone¶
rClone can be thought of as a cloud-native version of the classic rsync tool. It is a command-line, open-source tool intended to sync or migrate files between cloud providers or local storage, and is often called the "Swiss Army knife" of cloud storage.
While the examples below cover transfers between S3 object storage providers, rClone also supports many other object storage protocols as well as local storage.
Running rClone¶
Host System¶
rClone runs on Windows, macOS, and Linux, but if you are copying or moving your data to Rumble Cloud Object Storage we recommend running rClone from a VM hosted on Rumble Cloud. RAM-optimized instances are ideal for this job, as they match rClone's needs for both RAM and network bandwidth.
Installing rClone¶
rClone is available in most Linux distributions' package repositories. To install rClone on an Ubuntu-based VM, a quick apt install -y rclone will get it done.
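The distribution package can lag behind the latest rClone release, so the project's own install script is a common alternative; either route should work here. A quick sketch (both commands assume sudo privileges):

```shell
# Install from the Ubuntu package repository:
sudo apt install -y rclone

# Or install the latest upstream release with rClone's install script:
curl https://rclone.org/install.sh | sudo bash

# Either way, confirm the install:
rclone version
```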
Configuring rClone¶
rClone includes an interactive way to create and manage its configuration file (the rclone config command), but if you have many buckets or several different object storage providers it is often quicker to edit the configuration file directly. On a Linux system the default rClone configuration file is ~/.config/rclone/rclone.conf. Below is an example rClone configuration for two S3 providers.
rclone.conf¶
[rumble_cloud]
type = s3
provider = Ceph
access_key_id = ${Your_Access_Key}
secret_access_key = ${Your_Secret_Access_Key}
endpoint = object.us-east-1.rumble.cloud
acl = private
[bookstore]
type = s3
provider = AWS
access_key_id = ${Your_Access_Key}
secret_access_key = ${Your_Secret_Access_Key}
region = us-east-1
acl = private
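After editing the file, a couple of quick commands can confirm each remote works before you start a large transfer. This sketch assumes the remote names rumble_cloud and bookstore from the example above:

```shell
# List the remotes rClone knows about (reads rclone.conf only,
# no network access needed):
rclone listremotes

# List the buckets on each remote; a clean listing confirms the
# credentials and endpoint are valid:
rclone lsd rumble_cloud:
rclone lsd bookstore:
```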
Syncing your data¶
The following instructions assume that you have already created your destination bucket.
migration.sh¶
#!/bin/bash
BUCKET=${MyBucketName}
SUBPATH=""
TRANSFERS=8
rclone sync -v --s3-no-check-bucket --stats=30s --use-mmap --fast-list --update --ignore-errors --low-level-retries 1 --delete-during --transfers=${TRANSFERS} bookstore:/${BUCKET}/${SUBPATH} rumble_cloud:/${BUCKET}/${SUBPATH} --log-file ${BUCKET}_transfer.log
Parameter explanation¶
- --s3-no-check-bucket: prevents rClone from checking that the destination bucket exists, saving an API call and its round-trip time, and avoiding permission issues if the access key you are using cannot list or create buckets.
- --stats: gives a stats summary every 30 seconds allowing you to gauge how far along the transfer is.
- --use-mmap: optimizes memory usage by using memory mapped I/O for memory buffers
- --fast-list: makes rClone retrieve recursive listings of bucket paths, saving many API calls (and much time) on deeply nested buckets at the expense of memory. It also delays the start of transfers, since the larger recursive listing takes longer to retrieve, but overall it usually reduces total sync time. If your VM has a small amount of RAM you may want to turn this off, as holding the recursive listing of a very large bucket in memory can cause out-of-memory issues.
- --update: makes rClone skip files that are newer on the destination. If you repeatedly run rClone to keep your buckets in sync, you want this to avoid re-transferring files that have already been copied.
- --ignore-errors: makes rClone keep going even if it encounters an error, such as a file being deleted after the initial listing completes but before the file has been transferred.
- --low-level-retries: limits the number of times rClone will retry an API call before giving up. You may want to increase this if you are moving to Rumble Cloud Object Storage from a provider with a high API call failure rate.
- --delete-during: makes rClone delete files from your destination bucket that do not exist in the source bucket. If you are migrating to a Rumble Cloud Object Storage bucket and have already started uploading files to your Rumble bucket that do not exist in the source bucket, use --max-delete 0 instead.
- --transfers: the gas pedal; rClone will transfer this many files at once. If your buckets contain a large number of small files (such as pictures or text files), increasing this can get the job done faster but will consume additional RAM. For buckets with a smaller number of larger files (such as video files) you will see minimal or even negative improvement. rClone's default is 4; a higher value keeps the network pipe more fully utilized.
- bookstore:/${BUCKET}/${SUBPATH}: the source provider, bucket name and optional subpath
- rumble_cloud:/${BUCKET}/${SUBPATH}: your destination provider, bucket name and optional subpath
- --log-file: specifies the name and path of the log file for your transfer. You will want to check this for any issues, especially if you have --ignore-errors turned on.
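Before running migration.sh against live data, the same command can be previewed by adding --dry-run: rClone then logs what it would copy or delete without touching either bucket. A minimal sketch, using the same remotes and placeholder variables as migration.sh above:

```shell
#!/bin/bash
# Preview the sync: --dry-run reports planned copies and deletions
# without transferring or deleting anything.
BUCKET=${MyBucketName}
SUBPATH=""
rclone sync -v --dry-run --s3-no-check-bucket --fast-list --update \
  --delete-during \
  bookstore:/${BUCKET}/${SUBPATH} rumble_cloud:/${BUCKET}/${SUBPATH}
```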
Tips¶
- Running more than one instance of your rClone migration, each with a different subpath, lets you tune the --transfers value to the type of data in that subpath.
- If you have pictures in one path inside your bucket and videos in another, you may want to run two instances of rClone, each tuned for its workload.
- When running multiple instances of your rClone migration, make sure they do not overlap the same path in your bucket, to avoid duplicated work.
- rClone works on each path in sequence, not moving on until that path is done. In paths with a mixture of a few large and many small files, a high --transfers value can lead to bottlenecks when only one or two large files are still transferring after the smaller files complete. Running multiple instances of rClone can mitigate this.
- VM Instances with more RAM and network bandwidth (like Rumble Cloud's r2a.8xlarge instances) can make for a large improvement in the time it takes to complete your data copy.
- Consider using rClone's bisync mode if you are actively writing or deleting files in both buckets.
- Consider using an upstream CDN provider such as CloudFlare or CloudFront to allow you to switch to writing to your new bucket while still reading from your old bucket for files that have not been transferred yet.
- Use --dry-run to make sure rClone will do what you want prior to running it on live data.
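The multi-instance tips above can be sketched as a small wrapper script. The bucket name, subpaths, and --transfers values here are placeholder assumptions; substitute your own. With DRYRUN=1 (the default) the script only prints the commands it would run:

```shell
#!/bin/bash
# Sketch: one rClone instance per subpath, each with a --transfers
# value tuned to the data under that path. Bucket name, subpaths,
# and transfer counts are placeholders -- substitute your own.
BUCKET="my-bucket"
DRYRUN="${DRYRUN:-1}"   # set DRYRUN=0 to actually start the transfers

sync_path() {
  local subpath="$1" transfers="$2"
  local cmd=(rclone sync -v --s3-no-check-bucket --fast-list --update
             --transfers="${transfers}"
             "bookstore:/${BUCKET}/${subpath}"
             "rumble_cloud:/${BUCKET}/${subpath}"
             --log-file "${BUCKET}_${subpath}.log")
  if [ "${DRYRUN}" = "1" ]; then
    echo "${cmd[@]}"    # preview the command only
  else
    "${cmd[@]}" &       # run this instance in the background
  fi
}

sync_path "pictures" 16   # many small files: raise --transfers
sync_path "videos" 4      # a few large files: the default of 4 is plenty
wait                      # block until all background instances finish
```

Each background instance writes its own log file, so the logs stay separated per subpath.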