First Impression:  Mountpoint - Mounting an Amazon S3 as a local file system

Introduction

Ever since AWS announced the alpha release of Mountpoint back on March 14, 2023, I have been eagerly waiting for it to become generally available. The idea of seamlessly mounting S3 buckets as if they were local drives holds immense potential for transforming the way we handle data storage and access. Now that the feature has finally reached general availability, the moment I've been waiting for has arrived. In this article, I'll share my initial thoughts and experience as I dive into testing Mountpoint for Amazon S3.

Mountpoint, an open-source project for Amazon S3, is a testament to the dynamic evolution of cloud storage solutions we have witnessed over the past few years. Conceived to address the growing demand for simplified access to data in S3, it has been shaped by the awslabs developer community into an easy-to-use, enterprise-ready client that supports performant access to S3 at scale.

What the Tool is Fit For:

  • An EC2 instance with an Amazon S3 bucket mounted can use native shell commands and library functions such as 'ls', 'cat', 'cp', 'touch', 'grep' and 'open' to list, read and interact with files.

  • It can be a great tool for data lake architectures and use cases, enabling multiple instances to concurrently read large objects stored in S3 without first downloading them to local storage (a quick example follows this list).

  • It could be a great tool to simplify uploading and downloading files to and from S3, and to share and transfer files between local and cloud storage while taking advantage of S3's scale and durability.

  • It could facilitate collaborative workflows by providing a unified platform for storing, accessing and interacting with shared objects directly from local environments.
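
As a quick illustration of the data lake point above, once a bucket is mounted (the setup is shown later in this article), a slice of a large object can be read directly through the mount without pulling the whole object down first. The file name below is just a placeholder of my own:

# Stream only the first kilobyte of a (hypothetical) large object through the mount
$ head -c 1024 mountp-test-fs/large-dataset.csv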

What the Tool is Not:

  • While Mountpoint offers seamless access to S3 files, it's not optimized for real-time collaborative editing scenarios. It supports writing only to new files, and those writes must be made sequentially.

  • While Mountpoint provides a local storage feel, it's not a replacement for traditional local storage solutions. It complements local storage by harnessing the benefits of cloud resources.

  • Mountpoint does not implement all POSIX file system features. It doesn't support advanced file operations such as locking, file permissions and ownership.

Installing the tool

Installing Mountpoint for Amazon S3 was simple and straightforward. On my Amazon Linux EC2 instance, I used the RPM package and installed it with the 'yum' command.

$ wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.rpm
$ sudo yum install ./mount-s3.rpm

While the Mountpoint client is designed to automatically pick up credentials from an IAM role associated with the instance, I am using AWS credentials from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY since I am testing on an AWS sandbox playground.
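
As a quick sketch of what that looks like (the placeholder values below are mine, not real credentials), the variables can be exported in the shell before mounting:

$ export AWS_ACCESS_KEY_ID=<your-access-key-id>
$ export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
# Only needed when the sandbox hands out temporary credentials
$ export AWS_SESSION_TOKEN=<your-session-token>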

$ mkdir mountp-test-fs
$ mount-s3 mountp-test-bucket mountp-test-fs
bucket mountp-test-bucket is mounted at mountp-test-fs
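
Before moving on, it's worth noting that the mount can be detached later like any other FUSE file system; this is the standard Linux approach rather than anything Mountpoint-specific:

# Unmount when finished (fusermount -u also works when running as a non-root user)
$ umount mountp-test-fs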

Changing the directory to the newly mounted folder and creating new files:

$ cd mountp-test-fs
$ mkdir test-folder
$ ls -l
total 0
drwxr-xr-x. 2 ec2-user ec2-user 0 Aug 11 17:50 test-folder
# Create new file
$ echo "Hello Mounted S3 Bucket" > TestFile_Write.txt
$ ls
TestFile_Write.txt
# View the file
$ cat TestFile_Write.txt
Hello Mounted S3 Bucket
# Count the lines containing the word 'S3' using grep
$ grep -n 'S3' TestFile_Write.txt | wc -l
1
# Find the line number of the word 'S3' in the file using sed
$ sed -n '/S3/=' TestFile_Write.txt
1
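
One way to double-check that the new object really landed in the bucket (assuming the AWS CLI is installed on the instance) is to list the bucket directly:

# List the bucket contents with the AWS CLI; TestFile_Write.txt should appear
$ aws s3 ls s3://mountp-test-bucket/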

(Screenshot: S3 bucket)

As you can see, it's pretty simple to integrate with S3 and use native shell/bash commands to interact with the objects within the bucket. However, as mentioned before, one of the limitations is that we cannot edit or update an existing file through Mountpoint (maybe a feature that could be enabled in the future).

#Error during updates to the file objects 
$ echo "Hello, I am trying to update the file with new content." > TestFile_Write.txt
-bash: TestFile_Write.txt: Operation not permitted
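
One workaround that stays within these semantics (my own sketch, not an official pattern) is to write the updated content under a new key, since creating new objects is supported:

# Write the updated content to a new object instead of editing the original in place
$ echo "Hello, I am trying to update the file with new content." > TestFile_Write_v2.txt
$ cat TestFile_Write_v2.txt
Hello, I am trying to update the file with new content.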

Logging

As an Infrastructure Architect, I am always focused on keeping operations running and analyzing errors. So after simulating various error scenarios, such as deleting directories and removing files from the S3 bucket, I was able to view the logs via syslog. I found that Mountpoint only logs high-severity events by default, and that verbose logging can be enabled with the --debug option.

# Sample of error logs from syslog during my testing
$ journalctl -e SYSLOG_IDENTIFIER=mount-s3
Aug 11 19:53:09 mount-s3[58999]: [WARN] lookup{req=1408 ino=1 name="test-folder"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
Aug 11 19:53:09 mount-s3[58999]: [WARN] lookup{req=1410 ino=1 name="test-folder"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
Aug 11 19:53:46 mount-s3[58999]: [WARN] lookup{req=1454 ino=1 name="test-folder"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
Aug 11 19:53:51 mount-s3[58999]: [WARN] lookup{req=1460 ino=1 name="TestFile_Write.txt"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
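
To dig deeper into an issue, the bucket can be remounted with the --debug option mentioned above; the exact logging flags may vary by version, so it's worth confirming with mount-s3 --help on your install:

# Unmount, then remount with debug-level logging enabled
$ umount mountp-test-fs
$ mount-s3 mountp-test-bucket mountp-test-fs --debug
# Tail the more verbose output via syslog as before
$ journalctl -e SYSLOG_IDENTIFIER=mount-s3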

Summary

Mountpoint simplifies data collaboration, provides a local storage feel, and is easy to set up and use. It will prove to be an amazing tool for seamlessly transferring and managing large datasets between Amazon S3 and a local environment for data analysis, without time-consuming downloads or compromises on storage capacity. However, it should be noted that it does not offer real-time editing capabilities, and offline access to the data is not supported since it requires network connectivity to reach S3. I also wish it could be integrated natively with CloudWatch so that traffic and errors could be monitored and analyzed easily. Overall, it's a valuable asset for cloud workflows, bridging the gap between cloud and local storage with convenience, flexibility and scalability.