trouchet/cloudfs

CloudFS - DuckDB Cloud Storage Extension

A DuckDB extension that registers cloud storage protocols as first-class filesystems with native support for Parquet, CSV, JSON, Delta Lake, and Iceberg.

🚀 Features

  • Multi-Provider Support: OneDrive, SharePoint, Google Drive, Dropbox, SFTP, and local VFS
  • Protocol Integration: Use cloud URLs directly in DuckDB queries (spfs://, odfs://, gdfs://, dbxfs://, sftp://, vfs://)
  • Table Functions: Built-in ls(), stat(), du() for filesystem exploration
  • Smart Caching: 3-tier LRU cache (metadata, content, partial) with TTL expiry
  • Secure Auth: OAuth2 flows with token persistence via DuckDB secrets
  • Format Support: Parquet, CSV, JSON, Delta Lake, Iceberg

📦 Quick Start

Prerequisites

# Linux/WSL
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build \
    libssl-dev libcurl4-openssl-dev libssh2-1-dev

# macOS
brew install cmake ninja openssl curl libssh2

Build

# Quick build
make build

# Or manually
mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release
ninja

Install

# Copy extension to DuckDB extensions directory
# (adjust the version and platform segments to your DuckDB build, e.g. osx_arm64 on Apple Silicon)
cp build/cloudfs.duckdb_extension ~/.duckdb/extensions/v1.4.0/linux_amd64/

🔧 Usage

Basic Example

-- Load extension
LOAD cloudfs;

-- Create secret for OneDrive
-- WARNING: CREATE SECRET statements are stored in CLI history as plain text.
-- Use environment variables or secret files instead of literal values in production.
CREATE SECRET onedrive_secret (
    TYPE onedrive,
    PROVIDER config,
    CLIENT_ID 'your-client-id',
    CLIENT_SECRET 'your-client-secret'
);

-- List files
SELECT * FROM ls('odfs://Documents/');

-- Read Parquet directly
SELECT * FROM 'odfs://Data/sales.parquet';

-- File metadata
SELECT name, size, modified FROM stat('odfs://report.csv');

-- Directory usage
SELECT * FROM du('odfs://Projects/');

Supported Protocols

Protocol   Provider       Example URL
odfs://    OneDrive       odfs://Documents/file.parquet
spfs://    SharePoint     spfs://sites/team/Shared Documents/data.csv
gdfs://    Google Drive   gdfs://My Drive/dataset.parquet
dbxfs://   Dropbox        dbxfs:///work/analysis.csv
sftp://    SFTP           sftp://user@server/path/to/file.parquet
vfs://     VFS Agent      vfs://localhost:19876/data/file.csv
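
As a rough illustration of how the scheme prefix in the table above could select a provider, here is a minimal C++17 sketch. ProviderForUrl and kSchemes are hypothetical names chosen for this example and are not taken from the CloudFS sources. The lookup is case-sensitive for simplicity.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Scheme-to-provider pairs copied from the protocol table above.
constexpr std::array<std::pair<std::string_view, std::string_view>, 6> kSchemes{{
    {"odfs", "OneDrive"},
    {"spfs", "SharePoint"},
    {"gdfs", "Google Drive"},
    {"dbxfs", "Dropbox"},
    {"sftp", "SFTP"},
    {"vfs", "VFS Agent"},
}};

// Hypothetical helper (not part of CloudFS): returns the provider name for a
// URL, or std::nullopt when the "://" separator is missing or the scheme is
// not listed in the table.
inline std::optional<std::string_view> ProviderForUrl(std::string_view url) {
    const auto sep = url.find("://");
    if (sep == std::string_view::npos) {
        return std::nullopt;
    }
    const std::string_view scheme = url.substr(0, sep);
    for (const auto &[name, provider] : kSchemes) {
        if (name == scheme) {
            return provider;
        }
    }
    return std::nullopt;
}
```

Returning std::optional keeps "unknown scheme" distinct from any valid provider name, so a caller can fall through to DuckDB's other filesystems when no entry matches.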

📖 Documentation

🛠️ Development

Setup Development Environment

# Install all dev tools (commitlint, pre-commit, clang-format, etc.)
make dev-setup

# Validate setup
make validate

Common Tasks

make help           # Show all available commands
make format         # Format code (C++, CMake, Shell)
make lint           # Run all linters
make test           # Run tests
make check-all      # Run all checks

Commit Guidelines

We use Conventional Commits:

# Format: <type>(<scope>): <subject>
git commit -m "feat(gdrive): add resumable uploads"
git commit -m "fix(table-functions): handle null pointers"
git commit -m "docs(readme): update installation steps"

Valid types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert

See docs/DEVELOPMENT.md for complete guidelines.
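
To make the header format concrete, the check below is a small C++ sketch that accepts the commit types listed above. IsConventionalCommit is a hypothetical helper, not the project's commitlint configuration. It assumes the scope is optional (as in the Conventional Commits specification) and restricts scopes to lowercase letters, digits, and hyphens; the real rules may be stricter.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when `header` looks like
// "<type>(<scope>): <subject>" with one of the types listed above.
// Assumptions: the scope is optional, and the subject must start with a
// non-space character after ": ".
inline bool IsConventionalCommit(const std::string &header) {
    static const std::regex pattern(
        R"((feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert))"
        R"((\([a-z0-9-]+\))?: \S.*)");
    // regex_match requires the whole header to match the pattern.
    return std::regex_match(header, pattern);
}
```

For example, "feat(gdrive): add resumable uploads" passes, while "update readme" and "feat(gdrive):missing space" are rejected.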

🏗️ Architecture

cloudfs/
├── src/
│   ├── core/              # Core filesystem abstraction
│   │   ├── cloud_filesystem.cpp
│   │   ├── cloud_cache.cpp
│   │   ├── cloud_http.cpp
│   │   ├── cloud_auth.cpp
│   │   └── cloud_table_functions.cpp
│   ├── providers/         # Cloud provider implementations
│   │   ├── onedrive/
│   │   ├── sharepoint/
│   │   ├── gdrive/
│   │   ├── dropbox/
│   │   ├── sftp/
│   │   └── vfs/
│   └── extension/         # DuckDB extension wrapper
├── agent/                 # VFS Go agent service
├── test/                  # SQL tests
├── docs/                  # Documentation
└── scripts/               # Build and development scripts

Key Components

  • CloudFileSystem: DuckDB FileSystem implementation
  • ICloudBackend: Provider interface (stateless, capabilities-driven)
  • ICloudAuthProvider: Authentication abstraction (OAuth2, token, config)
  • CloudCache: 3-tier LRU cache (metadata, content, partial)
  • CloudHttpClient: libcurl wrapper with retry logic
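
The idea behind an LRU cache with TTL, as named in the CloudCache entry above, can be sketched in a few lines of C++17. This is a single-tier illustration only: TtlLruCache and its methods are hypothetical names, and the actual CloudCache class may be structured quite differently. The clock value is passed in explicitly so that expiry can be exercised without sleeping.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal single-tier LRU cache with a fixed per-entry TTL.
// Hypothetical sketch; not the CloudCache class from the repository.
class TtlLruCache {
public:
    using Clock = std::chrono::steady_clock;

    TtlLruCache(std::size_t capacity, std::chrono::milliseconds ttl)
        : capacity_(capacity), ttl_(ttl) {}

    // Inserts or overwrites a value and marks it most recently used.
    // When the cache is over capacity, the least recently used entry is evicted.
    void Put(const std::string &key, std::string value,
             Clock::time_point now = Clock::now()) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {
            entries_.erase(it->second);
            index_.erase(it);
        }
        entries_.push_front(Entry{key, std::move(value), now + ttl_});
        index_[key] = entries_.begin();
        if (entries_.size() > capacity_) {
            index_.erase(entries_.back().key);
            entries_.pop_back();
        }
    }

    // Returns the value if present and not expired. Expired entries are
    // removed on access; a hit moves the entry to the front of the list.
    std::optional<std::string> Get(const std::string &key,
                                   Clock::time_point now = Clock::now()) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        if (now >= it->second->expires_at) {
            entries_.erase(it->second);
            index_.erase(it);
            return std::nullopt;
        }
        entries_.splice(entries_.begin(), entries_, it->second);
        return it->second->value;
    }

    std::size_t Size() const { return entries_.size(); }

private:
    struct Entry {
        std::string key;
        std::string value;
        Clock::time_point expires_at;
    };

    std::size_t capacity_;
    std::chrono::milliseconds ttl_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The list plus hash map pairing gives constant-time lookup, promotion, and eviction. A multi-tier design could run one such instance per tier with different capacities and TTLs, though that is an assumption about how the tiers might be composed.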

🧪 Testing

# Run full test suite
make test

# Quick validation
scripts/check_deps.sh       # Check dependencies
scripts/build_and_test.sh   # Build and test

# Test table functions manually
duckdb -unsigned << EOF
LOAD './cloudfs.duckdb_extension';
SELECT cloudfs_version();
SELECT * FROM ls('vfs://localhost:19876/');
EOF

📊 Performance

  • Metadata Caching: TTL-based with configurable expiration
  • Content Caching: LRU with size limits
  • Partial Downloads: Range requests for selective reads
  • Batch Operations: Optimized for Delta queries (OneDrive/SharePoint)
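
Partial downloads rest on HTTP range requests, where the byte range is inclusive at both ends. The helper below is a hedged C++17 sketch of turning an offset and length into a Range header value. RangeHeaderValue is a hypothetical name, not a CloudFS function, and the sketch does not guard against offset + length overflowing 64 bits.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: formats the value of an HTTP Range header for
// `length` bytes starting at byte `offset`. Returns std::nullopt for an
// empty read, since a zero-length range cannot be expressed.
inline std::optional<std::string> RangeHeaderValue(std::uint64_t offset,
                                                   std::uint64_t length) {
    if (length == 0) {
        return std::nullopt;
    }
    // The last byte position is inclusive, hence the "- 1".
    const std::uint64_t last = offset + (length - 1);
    return "bytes=" + std::to_string(offset) + "-" + std::to_string(last);
}
```

Reading the first 500 bytes of a file therefore yields "bytes=0-499". When the range is passed through libcurl's CURLOPT_RANGE option rather than a raw header, the "bytes=" prefix is dropped and only "0-499" is supplied.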

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feat/amazing-feature
  3. Make your changes following our commit guidelines
  4. Ensure all checks pass: make check-all
  5. Commit: git commit -m "feat(provider): add amazing feature"
  6. Push: git push origin feat/amazing-feature
  7. Open a Pull Request

See docs/DEVELOPMENT.md for detailed contributing guidelines.

📋 Requirements

  • C++ Compiler: GCC 7+ or Clang 6+ (C++17 support)
  • CMake: 3.5 or later
  • Ninja: Build system
  • OpenSSL: 1.1.0 or later
  • libcurl: 7.58.0 or later
  • libssh2: 1.8.0 or later (for SFTP)
  • DuckDB: 1.4.0 or later

🐛 Known Issues & Roadmap

See AGENT_HANDOVER.md for current status and next steps.

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • DuckDB - Amazing embedded database
  • Built on top of DuckDB's extension template

📞 Support


Made with ❤️ for the DuckDB community
