A DuckDB extension that registers cloud storage protocols as first-class filesystems, with native support for Parquet, CSV, JSON, Delta Lake, and Iceberg.
- Multi-Provider Support: OneDrive, SharePoint, Google Drive, Dropbox, SFTP, and local VFS
- Protocol Integration: Use cloud URLs directly in DuckDB queries (spfs://, odfs://, gdfs://, dbxfs://, sftp://, vfs://)
- Table Functions: Built-in ls(), stat(), du() for filesystem exploration
- Smart Caching: 3-tier LRU cache with TTL for optimal performance
- Secure Auth: OAuth2 flows with token persistence via DuckDB secrets
- Format Support: Parquet, CSV, JSON, Delta Lake, Iceberg
# Linux/WSL
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build \
libssl-dev libcurl4-openssl-dev libssh2-1-dev
# macOS
brew install cmake ninja openssl curl libssh2

# Quick build
make build
# Or manually
mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release
ninja

# Copy extension to DuckDB extensions directory
cp build/cloudfs.duckdb_extension ~/.duckdb/extensions/v1.4.0/linux_amd64/

-- Load extension
LOAD cloudfs;
-- Create secret for OneDrive
-- WARNING: CREATE SECRET statements are stored in CLI history as plain text.
-- Use environment variables or secret files instead of literal values in production.
CREATE SECRET onedrive_secret (
TYPE onedrive,
PROVIDER config,
CLIENT_ID 'your-client-id',
CLIENT_SECRET 'your-client-secret'
);
-- List files
SELECT * FROM ls('odfs://Documents/');
-- Read Parquet directly
SELECT * FROM 'odfs://Data/sales.parquet';
-- File metadata
SELECT name, size, modified FROM stat('odfs://report.csv');
-- Directory usage
SELECT * FROM du('odfs://Projects/');

| Protocol | Provider | Example URL |
|---|---|---|
| odfs:// | OneDrive | odfs://Documents/file.parquet |
| spfs:// | SharePoint | spfs://sites/team/Shared Documents/data.csv |
| gdfs:// | Google Drive | gdfs://My Drive/dataset.parquet |
| dbxfs:// | Dropbox | dbxfs:///work/analysis.csv |
| sftp:// | SFTP | sftp://user@server/path/to/file.parquet |
| vfs:// | VFS Agent | vfs://localhost:19876/data/file.csv |
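The scheme-to-provider mapping in the table can be sketched in C++ as follows. This is only an illustration of how a URL prefix selects a provider; `CloudUrl` and `ParseCloudUrl` are hypothetical names and are not taken from the extension's source.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical result type: not part of the extension's real API.
struct CloudUrl {
    std::string provider;  // provider name as listed in the table, e.g. "OneDrive"
    std::string rest;      // everything after "scheme://"
};

// Splits "scheme://rest" and looks the scheme up in the protocol table.
// Returns std::nullopt for unknown schemes or input without "://".
inline std::optional<CloudUrl> ParseCloudUrl(std::string_view url) {
    static const std::unordered_map<std::string_view, std::string_view> kProviders = {
        {"odfs", "OneDrive"}, {"spfs", "SharePoint"}, {"gdfs", "Google Drive"},
        {"dbxfs", "Dropbox"}, {"sftp", "SFTP"},       {"vfs", "VFS Agent"},
    };
    const auto sep = url.find("://");
    if (sep == std::string_view::npos || sep == 0) {
        return std::nullopt;
    }
    const auto it = kProviders.find(url.substr(0, sep));
    if (it == kProviders.end()) {
        return std::nullopt;
    }
    return CloudUrl{std::string(it->second), std::string(url.substr(sep + 3))};
}
```

Note that the Dropbox example uses three slashes, so the remainder after the scheme starts with `/`.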
- Development Guide - Contributing and development workflow
- Build Quickstart - Detailed build instructions
- Commit Guidelines - Git commit conventions
- Adding a Provider - How to add new cloud providers
- Task Status - Current development status
# Install all dev tools (commitlint, pre-commit, clang-format, etc.)
make dev-setup
# Validate setup
make validate

make help # Show all available commands
make format # Format code (C++, CMake, Shell)
make lint # Run all linters
make test # Run tests
make check-all # Run all checks

We use Conventional Commits:
# Format: <type>(<scope>): <subject>
git commit -m "feat(gdrive): add resumable uploads"
git commit -m "fix(table-functions): handle null pointers"
git commit -m "docs(readme): update installation steps"

Valid types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
See docs/DEVELOPMENT.md for complete guidelines.
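As a rough illustration of the commit format above, the following C++ sketch checks a message against the listed types. `IsConventionalCommit` is a hypothetical helper. Treating the scope as optional and restricting it to lowercase letters, digits, and hyphens are assumptions, so the project's commitlint configuration remains the authority.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Checks "<type>(<scope>): <subject>" against the valid types listed above.
// Assumptions: the scope is optional and limited to [a-z0-9-]; the subject
// is any text that starts with a non-space character.
inline bool IsConventionalCommit(const std::string &message) {
    static const std::regex kPattern(
        R"(^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9-]+\))?: \S.*$)");
    return std::regex_match(message, kPattern);
}
```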
cloudfs/
├── src/
│   ├── core/                 # Core filesystem abstraction
│   │   ├── cloud_filesystem.cpp
│   │   ├── cloud_cache.cpp
│   │   ├── cloud_http.cpp
│   │   ├── cloud_auth.cpp
│   │   └── cloud_table_functions.cpp
│   ├── providers/            # Cloud provider implementations
│   │   ├── onedrive/
│   │   ├── sharepoint/
│   │   ├── gdrive/
│   │   ├── dropbox/
│   │   ├── sftp/
│   │   └── vfs/
│   └── extension/            # DuckDB extension wrapper
├── agent/                    # VFS Go agent service
├── test/                     # SQL tests
├── docs/                     # Documentation
└── scripts/                  # Build and development scripts
- CloudFileSystem: DuckDB FileSystem implementation
- ICloudBackend: Provider interface (stateless, capabilities-driven)
- ICloudAuthProvider: Authentication abstraction (OAuth2, token, config)
- CloudCache: 3-tier LRU cache (metadata, content, partial)
- CloudHttpClient: libcurl wrapper with retry logic
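The "retry logic" mentioned for CloudHttpClient is not specified in this README. One common approach is retrying with exponential backoff, sketched below. `RetryPolicy`, `RetryWithBackoff`, and all default values are hypothetical and do not describe the actual client.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical retry settings; the real CloudHttpClient may use different ones.
struct RetryPolicy {
    int max_attempts = 4;
    std::chrono::milliseconds initial_delay{100};
    double multiplier = 2.0;
};

// Calls `attempt` until it returns true or max_attempts is reached, waiting
// between failed attempts and multiplying the wait by `multiplier` each time.
// `sleep_fn` is injectable so that callers and tests can avoid real waits.
inline bool RetryWithBackoff(
    const std::function<bool()> &attempt, const RetryPolicy &policy = {},
    const std::function<void(std::chrono::milliseconds)> &sleep_fn =
        [](std::chrono::milliseconds d) { std::this_thread::sleep_for(d); }) {
    assert(policy.max_attempts > 0);
    auto delay = policy.initial_delay;
    for (int i = 1; i <= policy.max_attempts; ++i) {
        if (attempt()) {
            return true;
        }
        if (i < policy.max_attempts) {  // no wait after the final attempt
            sleep_fn(delay);
            delay = std::chrono::milliseconds(
                static_cast<long long>(delay.count() * policy.multiplier));
        }
    }
    return false;
}
```

In a real client, `attempt` would wrap a single HTTP request and return false only for errors that are worth retrying.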
# Run full test suite
make test
# Quick validation
scripts/check_deps.sh # Check dependencies
scripts/build_and_test.sh # Build and test
# Test table functions manually
duckdb -unsigned << EOF
LOAD './cloudfs.duckdb_extension';
SELECT cloudfs_version();
SELECT * FROM ls('vfs://localhost:19876/');
EOF

- Metadata Caching: TTL-based with configurable expiration
- Content Caching: LRU with size limits
- Partial Downloads: Range requests for selective reads
- Batch Operations: Optimized for Delta queries (OneDrive/SharePoint)
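To make the combination of TTL expiry and LRU eviction concrete, here is a minimal single-tier sketch in C++. It is not the extension's CloudCache: `TtlLruCache` is a hypothetical class, capacity is counted in entries rather than bytes, and the time is passed in explicitly so the behavior is easy to reason about.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative single-tier cache; NOT the extension's CloudCache.
class TtlLruCache {
public:
    using Clock = std::chrono::steady_clock;

    TtlLruCache(std::size_t capacity, std::chrono::milliseconds ttl)
        : capacity_(capacity), ttl_(ttl) {
        assert(capacity > 0);
    }

    // Inserts or overwrites `key`; evicts the least recently used entry when full.
    void Put(const std::string &key, std::string value,
             Clock::time_point now = Clock::now()) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            entries_.erase(it->second);
            index_.erase(it);
        }
        entries_.push_front(Entry{key, std::move(value), now + ttl_});
        index_[key] = entries_.begin();
        if (entries_.size() > capacity_) {
            index_.erase(entries_.back().key);
            entries_.pop_back();
        }
    }

    // Returns the value if present and not expired; a hit marks it most recently used.
    std::optional<std::string> Get(const std::string &key,
                                   Clock::time_point now = Clock::now()) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return std::nullopt;
        }
        if (now >= it->second->expires_at) {  // TTL elapsed: drop the stale entry
            entries_.erase(it->second);
            index_.erase(it);
            return std::nullopt;
        }
        entries_.splice(entries_.begin(), entries_, it->second);
        return it->second->value;
    }

private:
    struct Entry {
        std::string key;
        std::string value;
        Clock::time_point expires_at;
    };
    std::size_t capacity_;
    std::chrono::milliseconds ttl_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A content cache with size limits would track bytes instead of entry count, but the eviction order works the same way.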
- Fork the repository
- Create a feature branch: git checkout -b feat/amazing-feature
- Make your changes following our commit guidelines
- Ensure all checks pass: make check-all
- Commit: git commit -m "feat(provider): add amazing feature"
- Push: git push origin feat/amazing-feature
- Open a Pull Request
See docs/DEVELOPMENT.md for detailed contributing guidelines.
- C++ Compiler: GCC 7+ or Clang 6+ (C++17 support)
- CMake: 3.5 or later
- Ninja: Build system
- OpenSSL: 1.1.0 or later
- libcurl: 7.58.0 or later
- libssh2: 1.8.0 or later (for SFTP)
- DuckDB: 1.4.0 or later
See AGENT_HANDOVER.md for current status and next steps.
MIT License - see LICENSE file for details.
- DuckDB - Amazing embedded database
- Built on top of DuckDB's extension template
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for the DuckDB community