📢 Notice: This branch contains the upcoming CLI for Ferret v2. For the stable v1 release, please visit CLI v1.
Ferret CLI is a command-line interface for the Ferret web scraping system. Ferret uses its own query language called FQL (Ferret Query Language) - a SQL-like language designed specifically for web scraping, browser automation, and data extraction tasks.
- About Ferret CLI
- What is FQL?
- Key Features
- Installation
- Quick Start
- Commands Overview
- Configuration
- Browser Management
- Advanced Usage
- Examples
- Troubleshooting
- Development
- Contributors
FQL (Ferret Query Language) is a declarative language that combines the familiar syntax of SQL with powerful web automation capabilities. It allows you to:
- Navigate web pages and interact with elements
- Extract data from HTML documents
- Handle dynamic content and JavaScript-heavy sites
- Manage browser sessions and cookies
- Perform complex data transformations
- Execute parallel scraping operations
- 🚀 Fast and Efficient: Built-in concurrency and optimized execution
- 🌐 Browser Automation: Full Chrome/Chromium browser control
- 🔄 Dynamic Content: Handle SPAs and JavaScript-heavy sites
- 📊 Data Processing: Built-in functions for data manipulation
- 🛠️ Flexible Runtime: Run locally or on remote workers
- 💾 Session Management: Persistent cookies and browser state
- 🔧 Configuration: Extensive customization options
Documentation is available at our website.
You can download the latest binaries from here.
Alternatively, install with Go:

go install github.com/MontFerret/cli/v2/ferret@latest

Or use the install script:

curl https://raw.githubusercontent.com/MontFerret/cli/master/install.sh | sh

The simplest way to get started is with the interactive REPL:
ferret repl
Welcome to Ferret REPL
Please use `exit` or `Ctrl-D` to exit this program.
>>> RETURN "Hello, Ferret!"
"Hello, Ferret!"

Create a simple script (example.fql) to scrape a webpage:
// Navigate to a website and extract data
LET page = DOCUMENT("https://news.ycombinator.com")
FOR item IN ELEMENTS(page, ".submission")
    LET title = ELEMENT(item, ".title")
    RETURN {
        title: title.innerText,
        url: title.href
    }
Run the script:
ferret run example.fql

Note: `exec` is an alias for `run`; both work interchangeably.
For JavaScript-heavy sites, use browser automation:
# Open browser window for debugging
ferret run --browser-open my-script.fql
# Run headlessly for production
ferret run --browser-headless my-script.fql

Example browser automation script:
// Browser automation example
LET page = DOCUMENT("https://example.com", { driver: "cdp" })
CLICK(page, "#search-button")
WAIT_ELEMENT(page, "#results")
RETURN ELEMENTS(page, ".result-item")
Pass dynamic values to your scripts:
ferret run -p 'url:"https://example.com"' -p 'limit:10' my-script.fql

Use parameters in your FQL script:
LET page = DOCUMENT(@url) // Use the url parameter
LET items = ELEMENTS(page, ".item")
RETURN items
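Each `-p` value pairs a name with a value, and values such as URLs contain colons of their own. The sketch below uses a hypothetical shell helper (`split_param` is not part of Ferret) to show one way such a string can be split at the first colon only. How the CLI itself parses parameters is an assumption here, not something the text above specifies.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ferret: split "key:value" at the FIRST colon
# so that colons inside the value (for example in a URL) are preserved.
split_param() {
  param=$1
  key=${param%%:*}    # drop the longest suffix that starts at a colon
  value=${param#*:}   # drop the shortest prefix that ends at a colon
  printf '%s=%s\n' "$key" "$value"
}

split_param 'url:"https://example.com"'
split_param 'limit:10'
```

Single-quoting the whole argument, as in the command above, keeps the inner double quotes intact when the shell passes the value through.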
Run a quick FQL expression without a file:
ferret run --eval 'RETURN 2 + 2'

Execute scripts on remote Ferret workers:
ferret run --runtime 'https://my-worker.com' my-script.fql

Usage:
  ferret [flags]
  ferret [command]

Available Commands:
  browser     Manage Ferret browsers
  build       Compile FQL scripts into bytecode artifacts
  check       Check FQL scripts for syntax and semantic errors
  config      Manage Ferret configs
  fmt         Format FQL scripts
  inspect     Compile and disassemble a FQL script
  repl        Launch interactive FQL shell
  run         Run a FQL script (alias: exec)
  update      Update Ferret CLI
  version     Show the CLI version information

Flags:
  -h, --help               help for ferret
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")

Use "ferret [command] --help" for more information about a command.
Run a FQL script, a compiled artifact file, or an inline expression. To launch the interactive REPL, use the ferret repl command.
ferret run [script]
ferret exec [script] # alias

| Flag | Short | Description | Default |
|---|---|---|---|
| `--runtime` | `-r` | Runtime type ("builtin" or a remote worker URL) | builtin |
| `--proxy` | `-x` | Proxy server address | |
| `--user-agent` | `-a` | User-Agent header | |
| `--browser-address` | `-d` | CDP debugger address | http://127.0.0.1:9222 |
| `--browser-open` | `-B` | Open a visible browser for execution | false |
| `--browser-headless` | `-b` | Open a headless browser for execution | false |
| `--browser-cookies` | `-c` | Keep cookies between queries | false |
| `--param` | `-p` | Query parameter (key:value, repeatable) | |
| `--eval` | `-e` | Inline FQL expression (cannot be used with file args) | |

Compiled artifacts are auto-detected by content for file inputs and piped stdin, so artifacts produced by ferret build work even when they do not use a .fqlc filename. Artifact execution currently requires the builtin runtime.
Launch the interactive FQL shell. Supports command history, multiline input (toggle with %), and all runtime flags.
ferret repl

Accepts the same runtime and `--param` flags as `run` (everything except `--eval`).
Compile one or more FQL scripts without executing them. Reports syntax and semantic errors.
ferret check [files...]

Compile one or more FQL scripts into serialized bytecode artifacts.
ferret build [files...]

| Flag | Short | Description |
|---|---|---|
| `--output` | `-o` | Output file path, or output directory (if the path is an existing directory) for one or more inputs |

Without --output, each input writes a sibling artifact with the same base name and a .fqlc extension.
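The default naming rule can be illustrated with a small shell sketch. `default_artifact` is a hypothetical helper, not part of Ferret, and it assumes that "same base name" means the input's final extension is replaced by `.fqlc`:

```shell
#!/bin/sh
# Hypothetical helper, not part of Ferret: derive the default artifact path
# by swapping the input file's extension for .fqlc in the same directory.
default_artifact() {
  input=$1
  dir=$(dirname "$input")
  base=$(basename "$input")
  printf '%s/%s.fqlc\n' "$dir" "${base%.*}"
}

default_artifact scripts/example.fql
default_artifact example.fql
```

Under that assumption, `scripts/example.fql` maps to `scripts/example.fqlc`, next to its source file.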
Format FQL scripts. By default, files are overwritten in place.
ferret fmt [files...]

| Flag | Description | Default |
|---|---|---|
| `--dry-run` | Print formatted output to stdout instead of overwriting | false |
| `--print-width` | Maximum line length | 80 |
| `--tab-width` | Indentation size | 4 |
| `--single-quote` | Use single quotes instead of double quotes | false |
| `--bracket-spacing` | Add spaces inside brackets | true |
| `--case-mode` | Keyword case: upper, lower, or ignore | upper |

Compile a FQL script and display its disassembled bytecode. Useful for debugging and understanding script compilation.
ferret inspect [script]

| Flag | Description |
|---|---|
| `--eval` / `-e` | Inline FQL expression |
| `--bytecode` | Show only bytecode instructions |
| `--constants` | Show only the constant pool |
| `--functions` | Show only function definitions |
| `--summary` | Show a high-level program summary |
| `--spans` | Show debug source spans |

When no filter flags are provided, the full disassembly is printed.
Ferret CLI can be configured using the config command or configuration files.
# Set the CDP browser address
ferret config set browser-address "http://localhost:9222"
# Set user agent
ferret config set user-agent "MyBot 1.0"
# Set default runtime
ferret config set runtime "builtin"

# List all configuration values
ferret config list
# Get a specific value
ferret config get browser-address

Configuration files are stored in:
- Linux/macOS: `~/.config/ferret/config.yaml`
- Windows: `%APPDATA%\ferret\config.yaml`
Values are resolved in this order (highest to lowest):
- Command-line flags
- Environment variables (prefixed with `FERRET_`, e.g. `FERRET_RUNTIME`)
- Configuration file
- Defaults
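The resolution order above amounts to "take the first source that provides a value". A minimal shell sketch of that idea follows; `first_set` is a hypothetical helper, not part of Ferret, and it treats an empty string as "not set", which is a simplification.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ferret: print the first non-empty argument.
# Call it with candidates ordered from highest to lowest priority.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

flag_value=""          # nothing passed on the command line
file_value="builtin"   # value read from the configuration file
first_set "$flag_value" "${FERRET_RUNTIME:-}" "$file_value" "builtin"
```

With no flag given, the call prints the value of `FERRET_RUNTIME` when that variable is set, and falls back to the configuration file value otherwise.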
| Key | Description | Default |
|---|---|---|
| `log-level` | Logging level | info |
| `runtime` | Runtime type (builtin or a remote URL) | builtin |
| `browser-address` | Chrome DevTools Protocol address | http://127.0.0.1:9222 |
| `browser-cookies` | Keep cookies between queries | false |
| `browser-open` | Open a visible browser for execution | false |
| `browser-headless` | Open a headless browser for execution | false |
| `proxy` | Proxy server address | |
| `user-agent` | Custom User-Agent header | |

# Open a new browser instance (uses ./.ferret-browser by default)
ferret browser open
# Open in headless mode
ferret browser open --headless
# Open on a custom debugging port
ferret browser open --port 9223
# Start in background and print the process ID
ferret browser open --detach
# Specify a custom user data directory
ferret browser open --user-dir /tmp/ferret-profile

| Flag | Short | Description | Default |
|---|---|---|---|
| `--detach` | `-d` | Start in background, print PID | false |
| `--headless` | | Launch in headless mode | false |
| `--port` | `-p` | Remote debugging port | 9222 |
| `--user-dir` | | Browser user data directory | `<cwd>/.ferret-browser` |

If --user-dir is omitted, Ferret launches Chrome with a profile under .ferret-browser in the current working directory. The same default applies when Ferret opens a managed browser for run or repl.
# Close the default browser
ferret browser close
# Close a specific browser by PID
ferret browser close 12345

// E-commerce product scraping with error handling
LET page = DOCUMENT("https://shop.example.com/products")
LET products = (
    FOR product IN ELEMENTS(page, ".product-card")
        LET name = ELEMENT(product, ".product-name")
        LET price = ELEMENT(product, ".price")
        LET image = ELEMENT(product, ".product-image")
        // Handle missing elements gracefully
        RETURN name != NONE ? {
            name: TRIM(name.innerText),
            price: REGEX_MATCH(price.innerText, /\$[\d.]+/)[0],
            image: image.src,
            url: CONCAT("https://shop.example.com", product.href)
        } : NONE
)
// Filter out null results
LET validProducts = (
    FOR product IN products
        FILTER product != NONE
        RETURN product
)
RETURN validProducts
// Login form automation
LET page = DOCUMENT("https://example.com/login", { driver: "cdp" })
// Fill in form fields
INPUT(page, "#username", "myuser")
INPUT(page, "#password", "mypassword")
// Submit form and wait for navigation
CLICK(page, "#login-button")
WAIT_NAVIGATION(page)
// Extract user data after login
RETURN {
    loggedIn: ELEMENT(page, ".user-menu") != NONE,
    username: ELEMENT(page, ".username").innerText
}
// Scrape multiple pages in parallel
LET urls = [
    "https://news.ycombinator.com",
    "https://reddit.com/r/programming",
    "https://dev.to"
]
LET results = (
    FOR url IN urls
        LET page = DOCUMENT(url)
        RETURN {
            url: url,
            title: ELEMENT(page, "title").innerText,
            headlines: (
                FOR headline IN ELEMENTS(page, "h1, h2, h3")
                    RETURN headline.innerText
            )
        }
)
RETURN results
// Combine web scraping with API calls
LET page = DOCUMENT("https://github.com/trending")
LET repos = ELEMENTS(page, ".Box-row")
LET details = (
    FOR repo IN repos[0:5]
        LET repoName = ELEMENT(repo, "h1 a").innerText
        LET apiUrl = CONCAT("https://api.github.com/repos/", repoName)
        // Make API call
        LET apiData = DOCUMENT(apiUrl, { driver: "http" })
        RETURN {
            name: repoName,
            description: ELEMENT(repo, "p").innerText,
            stars: apiData.stargazers_count,
            language: apiData.language
        }
)
RETURN details
📊 Extract table data
// Extract data from HTML tables
LET page = DOCUMENT("https://example.com/data-table")
LET table = ELEMENT(page, "table")
LET headers = (
    FOR header IN ELEMENTS(table, "thead th")
        RETURN header.innerText
)
LET rows = ELEMENTS(table, "tbody tr")
LET data = (
    FOR row IN rows
        LET cells = (
            FOR cell IN ELEMENTS(row, "td")
                RETURN cell.innerText
        )
        // Pair each header with the cell in the same column
        // (assumes the ZIP(keys, values) object function is available)
        RETURN ZIP(headers, cells)
)
RETURN data
📱 Mobile viewport simulation
// Test mobile-responsive sites
LET page = DOCUMENT("https://example.com", {
    driver: "cdp",
    viewport: {
        width: 375,
        height: 667,
        mobile: true
    },
    userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)"
})
// Check mobile-specific elements
LET mobileMenu = ELEMENT(page, ".mobile-menu")
LET desktopMenu = ELEMENT(page, ".desktop-menu")
RETURN {
    isMobile: mobileMenu != NONE,
    isDesktop: desktopMenu != NONE,
    viewport: {
        width: page.viewport.width,
        height: page.viewport.height
    }
}
Browser connection failed
# Check if Chrome is running with remote debugging
google-chrome --remote-debugging-port=9222
# Or use Ferret's browser management
ferret browser open

Script execution timeout
// Increase timeouts for slow pages
LET page = DOCUMENT("https://slow-site.com", {
    driver: "cdp",
    timeout: 30000 // 30 seconds
})
Element not found errors
// Use WAIT_ELEMENT for dynamic content
LET page = DOCUMENT("https://spa.example.com", { driver: "cdp" })
WAIT_ELEMENT(page, "#dynamic-content", 10000)
LET element = ELEMENT(page, "#dynamic-content")
Memory issues with large datasets
// Process data in chunks (assumes SLICE(array, start, length) is available)
LET items = ELEMENTS(page, ".item")
LET batchSize = 100
FOR i IN RANGE(0, LENGTH(items), batchSize)
    // Iterate over the current batch only, not the whole list
    FOR item IN SLICE(items, i, batchSize)
        // Process individual items...
        RETURN item.innerText
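To make the batch arithmetic concrete, the following shell sketch prints the start offset and size of each batch for a given item count. `batch_ranges` is a hypothetical helper for illustration only and does not call Ferret.

```shell
#!/bin/sh
# Hypothetical helper: print "start count" for each batch covering `total` items.
batch_ranges() {
  total=$1
  size=$2
  start=0
  while [ "$start" -lt "$total" ]; do
    remaining=$((total - start))
    # The last batch may be smaller than the batch size
    if [ "$remaining" -lt "$size" ]; then
      count=$remaining
    else
      count=$size
    fi
    printf '%s %s\n' "$start" "$count"
    start=$((start + size))
  done
}

batch_ranges 250 100
```

For 250 items and a batch size of 100, this yields three batches: two full ones and a final batch of 50.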
Enable debug logging for troubleshooting:
ferret run --log-level debug my-script.fql

- Use CSS selectors efficiently: Specific selectors are faster than broad ones
- Minimize DOM queries: Store elements in variables when reusing
- Use headless mode: `--browser-headless` is faster for production
- Implement timeouts: Always set appropriate timeouts for reliability
- Handle errors gracefully: Use conditional logic to handle missing elements
# Clone the repository
git clone https://github.com/MontFerret/cli.git
cd cli
# Install dependencies
go mod download
# Build the binary
make compile
# Run tests
make test

- Fork the repository
- Create a feature branch: `git checkout -b my-new-feature`
- Make your changes and add tests
- Run the test suite: `make test`
- Submit a pull request
# Install development tools
make install-tools
# Format code
make fmt
# Run linters
make lint
# Run all checks
make build