Local Storage Guide
edgartools is designed for interactive queries against SEC Edgar, which means it normally makes HTTP requests to the SEC website to retrieve data. For example, when you call company.submissions
or filing.attachments
, it makes a request to the SEC.
There are times when you want to minimize or eliminate these requests:
- Performance: Speed up processing by avoiding network requests
- Offline usage: Work without internet access or in restricted environments
- Bandwidth efficiency: Reduce data usage and respect SEC rate limits
- Development: Use cached data for testing and development
edgartools provides comprehensive local storage capabilities to address these needs.
Supported Local Data Types
Data Type | Description |
---|---|
Company Submissions | Company metadata and their 1000 most recent filings |
Company Facts | Standardized company financial facts from XBRL filings |
Filing Attachments | Complete filing documents with all attachments |
Reference Data | Company and mutual fund tickers, exchanges, and other lookups |
Local Data Directory
Default Location
The default local data directory is:
<USER_HOME>/.edgar
Setting the Directory
You can set the local data directory in several ways:
Method 1: Environment Variable
export EDGAR_LOCAL_DATA_DIR="/path/to/local/data"
Method 2: Programmatic (New)
from edgar import set_local_storage_path
import os
# Create the directory first
os.makedirs("/tmp/edgar_data", exist_ok=True)
# Set the path
set_local_storage_path("/tmp/edgar_data")
Method 3: One-Step Setup (New)
from edgar import use_local_storage
import os
# Create and set directory, enable local storage in one call
os.makedirs("/tmp/edgar_data", exist_ok=True)
use_local_storage("/tmp/edgar_data")
Enabling Local Storage
Basic Usage
from edgar import use_local_storage
import os
# Create directory
os.makedirs("~/Documents/edgar", exist_ok=True)
# Set path and enable in one call
use_local_storage("~/Documents/edgar")
All Supported Patterns
# 1. BACKWARD COMPATIBLE
use_local_storage(True) # Enable
use_local_storage(False) # Disable
use_local_storage() # Enable (default)
# 2. NEW INTUITIVE SYNTAX
use_local_storage("/tmp/edgar_data") # Path as string
use_local_storage("~/Documents/edgar") # Tilde expansion
use_local_storage(Path.home() / "edgar") # Path object
# 3. ADVANCED CONTROL
use_local_storage("/tmp/edgar", True) # Set path and enable
use_local_storage("/tmp/edgar", False) # Set path but keep disabled
Checking Status
from edgar import is_using_local_storage
if is_using_local_storage():
print("Local storage is enabled")
else:
print("Using remote SEC data")
Downloading Data to Local Storage
Download Bulk SEC Data
You can download bulk SEC data using the download_edgar_data()
function:
from edgar import download_edgar_data
# Download all data types (submissions, facts, reference data)
download_edgar_data()
# Download only specific data types
download_edgar_data(
submissions=True, # Company metadata and recent filings
facts=True, # Company financial facts
reference=True # Tickers, exchanges, etc.
)
Download Complete Filings
Download individual filings with all attachments using download_filings()
:
from edgar import download_filings
# Download all filings for a specific date
download_filings("2025-01-15")
# Download filings for a date range
download_filings("2025-01-01:2025-01-15")
# Download from a start date onwards
download_filings("2025-01-01:")
# Download up to an end date
download_filings(":2025-01-15")
Note: Downloaded filings are stored in EDGAR_LOCAL_DATA_DIR/filings/YYYYMMDD/
. When local storage is enabled, edgartools automatically checks local storage first before making SEC requests.
Complete Workflow Examples
Example 1: Quick Setup for Development
from edgar import use_local_storage, download_edgar_data
import os
# Setup local storage in one command
os.makedirs("~/edgar_dev", exist_ok=True)
use_local_storage("~/edgar_dev")
# Download essential data
download_edgar_data(submissions=True, reference=True)
# Now all queries use local data when available
from edgar import Company
apple = Company("AAPL") # Uses local data
Example 2: High-Performance Analysis Setup
from edgar import use_local_storage, download_filings, get_filings
import os
# Setup high-performance storage
os.makedirs("/tmp/edgar_fast", exist_ok=True)
use_local_storage("/tmp/edgar_fast")
# Download specific filings for analysis
filings = get_filings(form="10-K", year=2024)
download_filings(filings=filings)
# All subsequent operations are lightning fast
for filing in filings:
financial_data = filing.xbrl() # Instant from local storage
Example 3: Offline Research Environment
from edgar import use_local_storage, download_edgar_data, download_filings
import os
# Setup offline-capable environment
os.makedirs("~/research/edgar_offline", exist_ok=True)
use_local_storage("~/research/edgar_offline")
# Download comprehensive dataset
download_edgar_data() # All bulk data
download_filings("2024-01-01:2024-12-31") # Full year of filings
# Now works completely offline
from edgar import Company, get_filings
companies = get_filings(form="10-K", year=2024) # From local storage
Filtering Downloads
Download Specific Filings
Instead of downloading all filings, you can filter to download only what you need:
from edgar import get_filings, download_filings
# Get filings with filters
filings = get_filings(form="10-K", year=2024).filter(exchange="NYSE")
# Download only these filtered filings
download_filings(filings=filings)
Advanced Filtering Examples
# Download only tech companies' 10-K filings
tech_filings = (get_filings(form="10-K", year=2024)
.filter(exchange=["NASDAQ", "NYSE"])
download_filings(filings=tech_filings)
# Download recent 8-K filings for specific analysis
recent_8k = get_filings(form="8-K", filing_date="2025-01-01:")
download_filings(filings=recent_8k)
Performance Considerations
Storage Space
Different data types require different amounts of storage:
Data Type | Typical Size | Description |
---|---|---|
Reference Data | ~50 MB | Tickers, exchanges, mappings |
Company Facts | ~2 GB | Compressed financial facts |
Submissions | ~5 GB | Company metadata and filings |
Daily Filings | ~100-500 MB | All filings for one day |
Download Times
- Reference data: 1-2 seconds
- Company facts: 2-3 minutes
- Company submissions: 2-3 minutes
- Daily filings: 3-15 minutes depending on volume and time of day
Compression
Filings are automatically compressed to save space. The default compression level is set to 6, but you can adjust it to 9 for maximum compression:
# Enable compression during download
download_filings("2025-01-15", compression_level=9)
Directory Structure
When using local storage, data is organized as follows:
EDGAR_LOCAL_DATA_DIR/
├── reference/ # Ticker and exchange data
│ ├── company_tickers.json
│ ├── ticker.txt
│ └── ...
├── companyfacts/ # Company financial facts
│ └── CIK[0-9]*.json
├── submissions/ # Company submission data
│ └── CIK[0-9]*.json
└── filings/ # Individual filing documents
├── 20250115/ # Filings by date
│ ├── filing1.nc # SGML filing documents
│ └── filing2.nc.gz # Compressed filings
└── ...
Best Practices
1. Start Small
# Begin with reference data only
use_local_storage("~/edgar_test")
download_edgar_data(submissions=False, facts=False, reference=True)
2. Use Appropriate Storage
# Fast SSD for high-performance analysis
use_local_storage("/fast_ssd/edgar_data")
# Network storage for shared team access
use_local_storage("/shared/team/edgar_data")
3. Monitor Storage Usage
from edgar.core import get_edgar_data_directory
import shutil
storage_path = get_edgar_data_directory()
total, used, free = shutil.disk_usage(storage_path)
print(f"Storage usage: {used // (1024**3)} GB used, {free // (1024**3)} GB free")
4. Incremental Updates
# Download only recent filings regularly
from datetime import datetime, timedelta
recent_date = (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%d")
download_filings(f"{recent_date}:")
Troubleshooting
Common Issues
Path doesn't exist:
# ❌ This will fail
use_local_storage("/nonexistent/path")
# ✅ Create directory first
import os
os.makedirs("/tmp/edgar", exist_ok=True)
use_local_storage("/tmp/edgar")
Insufficient space:
# Check available space before large downloads
import shutil
_, _, free = shutil.disk_usage("/tmp")
if free < 10 * (1024**3): # 10 GB
print("Warning: Less than 10 GB free space")
Mixed local/remote data:
# Ensure consistent data source
from edgar import is_using_local_storage
if is_using_local_storage():
print("Using local storage")
else:
print("Using remote SEC data")
# Enable if needed: use_local_storage(True)
Migration and Backup
Moving Local Storage
# Old location
old_path = "~/old_edgar"
# New location
new_path = "/new/location/edgar"
os.makedirs(new_path, exist_ok=True)
# Move data (outside Python)
# cp -r ~/old_edgar/* /new/location/edgar/
# Update edgartools
use_local_storage(new_path)
Backup Strategy
# Create backup of critical data
import shutil
from datetime import datetime
backup_name = f"edgar_backup_{datetime.now().strftime('%Y%m%d')}"
shutil.copytree(get_edgar_data_directory(), f"/backups/{backup_name}")
Summary
The enhanced local storage system in edgartools provides:
Key Functions
use_local_storage()
: Enable/disable local storage with optional path settingset_local_storage_path()
: Set storage directory pathis_using_local_storage()
: Check current statusdownload_edgar_data()
: Download bulk SEC datadownload_filings()
: Download individual filings
Benefits
- Dramatic performance improvements: 10-100x faster than remote requests
- Offline capability: Work without internet connectivity
- Bandwidth efficiency: Reduce network usage and respect SEC limits
- Flexible configuration: Multiple ways to configure and use
- Comprehensive data: Support for all major SEC data types
Quick Start
import os
from edgar import use_local_storage, download_edgar_data
# Setup in one line
os.makedirs("~/edgar", exist_ok=True)
use_local_storage("~/edgar")
# Download essential data
download_edgar_data()
# All subsequent operations use local storage when available
This comprehensive local storage system makes edgartools significantly more efficient for both development and production use cases.