merge: incorporate Apify community PR #4770

# Conflicts:
#	tools/src/aden_tools/credentials/__init__.py
#	tools/src/aden_tools/credentials/apify.py
#	tools/src/aden_tools/tools/__init__.py
#	tools/src/aden_tools/tools/apify_tool/__init__.py
#	tools/src/aden_tools/tools/apify_tool/apify_tool.py
#	tools/tests/tools/test_apify_tool.py
Timothy
2026-03-03 13:10:45 -08:00
2 changed files with 186 additions and 0 deletions
@@ -0,0 +1,31 @@
perf: reduce subprocess spawning in quickstart scripts (#4427)
## Problem
Windows process creation (CreateProcess) is 10-100x slower than Linux fork/exec.
The quickstart scripts were spawning 4+ separate `uv run python -c "import X"`
processes to verify imports, adding ~600ms overhead on Windows.
## Solution
Consolidated all import checks into a single batch script that checks multiple
modules in one subprocess call, reducing spawn overhead by ~75%.
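The batching idea can be sketched as follows. This is a minimal illustration, not the contents of `scripts/check_requirements.py`; the names `check_imports` and `main` are hypothetical. It imports every requested module inside one interpreter process and reports failures together.

```python
import importlib
import json


def check_imports(module_names):
    """Import each module in this process; map name -> None (ok) or an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or any error raised while importing
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def main(argv):
    """Check every module named in argv and return a shell-style exit code."""
    outcome = check_imports(argv)
    print(json.dumps(outcome, indent=2))
    # Non-zero lets the calling quickstart script detect that something failed.
    return 1 if any(err is not None for err in outcome.values()) else 0
```

A quickstart script would then spawn one process for all modules instead of one per module, for example by ending the file with `sys.exit(main(sys.argv[1:]))` and passing the module names as arguments.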
## Changes
- **New**: `scripts/check_requirements.py` - Batched import checker
- **New**: `scripts/test_check_requirements.py` - Test suite
- **New**: `scripts/benchmark_quickstart.ps1` - Performance benchmark tool
- **Modified**: `quickstart.ps1` - Updated import verification (2 sections)
- **Modified**: `quickstart.sh` - Updated import verification
## Performance Impact
**Benchmark results on Windows:**
- Before: ~19.8 seconds for import checks
- After: ~4.9 seconds for import checks
- **Improvement: 14.9 seconds saved (75.2% less time)**
## Testing
- ✅ All functional tests pass (`scripts/test_check_requirements.py`)
- ✅ Quickstart scripts work correctly on Windows
- ✅ Error handling verified (invalid imports reported correctly)
- ✅ Performance benchmark confirms 75%+ improvement
Fixes #4427
@@ -0,0 +1,155 @@
# Apify Tool for Hive
Universal web scraping and automation through the Apify marketplace.
## Overview
Apify is a cloud platform providing a marketplace of thousands of ready-made web scrapers and automation tools ("Actors"). This integration allows Hive agents to extract structured data from almost any website without writing custom scraping code.
## Why Use This?
While agents can make raw HTTP requests, Apify interactions are complex:
1. **Async Polling**: Actor runs take seconds to minutes. A raw request returns only a run ID, so the agent must loop, sleep, and poll for status, a pattern LLMs handle poorly.
2. **Dataset Abstraction**: Fetching results requires knowing specific dataset IDs and pagination logic. This tool abstracts that into a simple `wait=True` parameter.
3. **Security**: Keeps the `APIFY_API_TOKEN` in the credential store instead of exposing it to the agent context.
## Credential Setup
1. Sign up at [console.apify.com](https://console.apify.com)
2. Go to Settings → Integrations
3. Copy your Personal API token
4. Set it as an environment variable: `export APIFY_API_TOKEN=your_token_here`
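To confirm that the variable is visible to the process running the tools, a small check along these lines may help. `read_apify_token` is a hypothetical helper for illustration and is not part of this tool.

```python
import os
from typing import Mapping, Optional


def read_apify_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token from APIFY_API_TOKEN, or None if it is unset or blank."""
    token = env.get("APIFY_API_TOKEN", "").strip()
    return token or None


if read_apify_token() is None:
    print("APIFY_API_TOKEN is not set; export it before using the Apify tools.")
```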
## Tools
### `apify_run_actor`
Run an Apify Actor to scrape or automate websites.
**Parameters:**
- `actor_id` (str): Actor identifier (e.g., `"apify/instagram-scraper"`)
- `input` (dict): JSON input specific to the actor (default: `{}`)
- `wait` (bool): If `True`, blocks until the run finishes and returns the results. If `False`, returns the `run_id` right away for asynchronous status checks (default: `True`)
**Example:**
```python
# Synchronous execution (recommended)
result = apify_run_actor(
    actor_id="apify/instagram-profile-scraper",
    input={"usernames": ["instagram", "google"]},
    wait=True
)
# Returns: {"items": [...], "run_id": "...", "status": "SUCCEEDED"}

# Asynchronous execution
result = apify_run_actor(
    actor_id="apify/web-scraper",
    input={"startUrls": [{"url": "https://example.com"}]},
    wait=False
)
# Returns: {"run_id": "abc123", "status": "RUNNING"}
```
### `apify_get_dataset`
Retrieve results from a completed actor run.
**Parameters:**
- `dataset_id` (str): Dataset identifier from a completed run
**Example:**
```python
data = apify_get_dataset(dataset_id="xyz789")
# Returns: {"items": [...], "count": 42}
```
### `apify_get_run`
Check the status of an actor run.
**Parameters:**
- `run_id` (str): Run identifier returned from `apify_run_actor` with `wait=False`
**Example:**
```python
status = apify_get_run(run_id="abc123")
# Returns: {"status": "SUCCEEDED", "default_dataset_id": "xyz789", ...}
```
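When a run is started with `wait=False`, the caller has to poll until it reaches a final state. The sketch below shows one way to do that. `wait_for_run` is a hypothetical helper, the set of terminal status names is an assumption based on Apify's run lifecycle, and the status function is passed in as a parameter so the loop does not depend on how the tools are imported.

```python
import time

# Assumed terminal statuses; anything else is treated as "still in progress".
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_run, run_id, poll_interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_run(run_id=...) until the run finishes, errors, or times out."""
    deadline = clock() + timeout
    while True:
        run = get_run(run_id=run_id)
        # Stop on a tool-level error dict or on a terminal run status.
        if "error" in run or run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            return {
                "error": f"Run {run_id} did not finish within {timeout} seconds",
                "help": "Increase the timeout or check the run in the Apify console",
            }
        sleep(poll_interval)


# Intended use with the tools documented here (not executed in this sketch):
#   run = apify_run_actor(actor_id="apify/web-scraper", input={...}, wait=False)
#   final = wait_for_run(apify_get_run, run["run_id"])
#   if final.get("status") == "SUCCEEDED":
#       data = apify_get_dataset(dataset_id=final["default_dataset_id"])
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.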
### `apify_search_actors`
Search the Apify marketplace for actors (optional).
**Parameters:**
- `query` (str): Search keywords
- `limit` (int): Maximum results to return (default: 10)
**Example:**
```python
actors = apify_search_actors(query="instagram", limit=5)
# Returns: {"items": [...], "total": 24}
```
## Use Cases
### Lead Generation
```python
# Find email addresses of decision-makers on LinkedIn
result = apify_run_actor(
    actor_id="apify/linkedin-profile-scraper",
    input={"search": "CEO at tech company in SF"},
    wait=True
)
# Use .get so an error response (which has no "items" key) yields an empty list
emails = [p["email"] for p in result.get("items", []) if p.get("email")]
```
### Market Research
```python
# Monitor product prices across multiple platforms
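result = apify_run_actor(
    actor_id="apify/amazon-scraper",
    input={"search": "wireless headphones", "maxItems": 50},
    wait=True
)
# Keep only items with a numeric price; some listings may omit it
prices = [
    item["price"]
    for item in result.get("items", [])
    if isinstance(item.get("price"), (int, float))
]
# Avoid dividing by zero when no priced items come back
avg_price = sum(prices) / len(prices) if prices else None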
```
### Social Media Analytics
```python
# Analyze YouTube video comments for sentiment
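result = apify_run_actor(
    actor_id="apify/youtube-scraper",
    input={"videoUrls": ["https://youtube.com/watch?v=..."]},
    wait=True
)
# Guard against an empty result before indexing the first item
items = result.get("items", [])
comments = items[0].get("comments", []) if items else []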
```
## Error Handling
On failure, every tool returns `{"error": "message", "help": "..."}`. Failure cases include:
- Missing credentials
- Invalid actor ID
- Actor not found (404)
- Rate limit exceeded (429)
- Network timeouts
- Invalid API token (401)
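Because failures come back as ordinary dictionaries rather than exceptions, callers need to check for the `error` key before using a result. One possible pattern is sketched below; `ApifyToolError` and `unwrap` are hypothetical names for illustration, not part of this tool.

```python
class ApifyToolError(RuntimeError):
    """Raised when a tool result carries an "error" key."""

    def __init__(self, message, help_text=""):
        super().__init__(message)
        self.help_text = help_text


def unwrap(result):
    """Return the result unchanged, or raise if it is an error response."""
    if isinstance(result, dict) and "error" in result:
        raise ApifyToolError(result["error"], result.get("help", ""))
    return result


# Example with a hand-written error response in the documented shape:
try:
    unwrap({"error": "Actor not found", "help": "Check the actor_id"})
except ApifyToolError as exc:
    print(f"Apify call failed: {exc} ({exc.help_text})")
```

Wrapping each call, as in `unwrap(apify_run_actor(...))`, keeps the error check in one place instead of repeating it after every tool call.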
## API Documentation
- [Apify API v2](https://docs.apify.com/api/v2)
- [Actor Marketplace](https://apify.com/store)