Documentation Index
Fetch the complete documentation index at: https://docs.spinnable.ai/llms.txt
Use this file to discover all available pages before exploring further.
What This Guide Covers
Some websites your business relies on don’t offer APIs — government portals, legacy business directories, industry registries, or internal tools with web-only interfaces. Your team might spend hours manually searching these sites, filling in forms, and copying results. This guide introduces a technique your AI worker can use to access these websites programmatically — without opening a browser. Instead of automating clicks in a visual browser, the worker talks directly to the website’s server using HTTP requests.

For You: Understanding the Value
What problem does this solve?
When a website has no API, the traditional options are:
- Manual work — someone on your team does the lookups by hand
- Browser automation — your AI worker tries to drive the site through a browser, which is slower and more fragile

This guide covers a third option: talking directly to the website’s server with HTTP requests.
When should you consider this?
- Repetitive lookups or searches on a specific website (e.g., checking a business registry daily)
- Data extraction from form-based portals (e.g., government databases, legal publications)
- Monitoring pages for changes or new entries
- Any task where your team currently fills in web forms manually and copies results
What does your worker need?
- A CAPTCHA solving service API key — services like 2Captcha solve CAPTCHAs programmatically for approximately $0.003 per solve. Store this key in your Worker Vault.
- Your guidance on the target — tell your worker which website to target and what data you need extracted.
- Time for reconnaissance — your worker will need to inspect the website first to understand its structure before automating it.
What to expect
- First-time setup takes effort — your worker needs to reverse-engineer each specific website (but they’ll do the heavy lifting)
- Once working, the process is fast and reliable (until the website changes)
- Save it as a skill — once your worker cracks a specific site, ask them to save the workflow as a reusable skill
Ethical guidelines
- Only scrape publicly accessible data that is meant to be viewed by anyone
- Respect the website’s terms of service and robots.txt
- Implement rate limiting — don’t overwhelm target servers with rapid-fire requests
- Use this for legitimate business purposes only
Using third-party CAPTCHA solving services operates in a legal gray area in some jurisdictions. Ensure your use case is legitimate and permissible in your region before proceeding.
For Your AI Worker: Technical Methodology
AI Worker Reference — This section is a technical guide designed for AI workers to learn the autonomous web scraping methodology. It covers the full pattern from reconnaissance to result parsing.
The “Package 2” Pattern
This methodology uses direct HTTP requests (via libraries like requests or httpx) combined with a third-party CAPTCHA solving service. There is no browser involved — no Selenium, no Playwright, no headless Chrome. You communicate directly with the web server.
Advantages over browser automation:
- Drastically faster execution
- Minimal memory/resource usage
- No browser driver version mismatches
- Scales easily for high-concurrency workloads
- No UI rendering context to manage
Phase 1: Reconnaissance — Understanding the Target
Before writing any code, inspect the website’s architecture and understand its request flow.

Step 1: Observe the Request Flow
- Open the browser’s Developer Tools (F12) and navigate to the Network tab
- Ensure “Preserve log” is checked
- Submit the form manually and observe the initial GET request and subsequent POST request
- Note the request URL, headers, and payload structure
- Look at URLs and page source for clues:
  - .aspx extensions and WebResource.axd paths → ASP.NET WebForms
  - .php extensions → PHP
  - JSON API calls in the background → JavaScript SPA with API backend
- ASP.NET WebForms is particularly common in government/enterprise portals and maintains state via hidden fields: __VIEWSTATE, __EVENTVALIDATION, __VIEWSTATEGENERATOR
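Capturing those hidden state fields is the first coding step. A minimal sketch using a regex over the page source (a real implementation might prefer an HTML parser such as BeautifulSoup, and this pattern assumes the attribute order type → name → value that ASP.NET typically emits):

```python
import re

def extract_hidden_fields(html: str) -> dict:
    """Pull all <input type="hidden"> name/value pairs (ViewState etc.) from page HTML."""
    fields = {}
    for match in re.finditer(
        r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"', html
    ):
        fields[match.group(1)] = match.group(2)
    return fields
```

Every field this returns must be echoed back in the later POST, or ASP.NET’s event validation will reject the request.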
- reCAPTCHA: Look for iframes loading from google.com/recaptcha or grecaptcha elements
- JavaScript Challenges: Look for inline scripts evaluating math expressions or string manipulations (e.g., NoBot controls that embed expressions like eval('43+40'))
- Rate Limits: Note if there are strict rate limits or IP blocking behaviors
- Use the Elements tab to inspect the <form>
- Note the name attributes of all <input> elements
- For ASP.NET WebForms, inputs inside server controls often use $ separators (e.g., ctl00$ContentPlaceHolder$txtSearchField)
Phase 2: Replaying the Request Flow
Your script must replicate exactly what the browser does, step by step.

Step 1: Establish Session and Extract State

Two parsing gotchas to watch for when extracting state and challenge expressions:
- Unicode escape sequences: HTML source may contain \u0027 instead of '. Always decode Unicode escapes before parsing with regex.
- Leading zeros: Expressions like 0275+85 will fail in Python’s eval(). Strip leading zeros first using re.sub(r'\b0+(\d)', r'\1', expression).
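Both gotchas can be handled in one helper. This is a sketch: the eval('…') wrapper format is an assumption based on the NoBot example earlier, and the final whitelist check guards against evaluating anything but plain arithmetic:

```python
import re

def solve_js_challenge(raw: str) -> int:
    """Evaluate a simple arithmetic challenge extracted from inline page JavaScript."""
    # Decode \u0027-style escapes that may appear in the script source.
    expr = raw.encode().decode("unicode_escape")
    # Pull the expression out of an eval('...') wrapper if present (assumed format).
    m = re.search(r"eval\('([^']+)'\)", expr)
    if m:
        expr = m.group(1)
    # Strip leading zeros: Python 3 rejects literals like 0275.
    expr = re.sub(r"\b0+(\d)", r"\1", expr)
    # Refuse anything that is not plain arithmetic before calling eval().
    if not re.fullmatch(r"[\d+\-*/() ]+", expr):
        raise ValueError(f"unexpected challenge format: {expr!r}")
    return eval(expr)
```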
Phase 3: Solving CAPTCHAs Programmatically
For sites protected by reCAPTCHA v2, use a CAPTCHA solving service (e.g., 2Captcha). The concept: You don’t solve the CAPTCHA visually. Instead, you extract the site key, send it to a solving API, and receive a bypass token.

Step-by-step API flow:
- Look for the data-sitekey attribute in the HTML
- Or find it inside a grecaptcha.render() function call
- The CAPTCHA token must be submitted within the same session (matching cookies) that loaded the page
- Typical solve time: 15-30 seconds
- Cost: ~$0.003 per solve
- Alternative services (Anti-Captcha, CapSolver) follow the same architectural pattern
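The flow can be sketched against 2Captcha’s classic HTTP API (the in.php/res.php endpoints); verify endpoint names and parameters against the service’s current documentation before relying on this:

```python
import re
import time
import requests

API_KEY = "YOUR_2CAPTCHA_KEY"  # store the real key in your Worker Vault

def extract_sitekey(html: str) -> str:
    """Find the reCAPTCHA site key: data-sitekey attribute or grecaptcha.render() call."""
    m = re.search(r'data-sitekey="([^"]+)"', html)
    if not m:
        m = re.search(r"grecaptcha\.render\([^)]*['\"]sitekey['\"]\s*:\s*['\"]([^'\"]+)", html)
    if not m:
        raise ValueError("no reCAPTCHA site key found")
    return m.group(1)

def solve_recaptcha(sitekey: str, page_url: str) -> str:
    """Submit the site key to 2Captcha and poll until a bypass token is returned."""
    sub = requests.post("http://2captcha.com/in.php", data={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": sitekey, "pageurl": page_url, "json": 1,
    }).json()
    task_id = sub["request"]
    for _ in range(24):            # poll for up to ~2 minutes
        time.sleep(5)              # typical solve time is 15-30 seconds
        res = requests.get("http://2captcha.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1,
        }).json()
        if res["request"] != "CAPCHA_NOT_READY":
            return res["request"]  # the g-recaptcha-response token
    raise TimeoutError("CAPTCHA solve timed out")
```

The returned token goes into the g-recaptcha-response field of the final POST, within the same session that loaded the page.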
Phase 4: Assembling & Submitting the Request
Combine the state, solved challenge, CAPTCHA token, and search parameters into a single POST payload:
- Set a legitimate User-Agent header and the correct Content-Type
- Include the submit button’s name-value pair (often overlooked)
- ASP.NET WebForms always posts back to the same URL
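The checklist above can be assembled into a payload builder and a submit step. The control names here are hypothetical placeholders; use the real ones found during reconnaissance:

```python
import requests

def build_payload(state: dict, captcha_token: str, query: str) -> dict:
    """Merge hidden state, the CAPTCHA token, and search parameters into one form payload."""
    payload = dict(state)  # __VIEWSTATE, __EVENTVALIDATION, ... captured from the GET
    payload.update({
        # Hypothetical control names; read the real ones from the inspected form.
        "ctl00$ContentPlaceHolder$txtSearchField": query,
        "ctl00$ContentPlaceHolder$btnSearch": "Search",  # submit button pair (often overlooked)
        "g-recaptcha-response": captcha_token,
    })
    return payload

def submit_search(session: requests.Session, url: str, payload: dict) -> str:
    """POST back to the same URL, as ASP.NET WebForms expects."""
    resp = session.post(url, data=payload, headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Content-Type": "application/x-www-form-urlencoded",
    })
    resp.raise_for_status()
    return resp.text
```

Use one requests.Session() for the initial GET, the CAPTCHA solve, and this POST so the cookies match throughout.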
Phase 5: Parsing Results
Extract structured data from the response HTML. For multi-page results, ASP.NET WebForms drives pagination through the __EVENTTARGET and __EVENTARGUMENT hidden fields: to navigate to page 2, populate these fields and make another POST request simulating the page link click.
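A sketch of both steps; the results-table markup and the grid control id are hypothetical, so adapt them to the page you inspected:

```python
import re

def parse_result_rows(html: str) -> list:
    """Extract cell text from each row of a results table (assumed simple <tr>/<td> markup)."""
    rows = []
    for row_html in re.findall(r"<tr[^>]*>(.*?)</tr>", html, re.S):
        cells = [re.sub(r"<[^>]+>", "", c).strip()
                 for c in re.findall(r"<td[^>]*>(.*?)</td>", row_html, re.S)]
        if cells:
            rows.append(cells)
    return rows

def pagination_payload(state: dict, page: int) -> dict:
    """Simulate clicking a WebForms page link via __EVENTTARGET/__EVENTARGUMENT."""
    payload = dict(state)
    payload["__EVENTTARGET"] = "ctl00$ContentPlaceHolder$gvResults"  # hypothetical grid id
    payload["__EVENTARGUMENT"] = f"Page${page}"
    return payload
```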
Common Pitfalls & Debugging
| Issue | Symptom | Fix |
|---|---|---|
| Unicode escapes in JS | Regex fails to match expressions | .encode().decode('unicode_escape') before parsing |
| Leading zeros in math | eval('04+2') throws SyntaxError | Strip leading zeros via re.sub(r'\b0+(\d)', r'\1', expr) |
| Session mismatch | CAPTCHA solved but form rejected | Use requests.Session() for all requests |
| ViewState expiry | Form rejected after long CAPTCHA solve (>5 min) | Retry with fresh GET if CAPTCHA takes too long |
| Missing submit button | ASP.NET Event Validation error | Include the submit button’s name-value pair in payload |
| Missing hidden field | Server returns validation error | Check all hidden inputs from the form, not just ViewState |
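The ViewState-expiry fix from the table can be wrapped as a small retry helper. This is a callback-based sketch; treating a rejected form as a ValueError is an assumption, not a convention of any library:

```python
def post_with_fresh_state(fetch_state, submit, max_attempts: int = 3):
    """Re-fetch page state and retry the POST when the server rejects stale ViewState.

    fetch_state() -> dict of hidden fields (performs a fresh GET)
    submit(state) -> response text, or raises ValueError on a validation error
    """
    last_error = None
    for _ in range(max_attempts):
        state = fetch_state()  # fresh GET each attempt, so ViewState is never stale
        try:
            return submit(state)
        except ValueError as exc:
            last_error = exc   # validation failure; refresh state and retry
    raise RuntimeError(f"all {max_attempts} attempts rejected: {last_error}")
```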
Turning This Into a Skill
Once you’ve successfully automated a specific website:
- Test it reliably — run the process multiple times to confirm stability
- Save it as a skill — this ensures you can repeat the workflow without re-engineering the site each time
- Add error handling — websites change; build in graceful failure and retry logic
- Implement rate limiting — add time.sleep() between requests to avoid overwhelming the target server
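A minimal rate-limiting wrapper, sketched with a fixed delay between items:

```python
import time

def rate_limited(iterable, delay: float = 2.0):
    """Yield items with a fixed pause between them to avoid hammering the target server."""
    for i, item in enumerate(iterable):
        if i:                  # no pause before the first request
            time.sleep(delay)
        yield item

# Example: process a batch of queries politely.
# for query in rate_limited(["acme", "globex"], delay=2.0):
#     run_lookup(query)      # hypothetical per-query function
```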
Related
Custom Integrations
Connect tools that have APIs but no native Spinnable integration
Worker Vault
Store API keys and credentials securely
Skills
Save repeatable workflows for reuse
Security Best Practices
Keep your worker integrations secure