Gig
$500 - $1000
TBD
Mar 8, 2025
We need an expert in web scraping and data mining to extract a large dataset of US-based law firms (500,000+ records) from multiple sources. The final list must include the firm name, website, email, phone number, address, and practice area.
You must know Python (Scrapy, BeautifulSoup, or Selenium) or Google Maps API scraping, and be able to avoid bans (proxies, CAPTCHA handling).
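To illustrate the kind of ban avoidance we have in mind, here is a minimal sketch of retrying with exponential backoff while rotating through a proxy list. The names `fetch_with_retries`, `Blocked`, and the injected `fetch` callable are hypothetical and only for illustration; a real scraper would wrap its own HTTP client and proxy provider.

```python
import itertools
import random
import time
from typing import Callable, Iterable, Optional


class Blocked(Exception):
    """Raised by the fetch callable when a site signals a block (e.g. HTTP 403/429)."""


def fetch_with_retries(
    url: str,
    fetch: Callable[[str, Optional[str]], str],  # hypothetical: (url, proxy) -> page body
    proxies: Iterable[str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try a URL through rotating proxies, waiting longer after each block."""
    # Fall back to a direct connection (proxy=None) if no proxies are given.
    proxy_pool = itertools.cycle(list(proxies) or [None])
    for attempt in range(max_attempts):
        proxy = next(proxy_pool)
        try:
            return fetch(url, proxy)
        except Blocked:
            # Exponential backoff plus jitter so retries do not arrive in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise Blocked(f"gave up on {url} after {max_attempts} attempts")
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network.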
Data Requirements (What You Need to Scrape)
The goal is to merge and deduplicate law firms from multiple sources. Here are some example sources you can scrape, but the choice is ultimately up to you:
Legal Directories (Main Data Sources)
FindLaw – Large directory of law firms
Justia – Many solo and small firms
Avvo – Includes firm ratings & details
What to extract from these directories:
Law Firm Name
Website
Email
Phone Number
Address (City, State, ZIP)
Practice Area
Other tools/sites:
Google Maps API Scraping
Use Google Places API to extract law firms by city and state.
Collect Google reviews, phone numbers, and website links.
Requires proxy handling to avoid getting blocked.
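As a rough sketch of the city-and-state approach, the snippet below builds a request URL for the Places Text Search endpoint and pulls the basic fields out of a response. It assumes the legacy JSON endpoint and its `results` / `next_page_token` response keys; the helper names are hypothetical, and to our understanding phone numbers and website links come from a separate Place Details request per `place_id`, so verify this against the current Google documentation before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the legacy Places Text Search endpoint.
TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_text_search_url(city: str, state: str, api_key: str,
                          page_token: Optional[str] = None) -> str:
    """Build a Text Search URL for law firms in one city."""
    params = {"query": f"law firm in {city}, {state}", "key": api_key}
    if page_token:
        # Follow-up pages are requested with the token from the previous response.
        params["pagetoken"] = page_token
    return f"{TEXT_SEARCH_URL}?{urlencode(params)}"


def parse_results(payload: dict):
    """Return (firms, next_page_token) from a decoded Text Search response."""
    firms = [
        {
            "name": result.get("name"),
            "address": result.get("formatted_address"),
            "place_id": result.get("place_id"),
        }
        for result in payload.get("results", [])
    ]
    return firms, payload.get("next_page_token")
```

Looping this over a list of (city, state) pairs and following `next_page_token` until it is absent would cover one metro area at a time.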
State Bar Associations (Supplemental Data)
Check official State Bar websites for lawyer registration data.
Example: California Bar Directory
Extract law firm names and contacts from each state.
Use LinkedIn Sales Navigator to filter for "Law Firms" in the United States.
Extract the company name, website, and employee count using PhantomBuster or TexAu.
Job Requirements (What You MUST Know)
Technical Skills:
Python Scrapy, Selenium, or BeautifulSoup for web scraping
Experience scraping Google Maps API (avoiding bans)
Knows how to handle CAPTCHAs and proxies (essential for large-scale scraping)
Can deduplicate and clean the data
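For reference, here is one possible way to deduplicate records merged from several sources: key each firm by its website domain, fall back to normalized name plus ZIP when no website is known, and fill in missing fields from later duplicates. The field names (`name`, `website`, `zip`) and helper names are assumptions for illustration, not a required schema.

```python
import re
from urllib.parse import urlparse

# Common legal-entity suffixes dropped before comparing names (not exhaustive).
_SUFFIXES = re.compile(r"\b(llp|llc|pllc|pc|pa|ltd|inc)\b")


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation and entity suffixes, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", "", name)
    name = _SUFFIXES.sub("", name)
    return " ".join(name.split())


def normalize_domain(website: str) -> str:
    """Reduce a URL or bare host to its domain without a leading 'www.'."""
    if not website:
        return ""
    if "//" not in website:
        website = "//" + website  # lets urlparse treat a bare host as netloc
    host = urlparse(website).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_key(record: dict) -> tuple:
    """Prefer the website domain; otherwise use normalized name plus ZIP."""
    domain = normalize_domain(record.get("website", ""))
    if domain:
        return ("domain", domain)
    return ("name_zip", normalize_name(record.get("name", "")), record.get("zip", ""))


def merge_records(records: list) -> list:
    """Collapse duplicates, keeping the first non-empty value seen per field."""
    merged = {}
    for rec in records:
        target = merged.setdefault(dedupe_key(rec), {})
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())
```

Keying on domain alone collapses multi-office firms into one row; if each office should be its own record, the key would need to include the ZIP or address as well.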
Deliverables:
CSV file with 500,000+ law firms
Website
Fully categorized by practice area & location
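As an example of the expected output shape, the sketch below writes cleaned records to CSV, sorted by state, city, and practice area. The column names in `FIELDS` are an assumed layout based on the fields listed above; the final column set can be agreed on with the freelancer.

```python
import csv
from typing import Iterable, TextIO

# Assumed column layout; adjust to the agreed schema.
FIELDS = ["name", "website", "email", "phone", "address",
          "city", "state", "zip", "practice_area"]


def write_csv(records: Iterable[dict], out: TextIO) -> None:
    """Write records as CSV, grouped by location and then practice area."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    ordered = sorted(
        records,
        key=lambda r: (r.get("state", ""), r.get("city", ""), r.get("practice_area", "")),
    )
    for rec in ordered:
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({field: rec.get(field, "") for field in FIELDS})
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("law_firms.csv", "w", newline="", encoding="utf-8")`.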
Budget & Payment
Budget: $500 - $1000 (fixed price)
Deadline: 1-2 weeks
Bonus for High-Quality Work: If the data is accurate & complete, we will offer a performance bonus!
How to Apply (IMPORTANT)
Answer these questions when applying:
Have you scraped Justia, FindLaw, Avvo, or Google Maps API before?
How will you avoid getting blocked while scraping?
What tools do you use for
Can you show a sample dataset you have scraped before?
If you can do this project successfully, we have more large-scale scraping jobs available - also on a full-time basis.
Looking forward to working with an expert who can deliver clean, accurate data!
If you're confident you can do this, send me a message with your exact plan for getting as close as possible to 500k law firms, along with your experience!