Job Scraper for Indeed, built with Selenium.

This example shows how to use the job scraper with a database such as DuckDB to persist scraped information for later use and analysis.
Source: hh-13/jobs-scraper on GitHub

!poetry run python scraper_example.py 'in.indeed.com' -k 'software engineer' -l 'Remote' -r 4 -n 1 --sort_by_date
Saved to DB!
import duckdb

con = duckdb.connect("jobs.db")
<duckdb.duckdb.DuckDBPyConnection at 0x7bda3f5dfd30>
con.sql("SHOW TABLES;")
┌──────────┐
│   name   │
│ varchar  │
├──────────┤
│ JOBS     │
│ SEARCHES │
└──────────┘
con.sql("SELECT * FROM SEARCHES;").pl()
shape: (1, 9)
SEARCH_ID SEARCH_TERM URL LOCATION REMOTE JOB_TYPE PAY COMPANY JOB_LANGUAGE
i32 str list[str] list[str] list[str] list[str] list[str] list[str] list[str]
1 "data" ["https://in.indeed.com/jobs?q=data&l=Remote&sort=date&start=0", "https://in.indeed.com/jobs?q=data&l=Remote&sort=date&start=10", "https://in.indeed.com/jobs?q=data&l=Remote&sort=date&start=20"] ["Remote (596)"] ["Remote (596)", "Hybrid work (3)"] ["Full-time (427)", "Contract (76)", … "Fresher (7)"] ["₹ 37,500.00+/month (501)", "₹ 67,500.00+/month (398)", … "₹ 1,28,333.34+/month (102)"] ["Nagarro (47)", "MNJ Software (19)", … "Syneos - Clinical and Corporate - Prod (10)"] ["English (596)"]
con.sql("SELECT UNNEST(JOB_TYPE) AS JOB_TYPES FROM SEARCHES WHERE SEARCH_ID=1;")
┌──────────────────┐
│    JOB_TYPES     │
│     varchar      │
├──────────────────┤
│ Full-time (427)  │
│ Contract (76)    │
│ Temporary (75)   │
│ Part-time (37)   │
│ Internship (11)  │
│ Fresher (7)      │
└──────────────────┘
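Each filter value above is stored as a single string that combines a label with a count in parentheses. If the counts are needed as numbers, they can be split off with a regular expression. The snippet below is a minimal sketch using only the standard library; the helper name `parse_facet` and the pattern are illustrative assumptions, not part of the scraper.

```python
import re

# A few labels copied from the JOB_TYPE output above.
labels = ["Full-time (427)", "Contract (76)", "Temporary (75)"]

# Assumed shape: free-form label text, then an integer count in
# parentheses at the very end of the string.
FACET_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<count>\d+)\)$")


def parse_facet(label: str) -> tuple[str, int]:
    """Split a label such as 'Contract (76)' into ('Contract', 76)."""
    match = FACET_RE.match(label)
    if match is None:
        raise ValueError(f"Unexpected facet format: {label!r}")
    return match["name"], int(match["count"])


# Map each label to its count.
facets = dict(parse_facet(label) for label in labels)
print(facets)
```

Because the count is anchored to the end of the string, labels that contain hyphens or other punctuation, such as the company names in the SEARCHES row, should parse the same way.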
con.sql("SELECT * FROM JOBS LIMIT 5;")
┌────────┬───────────┬──────────────────────┬──────────────────────┬───────────────────────────────────────────────────┐
│ JOB_ID │ SEARCH_ID │        TITLE         │         URL          │                    DESCRIPTION                    │
│ int32  │   int32   │       varchar        │       varchar        │                      varchar                      │
├────────┼───────────┼──────────────────────┼──────────────────────┼───────────────────────────────────────────────────┤
│      1 │         1 │ Engagement & Data …  │ https://in.indeed.…  │ Engagement & Data Specialist\n=================…  │
│      2 │         1 │ Digital Marketing …  │ https://in.indeed.…  │ Digital Marketing Specialist\n=================…  │
│      3 │         1 │ Marketing Associate  │ https://in.indeed.…  │ Marketing Associate\n===================\n\nInd…  │
│      4 │         1 │ Research Analyst (…  │ https://in.indeed.…  │ Research Analyst (WFH Relevant candidates only,…  │
│      5 │         1 │ Lead AI / ML / Dat…  │ https://in.indeed.…  │ Lead AI / ML / Data Science Engineer - India\n=…  │
└────────┴───────────┴──────────────────────┴──────────────────────┴───────────────────────────────────────────────────┘
from IPython.display import display, Markdown

# Render the shortest job description in the table as Markdown.
display(
    Markdown(
        con.sql(
            "SELECT len(DESCRIPTION) AS DESC_LEN, DESCRIPTION FROM JOBS ORDER BY DESC_LEN LIMIT 1;"
        ).fetchone()[1]
    )
)

Cyber Security Data Loss Protection - Forcepoint DLP

Varchai Pvt LtdRemote₹96,112.53 - ₹1,15,748.42 a yearApply nowsave-icon Job details

Here’s how the job details align with your profile.### Pay

  • ₹96,112.53 - ₹1,15,748.42 a year ### Job type

  • Full-time ### Shift and schedule

  • Day shift

  • Monday to Friday  Location ——–

Remote Full job description

  • Forcepoint DLP · Symantec dlp
  • Forcepoint · Microsoft Purview · Jira · Sumo Logic · Netskope CASB
  • Data Loss Prevention l Security Investigation l Incident Management
  • Team Management · Data loss prevention · Symantec dlp · McAfee · Microsoft Purview · wiretap · Incident Management · compliance handling
  • Data loss prevention

Job Type: Full-time

Pay: ₹96,112.53 - ₹115,748.42 per year

Schedule:

  • Day shift
  • Monday to Friday

Experience:

  • total work: 4 years (Preferred)

Work Location: Remote

Application Deadline: 27/05/2024
Expected Start Date: 27/05/2024

Report job

# Finally, don't forget to close the connection!

con.close()

What Next?

The possibilities are endless!

Keep Coding!