Python tools for scraping school contact directories, with an anonymized dataset of 112k+ records (all PII removed)
eduMail Scraper
This repository contains Python tools I used to scrape school contact directories for students, alumni, staff, and professors. It also includes a fully anonymized version of the dataset (~112,000 contacts) that's safe to share, with all personally identifiable information (PII) like names, emails, phone numbers, and profile pictures removed.
What's Inside
- Python scripts for scraping and processing contact data
- Anonymized dataset (out.csv)
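
As a quick way to inspect the dataset, the sketch below reads a CSV with a header row and counts how many rows have a non-empty value in each column. It assumes out.csv is UTF-8 encoded and starts with a header row, which this README does not state. `column_fill_counts` is a hypothetical helper written for illustration, not one of the repository's scripts, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def column_fill_counts(handle):
    """Return (row_count, Counter of non-empty cells per column)."""
    reader = csv.DictReader(handle)
    counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            # Skip overflow cells (key None) and empty or whitespace-only values.
            if column is not None and value and value.strip():
                counts[column] += 1
    return rows, counts


# Small in-memory sample standing in for out.csv (made-up values).
sample = io.StringIO("Job Title,Department\nStudent,\nProfessor,Physics\n")
total, filled = column_fill_counts(sample)
```

To run it on the real file, pass a handle opened with `open("out.csv", newline="", encoding="utf-8")` instead of the in-memory sample.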
Dataset Columns
| Column Name | Description |
|---|---|
| Name | Full name |
| Email Address | School email |
| Chat Address | Outlook/Teams chat handle (same as email address) |
| Mobile | Mobile phone number (formats may vary, such as xxx-xxx-xxxx, (xxx) xxx-xxxx, or xxxxxxxxxx) |
| Work Phone | Office or work phone number |
| Job Title | The person's role, such as "Professor," "Student," or "Administrator" |
| Department | The department, program, or field the person belongs to, like "Department of Computer Science" |
| Office Location | Office or building location, like LIB 101 |
| Company | Name of the organization, school, or employer |
| Profile Picture | Profile photo or avatar in base64 |
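
Because the Mobile column can appear in several formats, one way to compare values is to strip everything except digits. The sketch below is a minimal example under the assumption that numbers are 10 digits with no country code, as in the formats listed in the table. `normalize_mobile` is a hypothetical helper, not a function from this repository, and it applies to raw scraped values rather than the anonymized out.csv.

```python
import re


def normalize_mobile(raw):
    """Return a 10-digit string, or None if the input does not contain exactly 10 digits."""
    # Remove every non-digit character (dashes, spaces, parentheses).
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 10 else None


# Made-up numbers covering the three formats named in the table, plus a non-number.
examples = ["555-010-0199", "(555) 010-0199", "5550100199", "n/a"]
normalized = [normalize_mobile(value) for value in examples]
```

Returning None for anything that is not exactly 10 digits keeps malformed entries visible instead of silently truncating or padding them.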
