# eduMail Scraper
This repository contains the Python tools I used to scrape school contact directories covering students, alumni, staff, and professors. It also includes a fully anonymized version of the dataset (~112,000 contacts) that is safe to share: all personally identifiable information (PII), such as names, emails, phone numbers, and profile pictures, has been removed.

## What's Inside
- **Python scripts** for scraping and processing contact data
- **Anonymized dataset (`out.csv`)**
## Dataset Columns
| Column Name | Description |
|-------------------|-------------|
| Name | Full name |
| Email Address | School email |
| Chat Address | Outlook/Teams chat handle (same as email address) |
| Mobile | Mobile phone number (formats may vary, such as xxx-xxx-xxxx, (xxx) xxx-xxxx, or xxxxxxxxxx) |
| Work Phone | Office or work phone number |
| Job Title | The person's role, such as "Professor," "Student," or "Administrator" |
| Department | The department, program, or field the person belongs to, like "Department of Computer Science" |
| Office Location | Office or building location, like LIB 101 |
| Company | Name of the organization, school, or employer |
| Profile Picture | Profile photo or avatar in base64 |
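
Because the `Mobile` column can arrive in several formats, a processing step may want to bring them into one shape. The following is a minimal sketch of how that could look, not the code used in this repository: `normalize_mobile` is a hypothetical helper, and it assumes 10-digit numbers like the three formats listed in the table. Values that do not contain exactly 10 digits are returned unchanged rather than guessed at.

```python
import re


def normalize_mobile(raw: str) -> str:
    """Rewrite a 10-digit phone number as xxx-xxx-xxxx.

    Hypothetical helper for illustration. Any value that does not
    contain exactly 10 digits is returned stripped but otherwise
    unchanged, so unusual formats are not silently altered.
    """
    text = (raw or "").strip()
    # Drop everything except digits: spaces, dashes, parentheses, dots.
    digits = re.sub(r"\D", "", text)
    if len(digits) != 10:
        return text
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


if __name__ == "__main__":
    # Placeholder numbers only, one per format named in the table.
    for sample in ["555-123-4567", "(555) 123-4567", "5551234567"]:
        print(normalize_mobile(sample))
```

Leaving non-matching values untouched is a deliberate choice here: since formats may vary beyond the three examples, it seems safer to flag those rows for manual review than to reformat them.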