2026-03-26 23:15:56 +09:00

# eduMail Scraper

This repository contains the Python tools I used to scrape school contact directories covering students, alumni, staff, and professors. It also includes a fully anonymized version of the resulting dataset (~112,000 contacts) that is safe to share: all personally identifiable information (PII), such as names, email addresses, phone numbers, and profile pictures, has been removed.

![Preview of the anonymized school contacts dataset](docs/assets/img/preview.png)
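
One possible way to strip PII from a raw export is to drop the identifying columns entirely. The sketch below uses only the standard-library `csv` module. The `PII_COLUMNS` set (based on the Dataset Columns table) and the `drop_pii_columns` helper are hypothetical, and this is not necessarily how `out.csv` was produced.

```python
import csv
import io

# Hypothetical set of PII columns, based on the Dataset Columns table.
PII_COLUMNS = {"Name", "Email Address", "Chat Address", "Mobile", "Work Phone", "Profile Picture"}


def drop_pii_columns(src, dst):
    """Copy CSV rows from src to dst without the PII columns; return the kept header."""
    reader = csv.DictReader(src)
    kept = [col for col in (reader.fieldnames or []) if col not in PII_COLUMNS]
    # extrasaction="ignore" silently skips the dropped keys that remain in each row dict
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# In-memory example; for real files, open both with newline="" as the csv docs recommend.
raw = io.StringIO(
    "Name,Email Address,Job Title,Department\n"
    "Jane Doe,jdoe@example.edu,Professor,Department of Computer Science\n"
)
clean = io.StringIO()
print(drop_pii_columns(raw, clean))  # ['Job Title', 'Department']
```

Dropping whole columns (rather than masking values) keeps the output schema simple, since no placeholder values are left behind.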

## What's Inside

- **Python scripts** for scraping and processing contact data
- **Anonymized dataset (`out.csv`)** of ~112,000 contacts with PII removed
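
To explore the anonymized dataset, a minimal sketch like the one below reads it with `csv.DictReader` and tallies the values of one column. It assumes `out.csv` has a header row, is UTF-8 encoded, and still contains a non-PII column such as `Job Title`; `count_by_column` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Tally how many rows share each value of `column`; blank or missing values are skipped."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Against the real file (assumes out.csv is in the working directory):
# with open("out.csv", newline="", encoding="utf-8") as f:
#     print(count_by_column(f, "Job Title").most_common(5))

sample = io.StringIO("Job Title,Department\nStudent,Biology\nProfessor,Physics\nStudent,Physics\n")
print(count_by_column(sample, "Job Title").most_common())  # [('Student', 2), ('Professor', 1)]
```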
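
## Dataset Columns

The table below describes each column in the dataset. In the anonymized `out.csv`, the PII fields (Name, Email Address, Chat Address, Mobile, Work Phone, and Profile Picture) are removed.
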
| Column Name | Description |
|-------------------|-------------|
| Name | Full name |
| Email Address | School email address |
| Chat Address | Outlook/Teams chat handle (identical to the email address) |
| Mobile | Mobile phone number; formats vary (e.g., `xxx-xxx-xxxx`, `(xxx) xxx-xxxx`, or `xxxxxxxxxx`) |
| Work Phone | Office or work phone number |
| Job Title | The person's role, such as "Professor", "Student", or "Administrator" |
| Department | Department, program, or field the person belongs to, such as "Department of Computer Science" |
| Office Location | Office or building location, such as `LIB 101` |
| Company | Name of the organization, school, or employer |
| Profile Picture | Profile photo or avatar, base64-encoded |
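
Because the Mobile column mixes several formats and Profile Picture holds base64 text, raw (pre-anonymization) rows usually need some cleanup before use. A minimal sketch, assuming US-style ten-digit phone numbers; `normalize_phone` and `decode_profile_picture` are hypothetical helpers, not taken from this repository's scripts.

```python
import base64
import re


def normalize_phone(raw):
    """Return a phone number as xxx-xxx-xxxx, or None if it is not exactly ten digits.

    Accepts the formats listed for the Mobile column, e.g. 555-123-4567,
    (555) 123-4567 and 5551234567. Assumes US-style ten-digit numbers.
    """
    digits = re.sub(r"[^0-9]", "", raw or "")  # keep only the ASCII digits 0-9
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def decode_profile_picture(b64_text):
    """Decode a base64-encoded Profile Picture value into raw bytes (None if empty)."""
    if not b64_text:
        return None
    return base64.b64decode(b64_text)


print(normalize_phone("(555) 123-4567"))   # 555-123-4567
print(normalize_phone("5551234567"))       # 555-123-4567
print(normalize_phone("123-4567"))         # None
print(decode_profile_picture("aGVsbG8="))  # b'hello'
```

Normalizing to a single canonical format makes phone numbers comparable across rows, which matters when the same contact appears in more than one directory.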