# Word Analyzer

Extracts text from PDFs using Apache PDFBox and analyzes word frequency with customizable filters.

<p align="center">
<img src="docs/assets/img/preview.jpg" width="75%" alt="Screenshot of program"/>
</p>

## Features

- Scans all PDF files in a given folder
- Counts and displays word frequency
- Filters results by minimum and maximum frequency
- Optional maximum file count limit
- Shows scan logs and results in separate windows
- Displays total scan time
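
The counting and frequency-filter features above can be sketched in plain Java. This is a minimal illustration, not the project's actual code: the class name `WordFrequencySketch`, the methods `countWords` and `filterByFrequency`, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions. PDF text extraction is left out, so the input is an ordinary string.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and tokenization rule are assumptions, not the project's code.
public class WordFrequencySketch {

    // Counts how often each word occurs in the given text (keys sorted alphabetically).
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed rule: lowercase, then split on runs of non-letter, non-digit characters.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps only the words whose count lies in [min, max], both bounds inclusive.
    public static Map<String, Integer> filterByFrequency(Map<String, Integer> counts, int min, int max) {
        Map<String, Integer> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= min && entry.getValue() <= max) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        System.out.println(filterByFrequency(counts, 2, 3)); // prints {dog=2, the=3}
    }
}
```

Treating both bounds as inclusive is a choice made for this sketch; the application may handle the boundaries differently.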

## Requirements

- Java 8 or higher (a JRE to run; a JDK to build from source)
- Apache PDFBox 1.8.16 (bundled, no download needed)

## Usage

```bash
# On Windows, use ';' instead of ':' as the classpath separator
java -cp WordAnalyzer.jar:pdfbox-1_8_16.jar wordanalyzer.WordAnalyzer
```

1. Enter the path of the folder containing your PDF files
2. Set the minimum and maximum frequency filters
3. Optionally set a maximum file count
4. Click **Confirm Folder Path** to start the scan
5. When the scan completes, the results appear in the results window

## Building from Source

```bash
mkdir out
javac -cp pdfbox-1_8_16.jar -d out src/wordanalyzer/WordAnalyzer.java
jar cfe WordAnalyzer.jar wordanalyzer.WordAnalyzer -C out .
```