juyung/word-analyzer

Extracts text from PDFs using Apache PDFBox and analyzes word frequency with customizable filters

apache-pdfbox file-processing text-analysis

juyung 5b85b9cfb2 Archive		2026-03-24 23:46:08 -07:00
docs/assets/img	Archive	2026-03-24 23:46:08 -07:00
nbproject	Archive	2014-10-26 19:15:38 -07:00
out/wordanalyzer	Archive	2026-03-24 23:46:08 -07:00
src/wordanalyzer	Archive	2014-10-26 19:15:38 -07:00
build.xml	Archive	2014-10-26 19:15:38 -07:00
manifest.mf	Archive	2014-10-26 19:15:38 -07:00
pdfbox-1.8.16.jar	Archive	2014-10-26 19:15:38 -07:00
README.md	Archive	2026-03-24 23:46:08 -07:00
WordAnalyzer.jar	Archive	2026-03-24 23:46:08 -07:00

README.md

Word Analyzer

Extracts text from PDFs using Apache PDFBox and analyzes word frequency with customizable filters.

Screenshot of program

Features

Scans all PDF files in a given folder
Counts and displays word frequency
Filters results by minimum and maximum frequency
Optional maximum file count limit
Shows scan logs and results in separate windows
Displays total scan time
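
The counting and filtering features can be illustrated with a minimal sketch. This is not the project's actual code: the class and method names (`WordFrequencySketch`, `countWords`, `filterByFrequency`) are hypothetical, and it assumes the text has already been extracted from the PDFs and that a word is a case-insensitive run of letters.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and tokenization rules are assumptions,
// not taken from the project's source.
class WordFrequencySketch {

    // Count how often each word occurs, ignoring case.
    // Assumption: a "word" is a maximal run of Unicode letters.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep only the words whose count lies in [min, max], inclusive.
    static Map<String, Integer> filterByFrequency(Map<String, Integer> counts, int min, int max) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        // Words seen two or three times: prints {dog=2, the=3}
        System.out.println(filterByFrequency(counts, 2, 3));
    }
}
```

The sorted `TreeMap` keeps the output order stable; the real program may order or tokenize results differently.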

Requirements

Java JRE 8 or higher (building from source also requires a JDK, which provides javac and jar)
Apache PDFBox 1.8.16 (bundled, no download needed)

Usage

java -cp WordAnalyzer.jar:pdfbox-1.8.16.jar wordanalyzer.WordAnalyzer

On Windows, use ; instead of : as the classpath separator.

Enter the folder path containing your PDF files
Set the minimum/maximum frequency filter
Optionally set a maximum file count
Click Confirm Folder Path to start the scan
Results will appear in the results window when the scan is complete
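
The folder scan with an optional file count limit might look roughly like the sketch below. It is an illustration under assumptions, not the program's implementation: `PdfFolderSketch` and `listPdfs` are hypothetical names, the scan is assumed to be non-recursive, and a limit of zero or less is taken to mean "no limit".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the real program may select files differently.
class PdfFolderSketch {

    // Return the PDF files directly inside `folder`, sorted by path.
    // Assumption: maxFiles <= 0 means "no limit".
    static List<Path> listPdfs(Path folder, int maxFiles) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            Stream<Path> pdfs = entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString()
                              .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                .sorted();
            if (maxFiles > 0) {
                pdfs = pdfs.limit(maxFiles);
            }
            return pdfs.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Scan the folder given on the command line, or the current one.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : listPdfs(folder, 10)) {
            System.out.println(pdf);
        }
    }
}
```

Sorting before applying the limit makes the choice of files deterministic when the folder holds more PDFs than the limit allows.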

Building from Source

mkdir -p out
javac -cp pdfbox-1.8.16.jar -d out src/wordanalyzer/WordAnalyzer.java
jar cfe WordAnalyzer.jar wordanalyzer.WordAnalyzer -C out .