word-analyzer/README.md
2026-03-24 23:46:08 -07:00

Word Analyzer

Extracts text from PDFs using Apache PDFBox and analyzes word frequency with customizable filters.

Screenshot of program

Features

  • Scans all PDF files in a given folder
  • Counts and displays word frequency
  • Filters results by minimum and maximum frequency
  • Optional maximum file count limit
  • Shows scan logs and results in separate windows
  • Displays total scan time
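
The counting and filtering features can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the class name WordFrequencySketch, the method names, the tokenization rule (lower-case, split on anything that is not a letter or digit), and the inclusive treatment of both bounds are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and rules here are illustrative, not taken from the project.
class WordFrequencySketch {

    // Count how often each word occurs in the text.
    // Assumed tokenization: lower-case, then split on runs of non-letter, non-digit characters.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word for stable output
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep only words whose count lies in [min, max]; both bounds inclusive (assumed).
    static Map<String, Integer> filterByFrequency(Map<String, Integer> counts, int min, int max) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        // Keep words that occur two or three times.
        System.out.println(filterByFrequency(counts, 2, 3));
    }
}
```

In the real program the input string would be the text PDFBox extracts from each PDF; the sketch takes a plain string so it runs without any extra JARs.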

Requirements

  • Java 8 or higher (a JRE is enough to run the program; building from source requires a JDK)
  • Apache PDFBox 1.8.16 (bundled, no download needed)

Usage

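Run the program from the folder that contains both JAR files. On Windows, replace the : in the classpath with ; since that is the classpath separator there.

java -cp WordAnalyzer.jar:pdfbox-1_8_16.jar wordanalyzer.WordAnalyzer
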
  1. Enter the folder path containing your PDF files
  2. Set the minimum and maximum frequency filters
  3. Optionally set a maximum file count
  4. Click Confirm Folder Path to start the scan
  5. Results will appear in the results window when the scan is complete
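
Steps 1 and 3 above, plus the scan timing, could look roughly like the following. This is a hedged sketch rather than the program's implementation: the class PdfFolderScanSketch, the method listPdfs, the convention that a limit of 0 or less means "no limit", and the case-insensitive .pdf match are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: names and conventions are illustrative, not taken from the project.
class PdfFolderScanSketch {

    // List the PDF files directly inside the folder (no recursion).
    // maxFiles <= 0 means "no limit" (assumed convention).
    static List<Path> listPdfs(Path folder, int maxFiles) throws IOException {
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (Files.isRegularFile(entry) && name.endsWith(".pdf")) {
                    pdfs.add(entry);
                }
            }
        }
        Collections.sort(pdfs); // directory order is unspecified; sort for repeatable results
        if (maxFiles > 0 && pdfs.size() > maxFiles) {
            return new ArrayList<>(pdfs.subList(0, maxFiles));
        }
        return pdfs;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        // Scan the folder given as the first argument, or the current folder, up to 10 files.
        List<Path> pdfs = listPdfs(Paths.get(args.length > 0 ? args[0] : "."), 10);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pdfs.size() + " PDF file(s) found in " + elapsedMs + " ms");
    }
}
```

Sorting before applying the limit makes it predictable which files are kept when the folder holds more PDFs than the limit allows; whether the program itself does this is not stated here.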

Building from Source

mkdir out
javac -cp pdfbox-1_8_16.jar -d out src/wordanalyzer/WordAnalyzer.java
jar cfe WordAnalyzer.jar wordanalyzer.WordAnalyzer -C out .