Word Analyzer

Extracts text from PDFs using Apache PDFBox and analyzes word frequency with customizable filters.

Screenshot of program

Features

  • Scans all PDF files in a given folder
  • Counts and displays word frequency
  • Filters results by minimum and maximum frequency
  • Optionally limits the number of files scanned
  • Shows scan logs and results in separate windows
  • Displays total scan time
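
The counting and filtering features above can be illustrated with a minimal Java 8 sketch. It is not the project's actual implementation: the class name `WordFrequencySketch`, both method names, the definition of a "word" (a run of letters, digits, or apostrophes), case-insensitive matching, and inclusive bounds are all assumptions made for illustration. The input is assumed to be text that has already been extracted from a PDF.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and tokenization rules are assumptions, not the project's code.
class WordFrequencySketch {

    // Count how often each word occurs, ignoring case.
    // Assumption: a word is a maximal run of letters, digits, or apostrophes.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep only words whose count lies in [min, max]; both bounds are inclusive (assumed).
    static Map<String, Integer> filterByFrequency(Map<String, Integer> counts, int min, int max) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= min && entry.getValue() <= max) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        // prints {dog=2, the=3}
        System.out.println(filterByFrequency(counts, 2, 3));
    }
}
```

A `TreeMap` keeps the output in alphabetical order; sorting by descending count would be an equally reasonable choice for a results window.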

Requirements

  • Java 8 or higher (a JRE is enough to run the program; building from source requires a JDK)
  • Apache PDFBox 1.8.16 (bundled, no download needed)

Usage

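Launch the application from the repository root:

java -cp WordAnalyzer.jar:pdfbox-1.8.16.jar wordanalyzer.WordAnalyzer

On Windows, use ; instead of : as the classpath separator.
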
  1. Enter the folder path containing your PDF files
  2. Set the minimum and maximum frequency filters
  3. Optionally set a maximum file count
  4. Click Confirm Folder Path to start the scan
  5. Results will appear in the results window when the scan is complete
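
What happens after a folder path is confirmed might look roughly like the sketch below, which collects the PDF files in a folder, applies an optional file count limit, and times the work. This is an illustration under stated assumptions rather than the application's real code: `FolderScanSketch` and `listPdfs` are hypothetical names, the scan is assumed not to descend into subfolders, and a limit of 0 or less is assumed to mean "no limit". Text extraction with PDFBox is left out so the example depends only on the Java 8 standard library.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; names and conventions are assumptions, not the project's code.
class FolderScanSketch {

    // Return the PDF files directly inside folder (no recursion), sorted by path.
    // Assumed convention: maxFiles <= 0 means no limit.
    static List<Path> listPdfs(Path folder, int maxFiles) throws IOException {
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (Files.isRegularFile(entry) && name.endsWith(".pdf")) {
                    pdfs.add(entry);
                }
            }
        }
        Collections.sort(pdfs);
        if (maxFiles > 0 && pdfs.size() > maxFiles) {
            return new ArrayList<>(pdfs.subList(0, maxFiles));
        }
        return pdfs;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        List<Path> pdfs = listPdfs(folder, 10); // scan at most 10 files
        for (Path pdf : pdfs) {
            System.out.println(pdf);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Found " + pdfs.size() + " PDF file(s) in " + elapsedMs + " ms");
    }
}
```

Sorting before applying the limit makes the selection repeatable, since directory listings come back in no guaranteed order.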

Building from Source

mkdir -p out
javac -cp pdfbox-1.8.16.jar -d out src/wordanalyzer/WordAnalyzer.java
jar cfe WordAnalyzer.jar wordanalyzer.WordAnalyzer -C out .