A program to count the number of words from a PDF file and save the results (word weight) in a CSV file.

mohammad26845/Word_counter

Word counter PDF files

A program that counts the words in PDF files using word tokenization.

Note that this program does not recognize text in scanned files.

How to run

To run the program, follow the steps below:

  • Install Python 3, pip, PyPDF2, and NLTK.
  • Clone the Word_counter project.

Tip

The NLTK data packages are required.

To install them on your system, run the following code:

import nltk
nltk.download('stopwords')
nltk.download('punkt')

Parameters

To change the input file, modify the filename variable:

filename = 'Your_file.pdf'

To change the number of words in the output, modify the count_word variable:

count_word = 30
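
As a rough illustration of how count_word might be used, the sketch below counts the words in a text string and keeps only the most frequent ones. It is not the repository's code: the `top_words` helper and the sample text are hypothetical, and a simple regular expression stands in for NLTK tokenization and stop-word removal so that the example runs with the standard library alone.

```python
import re
from collections import Counter

count_word = 3  # number of output words (the README's example uses 30)


def top_words(text, limit):
    """Return the `limit` most frequent words in `text` as (word, count) pairs."""
    # Lowercase the text and pull out alphabetic tokens.
    # This regex is a stand-in for NLTK tokenization, not an equivalent.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(limit)


# Made-up text standing in for the content extracted from a PDF.
sample_text = "PDF files hold text. Text in PDF files can be counted. Text counts!"
result = top_words(sample_text, count_word)
print(result)
```

In a real run, the sample string would be replaced by the text extracted from the file named in filename.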

TODO List

  • Create a CSV file
  • Create a word cloud
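
The CSV item in the list above could be approached with Python's standard csv module. The sketch below shows one possible shape and is not the project's implementation: the `write_counts_csv` helper, the column names, and the sample counts are all hypothetical.

```python
import csv
import io


def write_counts_csv(word_counts, out_file):
    """Write (word, count) pairs to an open text file as CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["word", "count"])  # header row; column names are assumed
    writer.writerows(word_counts)


# Made-up sample data standing in for real word counts.
sample_counts = [("text", 3), ("pdf", 2)]

# An in-memory buffer keeps the example free of side effects.
# For a real file, open it with open("counts.csv", "w", newline="").
buffer = io.StringIO()
write_counts_csv(sample_counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing an open file object rather than a path keeps the helper easy to test and lets the caller decide where the output goes.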
