Do you want to know how often each word occurs in a PDF file, with the most frequent words listed first?
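A minimal sketch of such a pipeline, assuming `pdftotext` (from poppler-utils) is installed. The file name `file.pdf` and the function name `word_freq` are placeholders. The sketch adds one step that the breakdown does not mention: newlines are turned into spaces first, because a newline is itself a control character and deleting it would glue the last word of one line to the first word of the next.

```shell
# word_freq: read text on stdin, print "count word" lines, most frequent first.
# Hypothetical helper name; the steps follow the breakdown in this post.
word_freq() {
  tr '\n' ' ' |                               # extra step: keep line breaks as word boundaries
    tr -d '[:cntrl:][:digit:][:punct:]' |     # drop control chars, digits, punctuation
    tr -s ' ' '\n' |                          # one word per line (runs of spaces squeezed)
    sort | uniq -c | sort -bnr                # count identical words, highest count first
}

# Only runs when pdftotext and the placeholder file.pdf are both present.
if command -v pdftotext >/dev/null 2>&1 && [ -f file.pdf ]; then
  pdftotext file.pdf - | word_freq            # "-" makes pdftotext write to stdout
fi
```

Keeping the counting in a function makes it easy to try on plain text as well, for example `printf 'the cat, the dog\n' | word_freq`.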

Let’s break it down step by step:

The first part displays the PDF content on the command line.


The second part replaces all control characters (cntrl), all digits (digit) and all punctuation characters (punct) with an empty string, which means it deletes them.
See here for character classes.
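To see what this step does on its own, here is a small sketch using `tr -d` with the three character classes. The function name `strip_noise` and the sample sentence are made up for illustration.

```shell
# strip_noise: delete control characters, digits and punctuation from stdin.
# Hypothetical helper name, used here only to make the step easy to try.
strip_noise() {
  tr -d '[:cntrl:][:digit:][:punct:]'
}

printf 'Hello, World 42!\n' | strip_noise
# prints "Hello World " (with a trailing space and without a final newline)
```

The final newline disappears as well, since a newline counts as a control character.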

The third part replaces every space with a newline, so that each word ends up on its own line.
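A short sketch of this step in isolation, assuming plain `tr` is what does the replacing. The function name `split_words` is hypothetical.

```shell
# split_words: turn every space on stdin into a newline.
split_words() {
  tr ' ' '\n'
}

printf 'one two three' | split_words   # each of the three words lands on its own line
```

Two spaces in a row would produce an empty line. If that gets in the way, `tr -s ' ' '\n'` squeezes such runs into a single newline.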


The last part sorts the output, collapses identical lines into one and prefixes each with its number of occurrences, and finally sorts the result again,
ignoring leading blanks (-b), comparing numerically (-n), and in reverse order (-r).
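The counting step can be tried without a PDF at all. The sketch below feeds a few sample lines through `sort | uniq -c | sort -bnr`; the function name `count_words` and the sample input are made up.

```shell
# count_words: count identical lines on stdin and list the most frequent first.
count_words() {
  sort |            # uniq only merges adjacent lines, so sort first
    uniq -c |       # prefix each distinct line with its count
    sort -bnr       # ignore leading blanks, compare numerically, reverse
}

printf 'b\na\nb\nc\nb\na\n' | count_words
# lists b (3 times) first, then a (2 times), then c (once)
```

The first `sort` matters: without it, `uniq -c` would count only consecutive repeats.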

First find all the files you want to convert and store their filenames in a file
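A rough sketch of this step with `find`. The text does not say which file type gets converted, so the `*.png` pattern, the list name `files.txt` and the function name `list_images` are all assumptions to adapt.

```shell
# list_images DIR LISTFILE: write the paths of all matching files below DIR
# into LISTFILE, one per line. The '*.png' pattern is only an example.
list_images() {
  find "$1" -type f -name '*.png' > "$2"
}

list_images . files.txt   # search the current directory, store the result in files.txt
```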

Iterate over those files and create a JPEG from each of them
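One way to sketch the loop, assuming ImageMagick's `convert` does the actual conversion (the text does not name the tool). The function name `to_jpeg` and the list file `files.txt` are placeholders.

```shell
# to_jpeg LISTFILE: for every path in LISTFILE, write a .jpg next to the original.
# Assumes ImageMagick's `convert` is on the PATH; swap in your own converter.
to_jpeg() {
  while IFS= read -r src; do
    [ -n "$src" ] || continue             # skip empty lines
    convert "$src" "${src%.*}.jpg"        # same name, extension replaced by .jpg
  done < "$1"
}

# Usage, once the list exists:
#   to_jpeg files.txt
```

Reading with `IFS= read -r` keeps file names with spaces or backslashes intact.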

Finally, move all JPEGs to a separate location if necessary
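The move could look like the sketch below. The function name `collect_jpegs` and both directory names are hypothetical, and the destination is assumed to lie outside the directory being searched.

```shell
# collect_jpegs SRC_DIR DEST_DIR: move every .jpg below SRC_DIR into DEST_DIR.
# DEST_DIR should not be inside SRC_DIR, or find would visit the moved files again.
collect_jpegs() {
  mkdir -p "$2"
  find "$1" -type f -name '*.jpg' -exec mv {} "$2"/ \;
}

# Usage (placeholder paths):
#   collect_jpegs . ../jpegs
```

Files with the same name in different subdirectories would overwrite each other in the destination, so this fits best when the names are unique.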