Do you want to know how often each word occurs in a PDF file, with the most frequent words listed first? A short command-line pipeline does the job.
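As a rough sketch, such a pipeline might look like the following. It assumes `pdftotext` from poppler-utils for the extraction step and `sed` for the character removal; `word_freq` and `file.pdf` are placeholder names chosen for this example.

```shell
# word_freq: read plain text on stdin and print "count word" lines,
# most frequent word first. The function name is made up for this sketch.
word_freq() {
  sed 's/[[:cntrl:][:digit:][:punct:]]//g' |  # drop control chars, digits, punctuation
    tr ' ' '\n' |                             # put each word on its own line
    sort | uniq -c |                          # count identical lines
    sort -bnr                                 # highest count first
}

# Assumed usage with poppler-utils installed; file.pdf is a placeholder:
#   pdftotext file.pdf - | word_freq
```

Wrapping the text-processing part in a function keeps it independent of the PDF tool, so any command that prints plain text can feed it.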

Let’s break it down step by step:

The first part prints the text content of the PDF on the command line.
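One tool that can do this is `pdftotext` from poppler-utils. That choice is an assumption, and `pdf_text` and `file.pdf` are placeholder names:

```shell
# pdf_text: print the text of a PDF on standard output (hypothetical helper).
# Assumes pdftotext from poppler-utils is installed.
pdf_text() {
  pdftotext "$1" -   # "-" as the output file name means standard output
}

# usage: pdf_text file.pdf | less
```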

 

The next part replaces all control characters (cntrl), all digits (digit) and all punctuation characters (punct) with an empty string.
These names are POSIX character classes.
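A minimal sketch of this step, assuming `sed` is the tool doing the replacement, run on a made-up sample line instead of real PDF text:

```shell
# Delete every control character, digit and punctuation character.
# sed works line by line, so the line breaks themselves are kept.
printf 'Chapter1: Hello, world!\n' |
  sed 's/[[:cntrl:][:digit:][:punct:]]//g'
# prints: Chapter Hello world
```

Note that upper and lower case are left untouched, so "Hello" and "hello" would later be counted as different words.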

The next part replaces every space with a newline, so that each word ends up on its own line.
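This can be done with `tr`, for example (an assumption about the tool; the sample text is made up):

```shell
# Translate each space into a newline: one word per output line.
printf 'the quick the\n' | tr ' ' '\n'
# prints three lines: the, quick, the
```

Two spaces in a row produce an empty line, which the counting step would then list as an entry of its own.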

 

The last part sorts the output, collapses identical lines into one and prefixes each with its count, and finally sorts the result again by that count:
ignoring leading blanks (-b), numerically (-n) and in reverse order (-r).
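A sketch of this last stage with `sort` and `uniq -c`, using a made-up word list as input:

```shell
# sort groups identical words next to each other, uniq -c collapses each
# group into one line prefixed with its count, and sort -bnr orders the
# result by that count, highest first.
printf 'the\nquick\nthe\nfox\nthe\nfox\n' |
  sort | uniq -c | sort -bnr
# order of the result: 3 the, 2 fox, 1 quick
```

The first `sort` is needed because `uniq` only merges lines that are directly adjacent. The exact padding in front of the counts differs between implementations, which is why the final sort ignores leading blanks.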
