The Control-F command only found the multiple documents highlighted in green. Notice that the Control-F FIND box (pointed to by the red arrow) shows that there's only 1 instance found. As a consequence, it runs ALL of the text together like this:
#How to search a page for words on a mac pdf
The Trick: When you copy/paste from the PDF file into a SimpleText or MS Word document, the receiving document drops all of the formatting information, including things like the new-line character. I opened that document, then selected all the text (with CMD+A, or Control+A on PCs), then copied and pasted it into a SimpleText document (an MS Word document would work as well).
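If you already have the text in hand, the same idea can be sketched in a few lines of Python. This is only an illustration, not the method used in this post: it counts a phrase while treating any run of whitespace, including line breaks, as a single space. The function name `count_phrase` and the sample text are made up for the example.

```python
import re

def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of `phrase` in `text`,
    letting any whitespace (spaces, tabs, line breaks) separate its words."""
    words = phrase.split()
    # Join the escaped words with \s+ so "multiple\ndocuments" still matches.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical sample text: the phrase appears three times, but one
# occurrence is split by a line break and one has extra spaces.
sample = (
    "Readers of multiple\n"
    "documents must integrate sources. Multiple documents are hard.\n"
    "Reading multiple   documents takes practice."
)

naive = sample.lower().count("multiple documents")   # exact-string search
robust = count_phrase(sample, "multiple documents")  # whitespace-tolerant search
print(naive, robust)  # prints: 1 3
```

The exact-string count behaves like a plain Control-F and misses the split occurrences; the whitespace-tolerant pattern finds all three.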
![how to search a page for words on a mac](https://cdn.allthings.how/wp-content/uploads/2020/03/allthings.how-how-to-search-for-a-word-on-mac-search-words-on-mac-759x427.png)
To make things as basic as possible, I downloaded the full-text PDF from the site. What would be a more "basic" way to do this search? And what was I doing wrong? I tried to figure out how Aui could have found 7 instances of multiple documents. But Aui reports finding 7 hits! What's going on? Which found an already-scanned version of the paper at (a technical paper repository)! A Control-F search there finds. Indeed, you'll find the phrase 7 times (but only 5 of them are from this particular chapter; the other 2 are from other chapters in the book).

But Aui did an interesting thing by doing a search for: What? How's that possible? How could I have missed one?

As Jon pointed out, you could search Google Books for the book that this chapter is in (Handbook of Research on Reading Comprehension) and then do a search for "multiple documents" in that book. In the comments on this post, Jon, Aui, and Remmij found it 7 times. I opened the PDF in Acrobat, opened the "Recognize Text" tool on the right side (see below) and clicked "In This File" to run the OCR.

And so, that was that. I used the default settings to OCR the text. So, like Teri, I just used the OCR tools for Acrobat to convert. (Note that this is for Acrobat Pro, not Acrobat Reader, which just lets you read PDF files, not convert them.) But I ALSO learned that Adobe Acrobat has a conversion capability built into it. Time for another approach, one that will do more than 10 pages of OCR.

And I learned there are a number of online PDF OCR conversion tools. It should say something like "There's more text in your document, but we stopped the OCR after 10 pages."

Argh. There should be a notice in the converted doc (in bold, red, flaming letters) that tells you this. Okay, so it's documented, but it's still a huge surprise. For PDF files, we only look at the first 10 pages when searching for text to extract." I went back to the Help Center for some explanation, and discovered that it very clearly says ". 
What's up with that?

As I scrolled down looking for the "extra" instance I'd found, I discovered that the Google Docs version ended at page 10 (out of 21 pages in the original): there were no references, and nothing past the mid-point of the paper! Gack. When there are strange boxes on the page, Docs OCR might skip over a chunk of the text.

But that didn't explain the "extra" instances of the phrase multiple documents I found in the printed-out version of the paper. Okay, I know that OCR is a difficult process; many OCR systems have errors, and I just found one here in the Docs OCR. IF the OCR process was accurate, it certainly would have located the title of the paper (which is just a few lines below).
![how to search a page for words on a mac](https://imag.malavida.com/mvimgbig/download-fs/hack-app-data-21825-5.jpg)
As you can see in the above image, you can't even Control-F for the title of the document: there are zero hits for the title. That's when I noticed that much of the first page of text had NOT been recognized! Huh.
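A quick way to catch this kind of gap is to check the converted text for a few phrases you know are in the original, such as the title near the top and a heading near the end. Here's a minimal sketch of that check in Python. The function name `missing_phrases` and all of the sample strings are placeholders I made up for illustration; they are not taken from the paper or from any tool mentioned in this post.

```python
import re

def missing_phrases(extracted_text: str, expected_phrases: list[str]) -> list[str]:
    """Return the expected phrases that do NOT appear in the extracted text.
    Matching ignores case and collapses all whitespace to single spaces."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()

    haystack = normalize(extracted_text)
    return [p for p in expected_phrases if normalize(p) not in haystack]

# Hypothetical OCR output in which the title and the reference list
# were never recognized.
ocr_text = "Introduction\nThis text was recognized.\nMore body text."

# Placeholder phrases: one from the top of the document, one from the
# body, and one from the very end.
checks = ["A Sample Paper Title", "this text", "References"]

print(missing_phrases(ocr_text, checks))
# prints: ['A Sample Paper Title', 'References']
```

A missing title points to a skipped region on the first page, and a missing end-of-document phrase suggests the conversion stopped early.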
![how to search a page for words on a mac](https://kingpinbrowser.com/wp-content/uploads/search-webpage-safari.jpg)
![how to search a page for words on a mac](https://photos5.appleinsider.com/gallery/27433-41002-007-Paste-and-Match-Style-l.jpg)
Which led me to a lovely Help Center article about how to import a PDF file into your Google Drive, then open it with Docs. I also remembered that Google Docs had some OCR capability, so my first query was: So this Challenge is really about "tool finding" - can you figure out how to convert from a scanned document into a readable / findable / searchable one?

As we've talked about before, taking a scanned document and converting the scan into recognizable text is called "Optical Character Recognition," or OCR, so I'm going to use that in my query. Once you've done that, can you determine how many times the authors refer to "multiple documents" in that paper? (This was my original search task: finding interesting papers about how people read multiple documents in the same reading session.) How can you transform this document (LINK) into something that you can search within? 2. Let's review: the SearchResearch Challenge for this week is meant to give you an additional powerful tool for importing scanned documents and making them findable.

1. there are many ways to search in a scanned PDF for some text.