Most of you can use the FIND function to locate files on your hard drive. Some of you even know how to use it to find files with something specific in them.
But did you know you can also use it to find files with non-printable control codes in them? (Well... usually.)
Just to be clear, I am NOT talking about the FIND in your favorite word processor. I am talking about the one on the START menu.
What follows is an actual question from Experts Round Table. While several other workable solutions were offered, this account has been edited to focus on the kind of answer the question asked for.
Note: the responses have been edited to run in the time allotted, and the names have been changed to protect the guilty.
The Question
Without resorting to any freeware or shareware, is there any way to search the entire system for a specific hexadecimal character for which there is no exact ASCII text counterpart?
The character that I want to search for is hex 1A.
It displays as ^Z in ASCII text format, but if I start with ^Z in ASCII, that is two characters and does not convert to Hex 1A.
Reason for wanting this feature:
The character is created as an End of File (EOF) delimiter by my old Word Perfect program, and it screws up indexing by my old version of Eudora. I can search for and eliminate it with my ancient version of a text editor that nobody here has ever heard of.
However, to eliminate it I must have a fair idea about which file contains it. I would like to use the Windows FIND command to look for it across the entire system while I mow the lawn, shovel snow, or go for breakfast.
The Search for an Answer
Mentor1:
This is an interesting question. I have some ideas I'll try out and get back to you. What version of Windows do you have? (I seem to recall you said 98 before.)
Questioner:
Windows 98 SE
Mentor2:
There was an old program called "repl" (I think) which could do this. You could give it binary data to search/replace/remove.
Unfortunately it only worked on files up to 64K in size, as it was REAL DOS and was written with a DOS compiler.
Something to watch out for, though, is that you do NOT want to replace ALL instances of chr(26) in all files. Whilst chr(26) is the EOF marker in WordPerfect files, in a binary file (say .COM, .EXE, .DLL, etc.) it can be meaningful data.
Doing something like this in PHP is ultra easy.
If you want to have a play with PHP, then I can provide a script which will do just what you want. Installing PHP for command line usage is REALLY easy.
Mentor1:
I was thinking of writing something in PHP or Perl, too... in fact, you can compile either as a standalone EXE, so he wouldn't even have to install them...
Mentor2:
I've seen a Delphi project called WinBinder which allows you to interact with PHP from Delphi.
Questioner:
I would only need to search files with an extension of .TXT or .MBX and would not need a replace feature. My other software would take care of that. Searching for 1A works fine in hexadecimal mode, but there doesn't seem to be a corresponding ASCII character to search for.
The Windows FIND command will not search in hexadecimal mode. At least there is no option to do that.
What would be nice, and might have some value in the aftermarket, would be a brand new FIND command that could replace the one provided with Windows.
The sticky part appears to be the requirement that the command search the entire system, including sub-folders.
As you know, the Windows FIND command has options that allow searching in many different file formats. Dumb me! I didn't even know about this feature until I started digging into this thing. Anyway, I would surmise that one of those other formats would allow searches in hexadecimal automatically. If that were the case, then to fool the system one might change the file extension temporarily. That would be a pain, and would be risky.
Enter our Hero
Mentor3:
Any way you can copy/paste the 'eof' from one of the files into the 'Containing text' box, with wildcards ( *.* ) in the 'Named' field? Be sure to check the 'look in subfolders' box.
Maybe you can search just one file to see if it works.
Mentor1:
"Any way you can copy/paste the 'eof' from one of the files..."
I think the problem with that is that 1A, chr(26), is the ASCII 'SUB' character, a control character that does not print anything you can copy (see e.g. www.asciitable.com). That being said, it *is* in fact valid ASCII. If that doesn't work, I will try to write a little PHP or C++ utility to do this in the next few days.
Mentor2:
Characters with an ASCII value of less than 32 have no direct physical representation. Most have a name (LF, CR, TAB, BELL, EOT, EOF, etc.). These are often represented as ^x, where x is the matching character: values 1 through 26 map to the letters A through Z, so chr(26) is ^Z.
Maybe this table will explain better:
http://www.lookuptables.com
In DOS/Windows, a common way to get these characters is to use the CTRL key + the letter.
e.g.
CTRL+M = CR
CTRL+Z = EOF
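The ^x notation and the CTRL+letter trick are really the same arithmetic: each control code is its letter's ASCII value minus 64. Here is a minimal Python 3 sketch of that mapping (`caret_notation` and `ctrl_key` are names made up for this example). It also shows why typing the two characters ^ and Z never gives you the single byte hex 1A:

```python
def caret_notation(code: int) -> str:
    """Return the ^x caret form of an ASCII control code (0-31 or 127)."""
    if code == 127:
        return "^?"                  # DEL is conventionally shown as ^?
    if 0 <= code < 32:
        # A control code is its character's ASCII value with bit 6 (value 64)
        # cleared, so adding 64 recovers it: 1 -> 'A', 13 -> 'M', 26 -> 'Z'.
        return "^" + chr(code + 64)
    raise ValueError(f"{code} is not an ASCII control code")


def ctrl_key(letter: str) -> int:
    """Return the control code that CTRL+letter produces (A-Z)."""
    upper = letter.upper()
    if len(upper) != 1 or not "A" <= upper <= "Z":
        raise ValueError(f"{letter!r} is not a letter A-Z")
    return ord(upper) - 64


if __name__ == "__main__":
    for name, code in [("CR", 13), ("LF", 10), ("SUB/EOF", 26)]:
        print(f"{name:8} chr({code:2}) = hex {code:02X} = {caret_notation(code)}")
    # The typed text "^Z" is two characters; the real control code is one.
    print(len("^Z"), len("\x1a"))    # 2 1
```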
Unfortunately, as you have discovered, control characters are not accepted within the FIND command.
A valid alternative to FIND would be grep (http://en.wikipedia.org/wiki/Grep).
This program is commonly available with most development tools (Borland C/Delphi, MS Visual Studio, etc.).
This program is also available within the Cygwin environment.
The major advantage is that you can search for hexadecimal values from the command line.
Each implementation of grep that I have seen supports recursive directory handling.
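If grep isn't handy, the same recursive search takes only a few lines of Python. This is a rough sketch under a few assumptions: Python 3 (which is free software, so it falls outside the original no-freeware rule, and it won't run on Windows 98 itself), made-up function names, and a filter that only opens .TXT and .MBX files, per Mentor2's warning that 1A is ordinary data inside binaries:

```python
import os
import sys

EXTENSIONS = (".txt", ".mbx")   # assumption: only the file types named in the question


def contains_byte(path: str, value: int = 0x1A, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file at path contains the byte `value`."""
    target = bytes([value])
    with open(path, "rb") as f:           # binary mode: nothing is decoded or translated
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            # A single byte can never straddle two chunks, so a per-chunk test is enough.
            if target in chunk:
                return True


def files_containing(root: str, value: int = 0x1A, extensions=EXTENSIONS):
    """Walk root and all its sub-folders, yielding matching .TXT/.MBX paths."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue                  # skip .EXE, .DLL, ... where 1A is normal data
            path = os.path.join(dirpath, name)
            try:
                if contains_byte(path, value):
                    yield path
            except OSError as err:        # locked or unreadable file: report it, keep going
                print(f"skipped {path}: {err}", file=sys.stderr)
```

For example, `for hit in files_containing("C:\\"): print(hit)` would list every matching file on drive C:. Reading in 64K chunks keeps memory use flat on big mailboxes, and it is safe here only because a single byte can't be split across two chunks.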
Questioner:
Success! The idea described by Mentor3 works!
Somebody smarter than me can figure out why. In this ASCII-EBCDIC reference:
http://www.natural-innovations.com/computing/asciiebcdic.html
The exact symbol does not show. It is also invisible in DOS EDIT and my old version of Word Perfect, but it clearly shows in NOTEPAD as a vertical bar.
Copying and pasting it from there into the FIND command works. Getting from there to actually fixing the problem gets messy when the file is inside a folder in Eudora, partly because my old utilities won't handle long file names.
I was forced to copy the file back to the IN.MBX or the OUT.MBX to work on it. A temporary MBX would have done the trick, also.
Before going back into Eudora, it is better to delete the associated TOC file. Most likely any new utility would not do that, and any mass correction would get a little messy at that point. However, Eudora is somewhat forgiving and would provide a warning that the old TOC was no longer valid.
Thanks, guys!
Wrap Up
Questioner:
In the final cleanup after finding the corrupt files, it turned out to be easy to spot and delete the EOF character with NOTEPAD (or WORDPAD), even inside folders with long file names. Therefore, it was not necessary to move any such files to a work area.
In NOTEPAD the character shows up as a solid lozenge.
In WORDPAD the character shows up as an open lozenge.
Depending on Eudora indexing, the EOF character was often in the middle of the associated sequential .MBX file.
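Because the character can sit anywhere in the file, it can help to know exactly where before opening it. A small sketch, assuming Python 3 and a made-up helper name, that reports each 1A byte's offset and line number:

```python
def sub_positions(data: bytes, target: int = 0x1A) -> list:
    """Return (byte_offset, line_number) pairs for every `target` byte in data.

    Offsets start at 0. Line numbers start at 1 and advance on each LF (0x0A)
    byte, which also works for files with CR/LF line endings.
    """
    positions = []
    line = 1
    for offset, byte in enumerate(data):   # iterating over bytes yields ints
        if byte == target:
            positions.append((offset, line))
        elif byte == 0x0A:
            line += 1
    return positions


sample = b"From a\r\nbody\x1a more\r\nnext\r\n"
print(sub_positions(sample))   # [(12, 2)]
```

For a real file, pass in the file's bytes, for example from `Path(path).read_bytes()`.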
To summarize the procedure:
- Create a dummy text file with only the EOF character, using either Word Perfect or any edit program that allows creation of the hexadecimal 1A character.
(The edit program that I use is ancient. It is the VIEW/EDIT program from the old 1dirPLUS DOS Shell. The company that created it went out of business years ago. Most likely it would not work on anything later than Windows 98, because of the partition size changes. Downward compatibility eventually goes down the tube.)
- Access the dummy text file with (START) RUN NOTEPAD.
- Use the mouse to COPY and PASTE the character into the data line of the FIND command.
- Search for either *.MBX or *.TXT as needed.
- Click on one of the resulting directory lines to bring up NOTEPAD. If the corrupt file is too large for NOTEPAD, WORDPAD will come up automatically.
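For anyone without an old editor that can type hex 1A, step one can also be done in Python 3. A minimal sketch; `write_marker_file` and the file name `eof1a.txt` are just examples:

```python
from pathlib import Path


def write_marker_file(path: str = "eof1a.txt") -> Path:
    """Create (or overwrite) a file whose entire content is the one byte 0x1A."""
    p = Path(path)
    p.write_bytes(b"\x1a")   # binary write: exactly one byte, no newline appended
    return p
```

After running `write_marker_file()`, open eof1a.txt with (START) RUN NOTEPAD and carry on with the copy and paste step.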
The only hitch was where the corrupt file had the READ-ONLY attribute set. That forced a detour.
For *.MBX files, the associated *.TOC file for Eudora was now corrupt, but this is not a problem, because Eudora will provide a prompt and offer to rebuild the Table of Contents (TOC). The corrected TOC file will be in vanilla form with only a ? flag, but this is not really a problem. If a user wants to get rid of it, just delete that TOC file outside of Eudora and come back in. The flags are lost, but so what?
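If you'd rather script the cleanup, here is a rough Python sketch of the whole detour in one place: clear READ-ONLY, keep a backup, strip the 1A bytes, and delete the matching TOC so Eudora rebuilds it. It is hypothetical (`clean_mailbox` and the .bak naming are made up), it assumes Python 3, which is itself free software, and it should only be pointed at .MBX or .TXT files, given Mentor2's warning about binaries. Close Eudora before running it:

```python
import os
import shutil
import stat
from pathlib import Path


def clean_mailbox(mbx_path: str, remove_toc: bool = True) -> int:
    """Remove every 0x1A byte from one .MBX/.TXT file; return how many were removed.

    An untouched copy is saved as <name>.bak first. With remove_toc=True the
    matching .TOC file (same name, .TOC extension) is deleted so Eudora rebuilds it.
    """
    path = Path(mbx_path)
    data = path.read_bytes()
    count = data.count(b"\x1a")
    if count == 0:
        return 0                          # nothing to fix; leave the file alone

    # Clear READ-ONLY first. On Windows, os.chmod with S_IWRITE set is how
    # Python turns that attribute off; elsewhere it adds owner write permission.
    mode = stat.S_IMODE(path.stat().st_mode)
    if not mode & stat.S_IWRITE:
        os.chmod(path, mode | stat.S_IWRITE)

    shutil.copy2(path, path.with_name(path.name + ".bak"))   # backup before editing
    path.write_bytes(data.replace(b"\x1a", b""))

    if remove_toc:
        # Windows ignores case; checking both spellings covers case-sensitive systems.
        for toc in (path.with_suffix(".TOC"), path.with_suffix(".toc")):
            if toc.exists():
                toc.unlink()
    return count
```

Each path reported by the search can be passed in, e.g. `clean_mailbox(r"C:\path\to\IN.MBX")` (a placeholder path); it returns the number of 1A bytes removed.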
Mentor3:
Cool. Glad it worked for you.
"Somebody smarter than me can figure out why."
It does not always work, but I have saved my butt several times using it, and made a couple of people think I am a genius. Shhhhh...
More background information
Mentor1:
This is really interesting, actually. I've seen similar behavior where HTML entities corresponding to unprintable characters show up as a little box or whatever, because the browser doesn't know how else to handle them. I never figured the underlying code was still there, but why not?
In Windows, the character you see for an unknown character depends upon the font being used.
BUT, if the character is a control character (one with an ASCII value less than 32), there is no visual representation. These characters are NOT for displaying to a user. They are for controlling the device that displays the visible text.
They are also used in serial based communications (where all this ASCII stuff started).
For example, CR is chr(13) and LF is chr(10).
At the end of a line, CR (Carriage Return) was needed to move the print head to position 0. The LF (Line Feed) was required to move the paper up 1 line.
If a line ending has only one of the two characters (CR or LF), Notepad will display a character (depending upon the font used) for the control character. This can be a small black blob, a vertical line or even a hollow square. Different fonts have a different "picture" for the unknown character. From what I know, the ASCII value of that displayed character is 255, which is NOT the value of the original control character since, as I said earlier, control characters have no visual representation; they are commands.
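To check which kind of line ending a file actually contains (and so whether a Notepad like the one described here would show stray symbols), a short Python sketch can count CR+LF pairs, lone CRs and lone LFs. The function name is made up for this example:

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs and lone LFs in a block of bytes."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair contains exactly one CR and one LF, so subtracting the
    # pairs from the raw totals leaves only the unpaired characters.
    return {
        "CR+LF": crlf,
        "CR only": data.count(b"\r") - crlf,
        "LF only": data.count(b"\n") - crlf,
    }


print(line_ending_counts(b"one\r\ntwo\nthree\rfour\r\n"))
# {'CR+LF': 2, 'CR only': 1, 'LF only': 1}
```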
Finally, in DOS, files edited with Edit will normally have the EOF character at the end of the file. The editor will not show it when you are editing the file. Notepad will show its unknown-character symbol. Other applications will behave differently. Edit within Windows, however, doesn't seem to add the EOF character. You can see the control characters if you use Edit in binary mode by typing
edit /078 file.ext
Edit has limits on file sizes, so don't do it with BIG files. The 078 allows the entire screen to be used to display the file. Why 78? Lines are 80 wide. There is a "window" edge on the left and the scroll bar on the right, taking you to 78.
You CAN do a search for the EOF (chr(26), CTRL+Z) by pressing CTRL and Z in the Find box.
On my system I see a right arrow. You may see something different. The symbols depend on the bitmap fonts used for DOS.
OK. Earlier I said there was no visual representation. There is, sort of, but it is not necessarily consistent across all OSes.
I see the CR as a musical note. I see the LF as a white box with a black circle in it. I see EOF as a right handed arrow.
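Those three symbols line up with the pictures the original IBM PC character set (code page 437) assigns to those positions: ♪ for 0D, ◙ for 0A and → for 1A. A small Python sketch of a viewer that draws them, with only those three entries in its (made-up) table:

```python
# A few of the pictures that the IBM PC character set (code page 437) gives to
# control-code positions; only the three codes Mentor2 mentions are filled in.
CP437_CONTROL_GLYPHS = {
    0x0A: "\u25d9",  # LF  -> ◙ (light-on-dark screen: white box, black circle)
    0x0D: "\u266a",  # CR  -> ♪ (musical note)
    0x1A: "\u2192",  # SUB -> → (right arrow)
}


def show_controls(data: bytes) -> str:
    """Render bytes as text, drawing the known control codes as CP437 glyphs."""
    out = []
    for b in data:
        if b in CP437_CONTROL_GLYPHS:
            out.append(CP437_CONTROL_GLYPHS[b])
        elif 32 <= b < 127:
            out.append(chr(b))      # printable ASCII passes through unchanged
        else:
            out.append(".")         # any other byte: a hex-dump style placeholder
    return "".join(out)


show_controls(b"Hi\r\nBye\x1a")   # returns "Hi♪◙Bye→"
```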
Hope this all helps.
Someone could write this up quick as a mini tutorial (maybe with a link to a free HEX editor).
That's a trick I bet a lot of people would find handy, yet never suspect of working (I know I didn't expect it to work!).
Links
The complete thread, with additional info.
Windows Find Command
Hex/Disk Editors
Disk Investigator (freeware):
http://www.theabsolute.net/sware/dskinv.html
Winhex - byte level editor
www.winhex.com
or:
http://www.x-ways.net/winhex/forensics.html
There is a download link there for the evaluation version.
And I know UBCD has a couple of them on it.
Ultimate Bootdisk and diag utils:
http://ultimatebootcd.com
And you can always just dig around on your favorite freeware site (which is fun to do anyway), or Google for "hex editor" or "freeware hex editor".
NS,NR!!

