Code


Here are a few scraps of code that I'm in the process of developing. I'm placing them all in the public domain, so use them for whatever purpose you wish, although I can't guarantee that any piece of code will be suitable for a particular purpose, or that it won't crash your computer each time you try to compile it. With that out of the way, here's some code...

code:

bytemark.cpp (3.0 kb)

Current Version: 1.00.web1 (5.22.1999)

General Description:
Bytemark indexes large text files that contain alphabetically ordered lists. Output consists of each unique two-character combination found at the beginning of lines in the input file, followed by the byte offset of that letter combination within the input file. The two data fields are delimited by a * character in the third column of each line. The program was written to speed searches through large text files, such as those used in the Moby lexicon project. If you use this code, you'll need to change the filenames to suit your needs. If you need some help working out what's going on, feel free to contact me and I'll try to help you out.
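
The output format described above can be illustrated with a minimal sketch. This is not the code in bytemark.cpp: the function name build_index is made up for the example, it works on streams rather than the hard-coded files, and it assumes '\n' line endings and input that is already sorted, so a new prefix only needs to be compared with the previous one.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from bytemark.cpp): returns the index text for an
// alphabetically sorted input stream. Assumes '\n' line endings.
std::string build_index(std::istream& in) {
    std::ostringstream out;
    std::string line;
    std::string last_prefix;   // last two-character combination written
    std::size_t offset = 0;    // byte offset of the start of the current line

    while (std::getline(in, line)) {
        if (line.size() >= 2) {
            const std::string prefix = line.substr(0, 2);
            if (prefix != last_prefix) {
                // Two characters, '*' in the third column, then the offset.
                out << prefix << '*' << offset << '\n';
                last_prefix = prefix;
            }
        }
        offset += line.size() + 1;  // +1 for the '\n' that getline consumed
    }
    return out.str();
}
```

For an input of aardvark, abacus, abbey and banana on separate lines, this produces the lines aa*0, ab*9 and ba*22. A search can then seek straight to the offset for the first two letters of the word it is looking for.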

Status:
Functional, but see the Wish List.

Files:
bytemark.cpp: Source code
bytemarkMoby: (not included) My name for the bookmark file that is the final output of the program.
mobyrepdata: (not included) My name for a file generated by the program for internal use during indexing.
mobyposi: (not included) My name for the text input file.

Wish List:

Currently, indexing speed decreases exponentially as the process progresses, because the last repeated character sequence is appended to the bottom of the file. This problem will be fixed soon, resulting in faster search speeds.
When I do the search optimization, I'll add code so that filenames can be input at the beginning, rather than hard-coded in the source. This program was originally intended to index only one huge file, so this option wasn't initially needed.
I'm debating whether to add some kind of a progress meter, because the program doesn't provide any output during the indexing process, which can be somewhat disconcerting.

strtype.h (3.0 kb)

Current Version: 1.00 (5.22.1999)

General Description:
strtype.h is an enhanced version of the ANSI C++ ctype.h header file, permitting characters within string variables to be used with character type checking functions such as isalpha(), isdigit(), etc. Case conversion using toupper() and tolower() is also supported. Support for checking of char variables has been retained. To check a character within a string, call stringchar(string, int) where string represents the string that the character to be tested is contained in, and int is the subscript value of the character within the string. After this call, perform the check on the character by passing -1 to any of the standard ctype.h functions, e.g. isalpha(-1). The value returned by the function is identical to that returned by passing a conventional character to a function in ctype.h.
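
The calling convention described above (select a character with stringchar, then pass -1 to the check) might look roughly like this. This is only a sketch of the idea, not the contents of strtype.h: the namespace strtype_sketch, the helper resolve and the variable selected are all assumptions made for the example, and only three of the ctype.h functions are wrapped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace strtype_sketch {  // hypothetical namespace, not from strtype.h

// Character remembered by the most recent stringchar() call.
unsigned char selected = 0;

// Select the character at subscript `index` of `s` for the next check.
// std::string::at throws std::out_of_range for a bad subscript.
void stringchar(const std::string& s, int index) {
    selected = static_cast<unsigned char>(s.at(static_cast<std::size_t>(index)));
}

// Map the sentinel -1 to the selected character; pass other values through.
int resolve(int c) { return c == -1 ? selected : c; }

// Wrappers with the same shape as the ctype.h functions.
int isalpha(int c) { return std::isalpha(resolve(c)); }
int isdigit(int c) { return std::isdigit(resolve(c)); }
int toupper(int c) { return std::toupper(resolve(c)); }

}  // namespace strtype_sketch
```

With this sketch, calling strtype_sketch::stringchar("a1", 1) and then strtype_sketch::isdigit(-1) returns a nonzero value, while strtype_sketch::isalpha('Z') still works on an ordinary char.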

Status:
Conversion and check functions work well for both string and char variables. If you find any bugs, let me know.

Files:
strtype.h: Header file

Other notes:
This header assumes that you're using P.J. Plauger's implementation of the C++ libraries. Info on these libraries can be found here. Some preprocessor tweaks may be required, depending on what platform you're using the header file with (it was developed using Code Warrior 9 Gold on a Macintosh PPC machine).

application:

rpage 1.00a3.web1 Macintosh Version (75.0 kb)

code:

rpage.cpp (11.0 kb)

text:

wordlist (3.6 kb)

Current Version: 1.00a3.web1 (5.13.1999)

General Description:
rpage is a program that allows users of purely numeric pagers to receive short alphanumeric messages in a manner more flexible than traditional pager codes. It uses a series of ten 5x5 matrices (called metalayers) to store words, symbols and phrases. Columns (left to right) are designated 6, 7, 8, 9, 0 respectively, and rows (called sets) (top to bottom) are designated 1, 2, 3, 4, 5.
Each message sent using the system begins with a 4 digit header. The first two digits designate a sender ID, which can be anything from 00 to 99, that you make up for yourself. The third digit designates the current page (called a packet) that is being sent. The fourth digit designates the total number of packets that the message contains. If a message is in excess of 9 packets, this digit is given a - value (displayed as a - on most pagers), indicating that there are more than 9 packets.
After the header, a metalayer needs to be designated if the page begins with a symbol in a metalayer other than 0, the default. A metalayer is designated by issuing the - digit to the system, followed directly by a number 0-9 that specifies the metalayer that you wish to use. After this has been done (or not), a starting set within the metalayer needs to be specified. When this is complete, a symbol can be selected from the set. When you select subsequent symbols, there is no need to respecify either the metalayer or the set if the symbol, word or phrase that you need is in the same set. Also, if you change metalayers, and the item you need lies within the same set that you were using in the previous metalayer, then there is no need to respecify the set after changing metalayers.
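
To make the page format concrete, here is a small sketch that turns a page into matrix coordinates. It is not taken from rpage.cpp, and every name in it (Header, Cell, parse_header, decode_body) is hypothetical. It relies on one reading of the description: because sets use the digits 1-5 and columns use 6, 7, 8, 9, 0, a digit after the header can be classified by its value alone, and a - is always followed by a metalayer digit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical; they are not taken from rpage.cpp.
struct Header {
    int sender;  // sender ID, 00-99
    int packet;  // number of the packet being sent
    int total;   // total packets, or -1 when the digit is '-' (more than 9)
};

struct Cell {
    int metalayer;  // 0-9
    int set;        // row, 1-5
    int column;     // 0-4, for the digits 6, 7, 8, 9, 0 left to right
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Parse the 4-character header. Returns false if it is malformed.
bool parse_header(const std::string& page, Header& h) {
    if (page.size() < 4) return false;
    if (!is_digit(page[0]) || !is_digit(page[1]) || !is_digit(page[2])) return false;
    if (!is_digit(page[3]) && page[3] != '-') return false;
    h.sender = (page[0] - '0') * 10 + (page[1] - '0');
    h.packet = page[2] - '0';
    h.total = (page[3] == '-') ? -1 : page[3] - '0';
    return true;
}

// Decode everything after the header into matrix coordinates.
bool decode_body(const std::string& page, std::vector<Cell>& cells) {
    int metalayer = 0;  // metalayer 0 is the default
    int set = 0;        // 0 means no set has been chosen yet
    for (std::size_t i = 4; i < page.size(); ++i) {
        const char c = page[i];
        if (c == '-') {
            // '-' must be followed directly by the metalayer digit.
            if (i + 1 >= page.size() || !is_digit(page[i + 1])) return false;
            metalayer = page[++i] - '0';
        } else if (c >= '1' && c <= '5') {
            set = c - '0';  // switch sets; the metalayer is kept
        } else if (is_digit(c)) {
            if (set == 0) return false;  // a starting set is required
            const int column = (c == '0') ? 4 : c - '6';
            const Cell cell = {metalayer, set, column};
            cells.push_back(cell);
        } else {
            return false;
        }
    }
    return true;
}
```

For example, the page 4213160-3729 decodes as sender 42, packet 1 of 3, followed by four items: two from set 1 of metalayer 0, one from set 1 of metalayer 3 (the set carries over the metalayer change), and one from set 2 of metalayer 3. Looking each Cell up in the word list is left out of the sketch.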

Status:
So far the decoder has been implemented in this code, and I'm busy writing the encoder. Some precursors to this module are already in this code release, in the form of some string sorting and management routines. So far, the decoder appears to be working well. I've left all of the debug code in the program, albeit commented out, so you don't have to reinvent the wheel if you'd like to test something.
This code release compiles successfully using Metrowerks Code Warrior IDE 1.6 for the Macintosh.

Files:
rpage1.00a3.web1: Precompiled Macintosh version of this program.
rpage.cpp: Source file.
wordlist: Default symbol/word/phrase list, loaded at runtime.

Wish list:

Complete the encoder module, finishing the core of the program.
Optimize the arrangement of the default symbol list a little more.
Provide the option for each sender ID to have an associated custom symbol file.
Add the capability for more items by using consecutive - characters to denote entirely different symbol tables.
Develop GUI interfaces for the Macintosh, Linux, SGI and Win platforms.
Provide the option to send the message using the modem.
Port to Z80 assembly, so I can decode messages using my TI-83.

Other notes:
If you look at the wordlist file, you'll notice that the last 25 items (representing the last array) contain names and places that will in all likelihood be pretty useless to you. The last array is designated as a user definable array, so feel free to change it by changing the items in the wordlist file to suit your purposes. The other arrays have been semi-optimized for what I believe are useful combinations of phrases for general messaging use. You're welcome to change them also if you wish, but do some planning first: work out which words or phrases are likely to be used together, and place them in the same set or metalayer. The less you have to switch sets and metalayers, the more compact the message is.

code:

typonet.cpp (4.0 kb)

Current Version: 1.00b1.web1

General Description:
typonet is a typo generator. It uses a rudimentary weighted network based on the QWERTY keyboard to produce errors in much the same way that a regular typist would. Given a string, it writes 100 variations of the string to a text file. Because typists aren't always inaccurate, some entries are identical to the string. On the other hand, some output from the program has compound typos. typonet is part of a natural language parser that I'm developing. When I'm done, that'll be up here also, but don't hold your breath for it, because I've only just started to write code for the project.
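
The general idea of keyboard-neighbor typos can be sketched as follows. This is not the network in typonet.cpp: it swaps the weighted network for a uniform pick among adjacent keys, handles lowercase letters only, and returns the variations instead of writing them to a file. The names kRows, neighbors, make_typo and make_variations, and the error_rate and seed parameters, are all assumptions made for the example.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical layout table (not from typonet.cpp): the three letter rows.
static const std::string kRows[3] = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};

// Keys beside `c` in its row, then the keys at the same column in the rows
// above and below. Returns an empty string for anything not in the table.
std::string neighbors(char c) {
    std::string result;
    for (int r = 0; r < 3; ++r) {
        const std::size_t col = kRows[r].find(c);
        if (col == std::string::npos) continue;
        if (col > 0) result += kRows[r][col - 1];
        if (col + 1 < kRows[r].size()) result += kRows[r][col + 1];
        if (r > 0 && col < kRows[r - 1].size()) result += kRows[r - 1][col];
        if (r < 2 && col < kRows[r + 1].size()) result += kRows[r + 1][col];
    }
    return result;
}

// One variation: each letter slips to a neighboring key with probability
// `error_rate`, so a variation may have no typos or several.
std::string make_typo(const std::string& text, double error_rate, std::mt19937& rng) {
    std::bernoulli_distribution slip(error_rate);
    std::string out = text;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::string adjacent = neighbors(out[i]);
        if (adjacent.empty() || !slip(rng)) continue;
        std::uniform_int_distribution<std::size_t> pick(0, adjacent.size() - 1);
        out[i] = adjacent[pick(rng)];
    }
    return out;
}

// `count` variations of `text`, generated from a fixed seed.
std::vector<std::string> make_variations(const std::string& text, std::size_t count,
                                         double error_rate, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::string> variations;
    for (std::size_t i = 0; i < count; ++i) {
        variations.push_back(make_typo(text, error_rate, rng));
    }
    return variations;
}
```

Calling make_variations("with", 100, 0.2, 1u) gives 100 strings of the same length as the input, some identical to it and some with one or more substituted letters. Weighting could be approximated by listing likelier neighbors more than once in the string that neighbors returns.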

Status:
At the moment, everything seems to be working OK. The program generates typos along both rows and columns near the target letter. I'd really like to add letter inversion (wtih - with), as well as added and dropped characters (wiuth - with) (wth - with). Also, I need to rearrange the qwerty array a little so that unlikely combinations of upper and lowercase characters aren't produced.

Files:
typonet.cpp: Source file.

Wish list:

I need to redo some of the randomization code in the program, as there's a much better way to implement it using the modulus operator.
Rules for better typos need to be added.
The user should be given the option to either write to a file, output to the screen, or do both.