Here are a few scraps of code that I'm in the process of developing. I'm placing them
all in the public domain, so use them for whatever purpose you wish, although I can't
offer any guarantee that any piece of code will be suitable for a particular purpose,
or that it won't crash your computer each time you try to compile it. With that out
of the way, here's some code...
Current Version: 1.00.web1 (5.22.1999)
Bytemark indexes large text files that contain alphabetically ordered
lists. Output consists of unique two character combinations found at the
beginning of lines in the input file, followed by the corresponding byte
offset of the letter combination within the input file. The two data
fields are delimited by a * character in the third column of each line.
The program was written to speed searches through large text files, such
as those used in the Moby
lexicon project. If you use this code, you'll need to change the
filenames to suit your needs. If you do use the code and need some help
working out what's going on, feel free to contact me and I'll try to help.
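The indexing idea described above can be sketched roughly as follows. This is not the bytemark.cpp source: the function name build_bookmarks is made up for illustration, it reads from any input stream rather than the hard-coded files, and it assumes each line ends in a single newline byte.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: scan an alphabetically ordered list and record the byte
// offset of the first line that starts with each distinct two-character prefix.
// Each entry has the form "ab*1234": prefix, '*' in column three, then offset.
std::vector<std::string> build_bookmarks(std::istream& in) {
    std::vector<std::string> bookmarks;
    std::set<std::string> seen;  // prefixes already written out
    std::string line;
    std::size_t offset = 0;      // byte offset of the start of the current line

    while (std::getline(in, line)) {
        if (line.size() >= 2) {
            std::string prefix = line.substr(0, 2);
            assert(prefix.size() == 2);  // keeps the '*' in column three
            if (seen.insert(prefix).second) {
                bookmarks.push_back(prefix + "*" + std::to_string(offset));
            }
        }
        offset += line.size() + 1;  // assumption: one '\n' byte ends each line
    }
    return bookmarks;
}
```

Fed the five lines aardvark, abacus, abbey, badger, bat, the sketch produces aa*0, ab*9 and ba*22. A search can then seek straight to the offset for the first two letters of the target word instead of scanning from the top of the file.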
Functional, but see the Wish List.
bytemark.cpp: Source code
bytemarkMoby: (not included) My name for the bookmark file that is
the final output of the program.
mobyrepdata: (not included) My name for a file generated by the
program for internal use during indexing.
mobyposi: (not included) My name for the text input file.
- Currently, indexing speed decreases exponentially as the process
progresses, due to the fact that the last repeated character sequence is
appended to the bottom of the file. This problem will be fixed soon,
resulting in faster search speeds.
- When I do the search optimization, I'll add code so that filenames
can be input at the beginning, rather than hard-coded in the source. This
program was originally intended to index only one huge file, so this
option wasn't initially needed.
- I'm debating whether to add some kind of a progress meter, because
the program doesn't provide any output during the indexing process, which
can be somewhat disconcerting.
Current Version: 1.00 (5.22.1999)
strtype.h is an enhanced version of the ANSI C++ ctype.h header file,
permitting characters within string variables to be used with character
type checking functions such as
isdigit(), etc. Case conversion using
tolower() is also supported. Support for checking of
char variables has been retained. To check a character within a string,
call stringchar(string, int), where string represents the
string that the character to be tested is contained in, and int is the
subscript value of the character within the string. After this call,
perform the check on the character by passing -1 to any of the standard
ctype.h functions, e.g.
isalpha(-1). The value returned by
the function is identical to that returned by passing a conventional
character to a function in ctype.h.
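The calling convention described above might look something like the sketch below. It is a guess at the mechanism, not the contents of strtype.h: the namespace strtype_sketch and the helpers selected() and resolve() are invented for illustration, and only three of the ctype.h functions are wrapped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical re-creation of the idea; the real strtype.h may work differently.
namespace strtype_sketch {

// Storage for the character most recently chosen by stringchar().
inline unsigned char& selected() {
    static unsigned char c = 0;
    return c;
}

// Remember the character at subscript `index` of `s` for the next check.
inline void stringchar(const std::string& s, int index) {
    assert(index >= 0 && static_cast<std::size_t>(index) < s.size());
    selected() = static_cast<unsigned char>(s[index]);
}

// Map the sentinel -1 to the remembered character; pass anything else through.
inline int resolve(int c) { return c == -1 ? selected() : c; }

inline int isalpha(int c) { return std::isalpha(resolve(c)); }
inline int isdigit(int c) { return std::isdigit(resolve(c)); }
inline int tolower(int c) { return std::tolower(resolve(c)); }

}  // namespace strtype_sketch
```

With this sketch, calling strtype_sketch::stringchar(s, 2) and then strtype_sketch::isalpha(-1) checks s[2], while strtype_sketch::isalpha('x') behaves like the ordinary function. One caveat of using -1 as the sentinel is that EOF is commonly defined as -1, so a wrapper like this can no longer classify EOF itself.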
Conversion and check functions work well for both string and char
variables. If you find any bugs, let me know.
strtype.h: Header file
This header assumes that you're using P.J. Plauger's implementation of the
C++ libraries. Info on these libraries can be found here. Some
preprocessor tweaks may be required, depending on what platform you're
using the header file with (It was developed using Code Warrior 9 Gold on
a Macintosh PPC machine).
rpage 1.00a3.web1 Macintosh Version
rpage.cpp (11.0 kb)
wordlist (3.6 kb)
Current Version: 1.00a3.web1 (5.13.1999)
rpage is a program that allows users of purely numeric pagers to receive short alpha-numeric
messages in a manner more flexible than traditional pager codes. It uses a series of 10 5x5 matrices
(called metalayers) to store words, symbols and phrases in. Columns (left to right) are designated 6,7,8,9,0
respectively, and rows (called sets) (top to bottom) are designated 1,2,3,4,5. Each message sent using the system
begins with a 4 digit header. The first two digits designate a sender ID, which can be anything from
00 to 99, that you make up for yourself. The third digit designates the current page (called a packet)
that is being sent. The fourth digit designates the total number of packets that the message contains.
If a message is in excess of 9 packets, this digit is given a - value (usually displayed as a -
on most pagers), indicating that there are more than 9 packets. After the header has been appended
to the message, a metalayer needs to be designated if the page begins with a symbol in a metalayer
other than 0, the default.
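As a rough illustration of the header layout described above, here is a small parser. It is not taken from rpage.cpp: the names PageHeader and parse_header are hypothetical, and representing the "more than 9 packets" marker as -1 is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical names; the field layout follows the description above.
struct PageHeader {
    int sender_id;      // 00 to 99, chosen by the sender
    int packet;         // number of the packet being sent
    int total_packets;  // total packets, or -1 when the '-' marker is used
};

// True if `c` is one of the characters '0' to '9'.
static bool is_digit_char(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Parse the 4-character header at the start of `message`.
// Returns false if the header is too short or contains unexpected characters.
bool parse_header(const std::string& message, PageHeader& out) {
    if (message.size() < 4) return false;
    for (int i = 0; i < 3; ++i) {
        if (!is_digit_char(message[i])) return false;
    }
    out.sender_id = (message[0] - '0') * 10 + (message[1] - '0');
    out.packet = message[2] - '0';
    if (message[3] == '-') {
        out.total_packets = -1;  // message is longer than 9 packets
    } else if (is_digit_char(message[3])) {
        out.total_packets = message[3] - '0';
    } else {
        return false;
    }
    assert(out.sender_id >= 0 && out.sender_id <= 99);
    return true;
}
```

For example, a message starting with 4213 would parse as sender 42, packet 1 of 3.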
A metalayer is designated by issuing the - digit to the system, followed directly by a number 0-9
that specifies the metalayer that you wish to use.
After this has been done (or not), a starting set within the metalayer
needs to be specified. When this is complete, a symbol can be selected from the set. When you
select subsequent symbols, there is no need to respecify either the metalayer or the set, if the
symbol, word or phrase that you need is in the same set. Also, if you change metalayers, and the
item you need lies within the same set that you were using in the previous metalayer, then there
is no need to respecify the set after changing metalayers.
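The selection rules above could be decoded along the lines of the sketch below. It is one reading of the description rather than the decoder in rpage.cpp. It assumes that, after the header, a - followed by a digit switches metalayer, a digit 1-5 switches set, and a digit 6, 7, 8, 9 or 0 picks a column in the current set. The names Cell, column_index and decode_body are hypothetical, and looking the resulting cells up in the word list is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One decoded selection: which metalayer, which set (row), which column.
struct Cell {
    int metalayer;  // 0-9
    int set;        // row index 0-4, from digits 1-5
    int column;     // column index 0-4, from digits 6,7,8,9,0
};

// Map a column digit (6,7,8,9,0) to an index 0-4, or -1 if it is not one.
int column_index(char c) {
    switch (c) {
        case '6': return 0;
        case '7': return 1;
        case '8': return 2;
        case '9': return 3;
        case '0': return 4;
        default:  return -1;
    }
}

// Decode the part of a message that follows the 4-character header.
// Returns false on input that does not fit the assumed grammar.
bool decode_body(const std::string& body, std::vector<Cell>& out) {
    int metalayer = 0;  // metalayer 0 is the default
    int set = -1;       // no set chosen yet
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c == '-') {
            // '-' must be followed directly by the metalayer number.
            if (i + 1 >= body.size()) return false;
            char d = body[i + 1];
            if (d < '0' || d > '9') return false;
            metalayer = d - '0';
            ++i;  // skip the metalayer digit
        } else if (c >= '1' && c <= '5') {
            set = c - '1';  // the set carries over across metalayer changes
        } else if (column_index(c) >= 0) {
            if (set < 0) return false;  // a set must be chosen first
            assert(metalayer >= 0 && metalayer <= 9);
            out.push_back(Cell{metalayer, set, column_index(c)});
        } else {
            return false;
        }
    }
    return true;
}
```

Under these assumptions the body 27-390 selects three items: set 2, column 7 of metalayer 0, then columns 9 and 0 of the same set in metalayer 3. Because set digits and column digits never overlap, the set only has to be sent when it changes, which is what keeps messages compact.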
So far the decoder has been implemented in this code, and I'm busy writing the encoder.
Some precursors to this module are already in this code release, in the form of some string
sorting and management routines. So far, the decoder appears to be working well.
I've left all of the debug code in the program, albeit commented out, so you don't have to
reinvent the wheel if you'd like to test something.
This code release compiles successfully using Metrowerks Code Warrior IDE 1.6 for the Macintosh.
rpage1.00a3.web1: Precompiled Macintosh version of this
rpage.cpp: Source file.
wordlist: Default symbol/word/phrase list, loaded at runtime.
- Complete the encoder module, finishing the core of the program.
- Optimize the arrangement of the default symbol list a little more.
- Provide the option for each sender ID to have an associated custom symbol file.
- Add the capability for more items by using consecutive - characters to denote entirely different
- Develop GUI interfaces for the Macintosh, Linux, SGI and Win platforms.
- Provide the option to send the message using the modem.
- Port to Z80 assembly, so I can decode messages using my TI-83.
If you look at the wordfile, you'll notice that the last 25 items (representing the last array)
contain names and places that will in all likelihood be pretty useless to you. The last array
is designated as a user definable array, so feel free to change it by changing the items in the
wordfile to suit your purposes. The other arrays have been semi-optimized for what I believe are
useful combinations of phrases for general messaging use. You're welcome to change them also if
you wish, but do some planning first and work out which words or phrases are likely to be compounded
together first, and place them in the same set or metalayer. The less you have to switch sets and
metalayers, the more compact the message is.
typonet.cpp (4.0 kb)
Current Version: 1.00b1.web1
typonet is a typo generator. It uses a rudimentary weighted network based
on the qwerty keyboard to produce errors in much the same way that a
regular typist would. Given a string, it writes 100 variations of the
string to a text file. Because typists aren't always inaccurate, some
entries are identical to the string. On the other hand, some output from
the program has compound typos. It's part of a natural language parser
that I'm developing. When I'm done, that'll be up here also, but don't
hold your breath for it, because I've only just started to write code for
At the moment, everything seems to be working ok. The program generates
typos along both rows and columns near the target letter. I'd really like
to add a feature for doing letter inversion (wtih - with), and add in an
added / dropped characters feature (wiuth - with) (wth - with). Also, I
need to rearrange the qwerty array a little so that unlikely combinations
of upper and lowercase characters aren't produced.
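The row-and-column substitution described above can be sketched as follows. This is a simplified stand-in, not typonet.cpp: it handles lowercase letters only, ignores the stagger between keyboard rows, and picks among neighbouring keys with equal probability rather than using a weighted network. The names kRows, neighbours and make_typo and the error_rate parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// The three letter rows of a QWERTY keyboard, treated as a simple grid.
static const std::string kRows[3] = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};

// Return the keys directly left, right, above and below `c` on the grid,
// or an empty string if `c` is not a lowercase letter on it.
std::string neighbours(char c) {
    std::string result;
    for (int r = 0; r < 3; ++r) {
        std::size_t col = kRows[r].find(c);
        if (col == std::string::npos) continue;
        if (col > 0) result += kRows[r][col - 1];
        if (col + 1 < kRows[r].size()) result += kRows[r][col + 1];
        if (r > 0 && col < kRows[r - 1].size()) result += kRows[r - 1][col];
        if (r < 2 && col < kRows[r + 1].size()) result += kRows[r + 1][col];
    }
    return result;
}

// Copy `word`, replacing each letter by a random neighbouring key with
// probability `error_rate`. Letters with no neighbours are left alone.
std::string make_typo(const std::string& word, double error_rate,
                      std::mt19937& rng) {
    assert(error_rate >= 0.0 && error_rate <= 1.0);
    std::bernoulli_distribution slip(error_rate);
    std::string out = word;
    for (std::size_t i = 0; i < out.size(); ++i) {
        std::string candidates = neighbours(out[i]);
        if (candidates.empty() || !slip(rng)) continue;
        std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
        out[i] = candidates[pick(rng)];
    }
    return out;
}
```

Calling make_typo in a loop with a modest error_rate gives a mix of untouched copies, single typos and compound typos, similar in spirit to the 100 variations the program writes out. Letter inversion and added or dropped characters are not covered by this sketch.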
typonet.cpp: Source file.
- I need to redo some of the randomization code in the program, as
there's a much better way to implement it using the modulus operator.
- Rules for better typos need to be added.
- The user should be given the option to either write to a file,
output to the screen, or do both.
I'm working on a reverse version of this program that performs word
resolution, using methods similar to the ones in this program that
More programs here soon! :-)