On Facebook, I recently posted the following:

Inspiration of the moment: I hate voicemail. If I don’t have an opportunity to chat right away, a voicemail is every bit as inconvenient. So my idea is this: Record a message whose first five seconds are “please email or leave a text instead of voicemail” and whose remaining 25 seconds are an unlistenable cacophony of screeching, blaring noise. That would at least keep it down to machines and the ultra-dedicated.

This got a few Likes, so I decided to start working on something. Attached to this post is a tame first draft of the voicemail message. It doesn’t include the 25-second ear horror (which I’d like to make later to test any skills I may have as a cacophonist), but does have a creepy voice[1] telling human listeners to leave a text or send an e-mail instead and to “please hang up”, plus error tones[2] to confuse some machines and repetition to discourage humans from waiting for the actual record tone.

Use at your leisure (and, of course, your own risk) and do share stories of any good (or bad) results.

Download

  1. [1] “Tom”, a text-to-speech voice whose unsettlingness partly comes from its use in NOAA weather radio and some areas’ Emergency Alert System notices.
  2. [2] Specifically, Special Information Tones (SIT) such as those used by TeleZapper that indicate to some kinds of auto-call machinery that the line has been disconnected.

(ANTLR users searching for an answer, you might skip the prose and head to the listings. They’re meant to be readable outside full context.)

What follows is a toy grammar, a gross oversimplification of one I’ve been working on, used to explore a fairly simple but rather underdocumented possibility in ANTLR: Having a single lexer rule emit multiple tokens (or zero tokens, or an altogether different token than specified by the rule).

Synopsis

This language is called “SomeSx”, and it consists of dumbed-down S-expressions with a special “sigil” modifier, as follows:

  • A document consists of zero or more elements delimited by whitespace.
  • An element is an atom, a list, or a sigiled item.
  • An atom is an unquoted string of letters and digits.
  • A list is zero or more elements delimited by whitespace, surrounded by ( and ).
  • A sigiled item is a $that is either
    • followed directly by an element (with no whitespace between), which is then its topic, or
    • not followed directly by an element, in which case it has a special empty topic.

One might think of a sigiled item as one that represents either a nullary (if the topic is empty) or unary (otherwise) function. The source

alpha $bravo $ (charlie $() delta $(echo foxtrot) golf $) $

might transliterate, as an example, to the following JavaScript-like expression:

top(
	"alpha",
	$("bravo"),
	$(),
	[
		"charlie",
		$([]), // Note: not the same as empty topic
		"delta",
		$(["echo", "foxtrot"]),
		"golf",
		$()
	],
	$()
)

Parser

The parser grammar is straightforward enough. (I nerfed the ever-loving stuffing out of it to make it that way.)

// Parser

start : elementsOpt ;

elementsOpt : element* ;

element : atom | list | sigiled ;

atom : Atom ;

list : Down elementsOpt Up ;

sigiled : Sigil sigilTopic ;

sigilTopic : element | emptyTopic ;

emptyTopic : EmptyMarker ;

Do note how sigiled is matched: There’s always a Sigil token and then a topic. That topic is either a normal element or some special EmptyMarker to indicate that there is no topic.

This is the problem, of course: In the language described above, the fact that a topic is empty doesn’t correspond to a token but to what regexp fiends would call a zero-width assertion. Before we get to why that’s a problem, I’ll sidestep it for a moment by presenting a lexer for a slightly broken version of the language.

Lexer, makeshift/broken demo version

Imagine that the language described above, instead of determining an empty topic by detecting that whatever follows a sigil is not an element, just has an explicit empty topic token ~. The above input sample would then be written

alpha $bravo $~ (charlie $() delta $(echo foxtrot) golf $~) $~

which is easy to write a lexer for:

// Lexer, with explicit empty token marker

Space :
	( ' '
	| '\t'
	| '\r'
	| '\n'
	) {$channel=HIDDEN;}
	;

fragment Dollar : '$' ;

Down : '(' ;
Up : ')' ;
Atom : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+ ;

EmptyMarker : '~' ;

Sigil : Dollar ;

This works delightfully; alas, it doesn’t lex the language as specified. No, for that to happen, EmptyMarker would have to match not ~ but the empty string (a big no-no for any eventually deterministic lexer), and to only match it in well defined places.

One way to make a lexer rule in ANTLR that matches in exactly this way is to make it a fragment; we could use a syntactic predicate to cause the zero-width token to match only if whatever follows the sigil doesn’t look like the start of an element. It’s actually very easy.

Or not. The rule’s fragment-ness is a double-edged sword in this regard: Matching it does not emit any sort of token that can be used by the parser. Therefore, we’ll have to convince the lexer rule containing the fragment to emit it alongside any token it might have produced anyway.

And it can be done. There’s just some plumbing to do first.

Lexer support for arbitrary numbers of tokens per rule

Little aside. (And again, do skip it if you’re in a hurry. It’s hypocritical; I’m providing unnecessary information into an article I wrote out of frustration with information overload.)

Emitting arbitrary numbers of tokens from a lexer rule is actually a well-known problem in the ANTLR world, if not only because Python is so popular. In that language, the braces one would see in a C-like language are eschewed in favor of indentation changes. While that’s nice for the user (ymmv), it’s tricky for the lexer, which has to be able to spit out all sorts of made-up “indent” and “dedent” tokens resulting from monitoring the indentation state.

After running into much information overload on the subject, I flipped open my dead wood copy of The Definitive ANTLR Reference and found this gem around pages 110-111:

Note that, for efficiency reasons, the CommonTokenStream class does not support multiple token emissions for the same invocation of nextToken(). Read the following from class Lexer when you try to implement multiple token emission:

When I read this, I truly had to fight the urge to throw the tome hard across the room. ANTLR’s finest documentation, lovingly crafted by the creator, is unfortunately (but justifiably) not free information. I have the thing itself in my hands, and it not only tells me that I’m going to have to override some methods to get it to work the way I want, but then has the nerve not to offer a concrete implementation and that I’m just going to have to “try to implement” it.

Additionally, a comment in the code snippet offers this:

Currently does not support multiple emits per nextToken invocation for efficiency reasons.

It might not need to be default, but I’d think this is a common enough thing that it might be simple (and more descriptive and easier to carry across target platforms) to have some settable flag for multiple emit support rather than requiring ugly ad hoc overrides.

Of course, this was at the end of a night of fruitless searching for an answer, and it’s a wonder nobody was killed.

Fortunately for humanity itself, I was finally able to extract the core nugget of knowledge from a wiki page that had quite a bit of unrelated (to me, anyway) surrounding information hiding it from plain sight.

Near the end of the text itself was the treasure: The overrides for emit(Token) and nextToken() that I needed. Sadly, it was for an earlier version of Java and of ANTLR, but I was able to port code from a more recent post by a user of the C# target.

Without further ado, the very simple guts of a multi-emitting lexer, for Java 1.6 and ANTLR 3.4:

// Without this code, or something similar, lexer rules emitting multiple
// tokens will not work correctly. (Try it if you don't believe me.)

@lexer::header {
	import java.util.Deque;
}
 
@lexer::members {
	// Substitute any compliant Deque implementation to taste
	Deque<Token> tokens = new java.util.ArrayDeque<Token>();
	
	@Override
	public void emit(Token token) {
		state.token = token;
		tokens.addLast(token);
	}

	@Override
	public Token nextToken() {
		super.nextToken();
		if (tokens.isEmpty())
			return Token.EOF_TOKEN;
		return tokens.removeFirst();
	}
}

Modifying the lexer to produce the marker token on cue

Here, we just follow through with the modification ideas from earlier: Have the Sigil token perform as before if, and only if, it detects that it is immediately followed by an element. If not, have it match a zero-width fragment token, then emit the Sigil followed by the EmptyMarker tokens. The details are provided inline:

// Lexer changes for implicit empty topic

// EmptyMarker is changed to match a zero-length string. In ANTLR (at least), a
// zero-width lexer rule is practically useless (it could match anywhere, and
// it's unclear where it's supposed to) unless it is a fragment. As a fragment,
// it matches exactly where we put it. But as a fragment, it doesn't emit a
// token...without some effort (read on).
fragment EmptyMarker : ;

// This fragment reflects the beginning of any explicit sigil topic: the first
// token in an atom, list, or sigiled. Used in a syntactic predicate, we can
// determine whether a token we're matching is followed by an explicit,
// non-empty topic.
fragment StartOfNonEmptyTopic : Down | Atom | Sigil ;

Sigil : d=Dollar
	// The Sigil token is changed to do what it did before as long as the token
	// appears immediately before what looks like a non-empty topic. 
	( (StartOfNonEmptyTopic)=> // already does the right thing; add nothing.
	
	// But if no such token follows, we have to output the sigil (as normal)
	// but then also insert a zero-width EmptyMarker afterward.
	| z=EmptyMarker // not followed by explicit topic; insert an empty one
		{
			// The tokens are fairly fluid; several of their properties can be
			// changed at will. In this case, the type of a token can be
			// changed while retaining its text content, line and column
			// numbers, and so forth.
			
			// From what I can tell, using emit(Token) directly will prevent
			// any token being emitted by the rule itself. So, if you use it
			// once for a fake token, be prepared to use it for the real one as
			// well.
			
			$d.setType(Sigil);
			emit($d);
			
			// You might have noticed that $z is already a EmptyMarker. Why set
			// the type? From looking at the generated lexer code, it appears
			// that a matched *fragment* rule creates a token with not the
			// named type but an unmatchable dummy token type. However, the
			// fragment token type (or any other token type) can be stuck back
			// onto it to make it work.
			$z.setType(EmptyMarker);
			emit($z);
		}
	)
	;

Peripheral caveats

Mostly not specific to this issue, but in general for ANTLR: Mind your generated code. Lots of stuff that might be wrong with your lexer will produce useless, misleading, or even absent diagnostic information. It’s not your fault, and fixing it might only require some changes in convention. For example:

  • I think I’m going to stick with CamelCase lexer rules, if only because ANTLR itself reserves a few all-caps ones such as DOWN and UP.
  • The aforementioned wiki page contains the phrase d=DIGITS r='..'; when I tried to match the dollar sign as a quoted string (i.e. d='$') the resulting variable in the generated lexer was an int rather than a Token. If you were curious why Dollar is its own fragment, this is the reason. (Perhaps double quotes would have worked—it wasn’t the first thing I tried, so I didn’t.)

Finally, I have to make sure to meditate and do my best not to be quick to anger toward Terence Parr et al. Language makes for really, really difficult computational problems, and despite the vast collection of warts in JARs, I do think ANTLR really is the best we’ve got, and I hate to consider the needless loss of life were I to attempt much of what I try to do with flex and bison.

The whole thing

Here is how everything fits together. Most comments have been removed to drive home how simple this whole thing is really supposed to be.

grammar SomeSx;

@lexer::header {
	import java.util.Deque;
}
 
@lexer::members {
	Deque<Token> tokens = new java.util.ArrayDeque<Token>();
	
	@Override
	public void emit(Token token) {
		state.token = token;
		tokens.addLast(token);
	}

	@Override
	public Token nextToken() {
		super.nextToken();
		if (tokens.isEmpty())
			return Token.EOF_TOKEN;
		return tokens.removeFirst();
	}
}


// Parser

start : elementsOpt ;

elementsOpt : element* ;

element : atom | list | sigiled ;

atom : Atom ;

list : Down elementsOpt Up ;

sigiled : Sigil sigilTopic ;

sigilTopic : element | emptyTopic ;

emptyTopic : EmptyMarker ;


// Lexer

Space :
	( ' '
	| '\t'
	| '\r'
	| '\n'
	) {$channel=HIDDEN;}
	;

fragment Dollar : '$' ;

Down : '(' ;
Up : ')' ;
Atom : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+ ;

fragment EmptyMarker : ;

fragment StartOfNonEmptyTopic : Down | Atom | Sigil ;

Sigil : d=Dollar
	( (StartOfNonEmptyTopic)=> // normal
	| z=EmptyMarker // insert empty token
		{
			$d.setType(Sigil);
			emit($d);
			
			$z.setType(EmptyMarker);
			emit($z);
		}
	)
	;

I think statically typed programming languages are great. Why? Because in just about every dynamically typed language I’ve ever used in earnest—JavaScript, for example—at some point I’ve had to write a lot of code to verify that a value looks like a certain type (one often can’t get a full guarantee) in order to make assertions about the theoretical correctness and consistency of a program. Is that a string? An array? A number? I can’t tell you until I’ve looked. For better or worse, JavaScript will allow absolutely anything (or nothing at all!) as a parameter. Sure, it’s a simple way to implement polymorphic functions without a whole lot of fuss…at first. But most functions I write are intended to work on a set of values that is much, much smaller than the set of values it is not designed to work on.

At a medium-to-high level, this is a non-issue with static typing. I specify a function to operate on values of specific types, and the program won’t even compile if someone tries to pass something different.

Apologists for dynamic languages have pointed to the verbosity and inflexibility of having to specify the type of every single thing, but advances such as type inferencedetermining the type of something based on how it is usedand type-safe genericsfunctions and objects that can operate on arguments of arbitrary types if those types fit general criteriahave made it easier for programmers to convey all of the intended meaning without sacrificing too much flexibility or writing too many more words than necessary.

Another important point to consider is the failure mode of a program that isn’t written correctly. Inconsistencies in the use of values and types tends to be a compile-time error for static languages; they are caught before the program is even run. In a dynamic language like JavaScript, no static types are enforced per se, but using a value as if it is something it’s not can result in a run-time error in the best case[1] and instead often “succeeds”, producing unexpected results [2].

Of course, I don’t like the bits of the static languages that continue to allow type-unsafe behavior. In my opinion, the major violators are:

  • Static casts, such as Java/C# String s = (String)o for some Object o
  • Instance-of as a boolean operator (more of an accomplice than a criminal)
  • Inheritance-based polymorphism used in place of tagged unions
  • Null being an allowed (and often non-disallowable) value for any reference type

Starting with the last one: It really seems strange to me that we automatically allow null to be passed as an object, even if it is an artifact of C (a reference being implemented as a pointer, a null corresponds to a NULL pointer). I think it’s a more reasonable assumption that the object must be valid unless we say it’s optional.

What does this have to do with types? Null degrades the utility of a type if it can’t be disallowed by allowing a value outside the value space to be assigned to a variable of the type. When you pull a value back out, you may need to verify that it isn’t null before continuing. This sounds like that type identification routine I didn’t want to write in JavaScript!

Scala very nearly has it right, offering up the Option monad from the functional programming world, which more or less accepts either a value from an existing type or the solitary None value. However, Scala also still has null for Java compatibility. D’oh! C# does it exactly right, but only for value (“struct”) types, with a ? suffix on nullable types. Objects are still always nullable.

Some work has been done to paper over this in Java, including several disparate @NotNull/@NonNull/… annotations. The first problem is that these aren’t enforced by standard Java but only by third-party static analyzers (so I hear). The second is that they’re not the default, and they never will be without some massive shift in mindshare.

Static casts are another holdover from C. In C, they are how pointers change from one type to another. This is essential to much actual activity in C, in part because C lacks generics.

Take for example qsort(). This is a function that behaves very much generically: It takes a void pointer to an array of objects of a size provided as another argument, then uses a pointer to a function that compares two such objects, passed as void pointers themselves, to determine a sort order. There is no such thing as a “void” value; a void pointer could point to anything at all. You might pass an array of int, or perhaps an array of const char *. As long as the function you pass later is in agreement, the sort works equally well.

// Actual compare functions
int compare_ints(int a, int b) {
	return (a < b) ? -1 : (a > b) ? 1 : 0;
}
int compare_doubles(double a, double b) {
	return (a < b) ? -1 : (a > b) ? 1 : 0;
}

// What you pass to qsort()
// Note the explicit pointer casts
int compare_ints_vp(void * a, void * b) {
	return compare_ints( *((int*)a), *((int*)b) );
}
int compare_doubles_vp(void * a, void * b) {
	return compare_doubles( *((double*)a), *((double*)b) );
}

But who checks whether or not it does agree? The answer is nobody. You could accidentally pass a comparison for int with a compare function for double. The compiler will not catch it, because either function has the same signature, accepting two void *. By using an explicit cast, as the compare functions have to, you’re relieving the compiler of virtually all responsibility of checking that conversion.

And what’s the failure mode? An actual runtime error in the best of cases (probably a segfault), but often just reported success paired with some hard-to-track-down erratic behavior. Seeing a pattern?

Thankfully, the JRE (Java) and the CLR (.NET) do care about types. The result isn’t always pretty, but it’s far more deterministic: If an object is cast to something that it isn’t, an error (exception) is guaranteed. But in Java, it isn’t a checked exception, and C# doesn’t do checked exceptions, so if a cast fails in this way, you can catch it, but the compiler won’t make you. And, of course, it often defers what should be a static type error to run time.

Like with C, much of the use of explicit casts in Java is to simulate generics; type-erasing generics were added a couple of major versions ago, but code from before that era is still alive and kicking. For example, before Java 1.5, java.util.List<T> was available only as java.util.List, which essentially always behaved as java.util.List<java.lang.Object> does now. If you had a List containing nothing but String objects, you couldn’t express it, so instead of someList.get(n), you’d do (String)someList.get(n), counting on convention and luck for any type safety.[3] The addition of generics dealt with the worst of this usage.

Java and similar languages eschewed the C union, citing that its legitimate usages[4] had been supplanted by inheritance-based polymorphism.

For example, say you have a heterogeneous List, in which each element is a String, a Number, or a Date. The types and orders are not known at compile time, but it is known that any of these three types can be elements without a problem. There is no direct expression for this in Java. The polymorphic approach would be to use a List with an element type of the nearest common ancestor, which is Object. This will work, but the code will need to incorporate explicit logic to determine that each member is allowed, because the compiler now allows any Object, not just the ones we want. D’oh again.

In the process of doing this checking, there’s something else that feels a little loose. In essence, there is a sequence that should always result in sane cast usage. First[5], you check the type membership—if it is this type, jump to the second step; otherwise, skip the rest[6]. Second, you cast the reference. Third, you execute code that uses the reference.

The most important part is that if the first step is a negative, you never try to cast the reference and you never attempt to use the cast reference.

Here’s what a bunch of these look like chained together, along with a null check:

void doSomethingWith(Object o) {
	// ... code that doesn't care about the object type ...
	if(o == null) {
		// Run-time check :-(
		throw new NullPointerException();
	}
	else if(o instanceof String) {
		String s = (String)o;
		// ... handle the string ...
	}
	else if(o instanceof Number) {
		Number n = (Number)o;
		// ... handle the number ...
	}
	else if(o instanceof Date) {
		Date d = (Date)d;
		// ... handle the date ...
	}
	else {
		// Run-time check :-(
		throw new IllegalArgumentException();
	}
	// ... code that doesn't care about the object type ...
}

It shouldn’t be this painful to write sound code. In this matter, Scala is the only one that does it right. The important thing to note is that Scala’s pattern matching combines all three of the steps into one; the same syntax that checks the type also casts it for you and runs contingent code.

def doSomethingWith(o: Any) {
	o match {
		case s : String => ... // handle the string
		case n : Number => ... // handle the number
		case d : Date => ... // handle the date
		case _ => throw new IllegalArgumentException()
	}
}

The only thing I think could be better about the above code would be the ability the omit the default case. (You can, but the failed match is a run-time exception.) Alas; Scala doesn’t do union types[7] and it still allows null.

I’ve drafted multiple language specs to help heal these wounds while remaining essentially C-esque or Scala-esque in syntax[8]. Hopefully I’ll succeed in implementing one of them at some point…

  1. [1] like trying to call a method on an object or non-object that doesn’t implement it
  2. [2] function f(x){return 100+x} results in either 123 or “10023” for f(23) and f(“23”); generally only one of these is the expected behavior
  3. [3] The Java compilers that support generics literally just do this for you, for the sake of backward compatibility. It’s kooky, but I’m fine with it as long as it’s the compiler’s job.
  4. [4] It was popular to use unions to destructure a value’s in-memory representation, for example, a union of a uint32_t superposed on an array of four bytes, but this usage is non-portable and generally contraindicated.
  5. [5] If the reference happens to be declared volatile, copy to a nonvolatile variable before proceeding.
  6. [6] and, optionally, run code that only happens if the type check fails
  7. [7] A similar effect can be had from case classes, but it involves surrogate objects.
  8. [8] Lisp is a nice idea, I think, but the syntax is a little abhorrent to non-theoreticians.

A rule known as the fundamental theorem of arithmetic states, in so many words, that any natural number is uniquely representable as a multiset of prime factors, and vice versa. Having two operands in this form makes some operations easier and other ones more difficult.

This discussion assumes certain definitions of terminology:

  • natural number : A positive integer; an integer 1 or greater.
  • prime number : A natural number with exactly two distinct natural factors, 1 and itself. In particular, 2 satisfies this requirement despite being even, while 1 does not satisfy this requirement because it has only one such factor instead of the required two.
  • multiset : A generalization of a set wherein the multiplicity of each item is significant.

The natural number indicated by a given multiset is the product of 1 and each element of the multiset. For example, 1 is represented by the empty multiset {∅}, 3 by {3}, 5 by {5}, 15 by {3,5}, 25 by {5,5}, and so on.

Multiset operations on the multisets correspond to other operations on the natural numbers they represent:

  • Intersection (i.e. retaining the lesser multiplicity per element) is equivalent to finding the greatest common divisor (gcd):
    • gcd(54, 24)
    • {2, 3, 3, 3} ⋂ {2, 2, 2, 3}
    • {2, 3}
    • 6
  • Union (i.e. retaining the greater multiplicity per element) is equivalent to finding the least common multiple (lcm):
    • lcm(8, 12)
    • {2, 2, 2} ∪ {2, 2, 3}
    • {2, 2, 2, 3}
    • 24
  • Addition of the multiset (i.e. retaining the sum of multiplicities per element) is equivalent to multiplication:
    • 414 * 555
    • {2, 3, 3, 23} + {3, 5, 37}
    • {2, 3, 3, 3, 5, 23, 37}
    • 229770
  • Subtraction of the multiset (i.e. retaining the differences of multiplicities per element between one set and another) is equivalent to division:
    • 88088 / 2002
    • {2, 2, 2, 7, 11, 11, 13} − {2, 7, 11, 13}
    • {2, 2, 11}
    • 44
    • Note that this division is not closed on natural numbers unless the result would also be a natural number (i.e. the left operand must be divisible by the right) because the multiset subtraction itself is not closed on multisets (with nonnegative multiplicities). The multiset notation can still help to reduce the fraction to lowest terms:
      • 2310 / 273
      • {2, 3, 5, 7, 11} − {3, 7, 13}
      • {2, 5, 11} − {13} (since the multiset can’t have negative multiplicities, this is the simplest form)
      • 110 / 13
  • Multiplying all of the multiplicities in the set by n yields the number to the nth power:
    • 1684
    • {2, 2, 2, 3, 7} multiplicities multiplied by 4
    • {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 7, 7, 7, 7}
    • 796594176
  • If all of the multiplicities in the set are themselves divisible by n, dividing them yields the nth root of the number:
    • ∛474552
    • {2, 2, 2, 3, 3, 3, 13, 13, 13} multiplicities divided by 3
    • {2, 3, 13}
    • 78
  • Meanwhile, the natural-to-multiset transform won’t help you at all with addition and subtraction.

This circuit provides a pulse-sequencing capability using the control voltage inputs of the 555.

The circuit described here triggers three (or an arbitrary number) monostables with differing but related widths. What’s different about this one than others I’ve seen is that this circuit has only one timing signal; in particular, there is only one RC network among all, which could potentially reduce the parts count and, more importantly, increase consistency among the timers.

The timing network is actually sort of a modified RC using a transistor and a voltage reference to charge the capacitor with a constant current. The timing is determined by the base voltage of the PNP transistor, the resistance at the emitter resistance, and the capacitance at the collector. The voltage drop across the resistor is the Vcc – Vb + Veb[1]. As long as none of these three values vary significantly in the application, the resistor’s voltage drop can be considered constant. If the resistor is, as expected, also constant, then the current through it is constant as well: I = V/R. The transistor will adjust as necessary, if possible, to maintain that current through the transistor[2]. If the supply voltage is well regulated, the base voltage can be set with a voltage divider.

The output of this constant current timing network, in contrast to the tricky curve of a regular RC network, is a linear ramp. This makes it easier to calculate the desired control voltage for each monostable. In the example above, the control voltages are 1V, 2V, and 3V, where the 1V and 2V timers produce a pulse exactly 1/3 and 2/3, respectively, the width of the 3V timer pulse. The values need not be this regular. A normal RC network could be substituted if you’re willing to do the math.

The monostable with the longest delay should have its discharge pin connected with its threshold pin in order to discharge the capacitor.

This approach gets a little unfortunate where setting the control voltages comes in. The CV is preset internally with a three-step resistor divider network. The CV itself is roughly 5k from Vcc and roughly 10k from ground. If I’ve done the math correctly, this is equivalent to a 2Vcc/3 supply through a 3.3k resistance. This means that you can’t trivially set the control voltage with a divider—something with a low (ideally zero) output resistance is necessary. Here are some things that might be worth trying:

  • Resistor divider with proper buffer (2 resistors, 1/4 to 1 op amp IC). If you’re not willing to complicate the calculations, an op amp can output the same voltage as the input, with no base-emitter drop. As a bonus, it’s an op amp, which is capable of a whole lot more. If restricted to Radio Shack, I’d pick up two TL082 JFET dual amps (or an LM324 quad if they’re out), then use three of the amps with a resistor ladder to set CVs and the one left over to upgrade the current source. I’d guess this is the option that would involve the least hair-pulling. (Edit: An example of a buffered resistor divider, complete with redone current source, in terms of a 7805 voltage regulator. Would require a supply voltage above 7 volts or so, but it’s an easy out in 2-3 ICs and about 10 resistors, plus the transistor and RC already in the circuit.)
  • Resistor divider with unity gain transistor (2 resistors, 1 NPN transistor). Similarly to the current source, the emitter voltage is a nearly fixed drop below the base voltage.
  • Trimmer (1 potentiometer, potentially up to 2 biasing resistors) if you’re okay with a bit of tuning.
  • Linear voltage regulators/references. Would probably work really well, but might be expensive depending on where you buy[3] and in this case requires a fairly small one (probably around 1V). The LM317 can be treated as a precision 1.25V drop, as long as the minimum load requirement is met.
  • Diode drops (1 diode). In the simulator, I was able to get the circuit to work with a ladder of diode drops for the control voltages (i.e., around 0.7, 1.4, and 2.1V). I’ve read that in practice this doesn’t work nearly as well.

By the way, the intended purpose is to trigger some D flip-flops to read serial data in an experimental line encoding I’ve conceived. It’s a variation of one I discussed way back, but the initial rising edge is followed by three bits of data before the mandatory fall. If it works, it could mean a simple output interface for PCs (e.g. via monitor or sound card) to devices that only understand data in terms of data, clock, and latch. What does this mean? You guessed it—semi-headless 74HC595 displays. (Okay, maybe not so much, but that’d at least be a good way to test it.)

This circuit is available on the Falstad simulator.

  1. [1] Veb may vary slightly depending on current, but is roughly 0.7V under normal circumstances.
  2. [2] Up to near the max collector voltage, in this case slightly lower than the emitter voltage.
  3. [3] LM317 is $3 at Radio Shack, but way cheaper online, especially in a TO-92 package.

Almost as fun as it looks.

Print your own, if you’d like!

This octahedral toy/tool/thing represents manipulations on a logic gate with two inputs A and B and one output Y. A face of the object says what operation it executes, such as “AND”. From there, each side of the face is marked with a variable—A, B, or Y—and rotating past that edge shows you the result of inverting that variable.

For example, if I start at AND, I might turn past Y to NAND (AND with inverted output). Then, I might turn past A to IF (another name for OR NOT), because that’s what /A NAND B is. Finally, I might turn past B to OR, since that’s A OR NOT /B.

Alternatively, you might choose a starting point and an ending point, then trace the path between them to determine how many inverters are necessary to do the conversion.

Note that any face is at most three turns from any other. Polar opposites such as AND and OR are on opposite faces, and changing one to the other involves inverting each input and the output.

The somewhat uncommon logical operator names IMP (short for IMPLIES), NIMP, IF, and NIF appear as one-word substitutes for operations that invert B (NAND NOT, AND NOT, OR NOT, and NOR NOT, respectively). The two-word versions appear as a subtitle.

This isn’t quite as great as it could be. The text sizes and layout need a lot of work; the expression text is only barely big enough to read. However, since the time may never come for improvement, here it is for you to play with.

Enjoy!

Yesterday’s post got me thinking way too hard about the ubiquitous 555 timer. Specifically, I wondered if I might have been missing an optimization that would knock out one of the 555s necessary to make the transparent latch. It turns out that I was right, provided that an active-low enable line is doable. (Details follow.)

Skip this bit if analog creeps you out: The 555 timer has some analog circuitry that seems to be used in the majority of its applications. There is a control voltage (CV) pin that sets thresholds for comparators at the TH (threshold) and /TR (trigger) pins. If TH is above CV, then it registers as high (low otherwise); if /TR is below CV/2, then it registers as low (high otherwise). CV can be set externally[1], but usually it’s left alone, and its default setting is CV = 2Vcc/3 (making CV/2 = Vcc/3).

Really, that’s where the analog part ends. The rest is basically digital, and digital is far more comfortable for those of us from the computer side of the house. The 555 timer is basically an RS latch with mildly analog inputs, plus a preemptive additional reset (/RST). I realized that’s still vague for my taste, so I scribbled out a bunch of notes, drew up some tables, and eventually came up with the following logical equivalent of the 555:

Even if you ignore for a moment the tweakable threshold voltage settings on the threshold and trigger inputs, the 555 has interesting possibilities as a digital circuit.

(Edit: A demonstration of the above on Falstad.)

The chip has three logical inputs. One, /RST, is a legitimate TTL input that is inverted. The others feed into comparators as just described, but ignoring the level details, TH simply feeds into a buffer and /TR pin feeds into an inverter. There are also two outputs, OUT (output) and DIS (discharge) pin. Both carry the same data, but OUT is driven as a logic level while DIS is open-collector[2].

If you ignore the /RST, the 555 is an R-/S latch, where TH and /TR pins serve as R and /S, respectively. If both of those signals are active, the result is by definition undefined.

/RST adds to this an overriding additional reset signal. When active, the output is defined as low even if /TRis also active.

By extracting the behavior of a plain RS latch from this, the relationship among the inputs is clarified:

  • The latch is set if /TR is low and /RSTis high.
  • The latch is reset if THis high, /RSTis low, or both.

Some degenerate cases:

  • As noted before, if /RST is high then TH and /TR form an R-/S latch.
  • If TH is low, then /RST and /TR form a kind of R latch[3], with /RST and /TR as /R and /S, respectively.
  • If TH and /TR are tied together as a single input, and /RST is used as another input, the result is an “AND NOT” logic gate, with TH+/TR as the inverting input and /RST as the non-inverting input.
    • The inverting input is effectively Schmitt-triggered, with the hysteresis zone set between CV/2 and CV. (/RST remains a normal TTL input.)
    • With /RST tied high, this becomes a Schmitt-triggered inverter.

Of particular interest is the R latch. This particular latch is set when /TR is low and /RST is high. On the other hand, this latch is reset when /RST is low with no other condition. When I saw this it seemed a bit familiar, like /RST being high enables the value of /TR having any impact.

So, I prepared a truth table for the D and E lines of a D latch, and entered the corresponding values of /R and /S. The result: /S = /E, and /R = D or /E.

My first conclusion is that, if you’re not bound to using only 555 elements, this could make things a lot easier. It requires an active-low /E, if that can’t be accomplished anywhere else, an NPN inverter (one transistor, two resistors) will suffice. /E is fed directly into /S, and a diode-logic OR gate (2 Schottky diodes, one resistor) figures /R. Even with the inverter that’s way fewer parts than yesterday’s solution.

In four parts, a D latch with active-low enable. Slap an inverter (also pictured) onto the front to make it a normal enable.

(Edit: A demonstration of the above on Falstad.)

Translating the setup to all 555s is trickier. If you can set up for active-low both /E and /D, it’s possible to get it down to three elements: The latch, an inverter into /R, and an AND NOT gate. This follows from the proposition that R = /D and E = /D and not /E. Then, /R = not R. And, of course, for active-high D you add an inverter—4 elements, no better than yesterday.

The difficulty is that the only gate a single 555 can simulate (other than an inverter) is an AND NOT. While it’s possible to make any other gate from AND NOT parts, some take more fudging than others. To make the OR gate we’d like, you need three. My derivation[4] is as follows:

  • A OR B
  • NOT (A NOR B)
  • NOT ((NOT A) AND NOT B)
  • NOT ((1 AND NOT A) AND NOT B)
  • 1 AND NOT ((1 AND NOT A) AND NOT B)

Note that in the last stage there are three “AND NOT”s and no other operators. Note also that, using (Schottky) diode logic, the parts count is the same: Two diodes and one resistor. Guess which one I’d rather breadboard.

  1. [1] Its default value comes from a resistive divider of approximately 5K vs. 10K, so if the value is critical it’s probably better to set it via a lower impedance, such as with a voltage reference or a (unity gain) buffer.
  2. [2] 0V on low, high-impedance on high.
  3. [3] An RS latch that favors R in a tie
  4. [4] Through repeated application of DeMorgan’s laws.

No guarantees, but it worked in the simulator.

Today, figured out how to make a (possibly low-speed) transparent D latch out of an R-/S latch, plus a couple of logic gates, and an inverter. Each of these four things can be constructed from a 555 timer (or half of a 556). Alternatively, each of the gates can be replaced by three diodes[1] and a resistor, and the inverter by an NPN transistor and a resistor. Not pictured is the AC coupling[2] on the E input that would make this an edge-triggered flip-flop instead of a latch.

The significance of this is that a D flip-flop, a 1-bit static RAM and an important element of sequential logic, can be constructed very easily with stuff readily available from Radio Shack, which doesn’t stock logic ICs anymore but does carry 555s and 556s (as well as diodes, resistors, and transistors). Naturally, I have the best interests of the impatient experimenter in mind. :-)

  1. [1] With Schottky diodes you could probably get away with omitting the output diode.
  2. [2] In this case, an RC highpass is used, an inline capacitor of about 0.1µF followed by a 10kΩ pull-down resistor.

Younger hobbyists who want to do arithmetic in an electronics application may think that a microcontroller is the only way to do it. I present something far more old-school.

image

How to do simple math on real numbers in a very non-digital way.

If you can represent the scalar terms of an arithmetic expression as voltages and/or resistances, then many operations can be performed with op amps (which is, incidentally, what the “op” part means). Several of the major features are described above.

  • The word “amp”, for amplifier, refers to the fact that the device amplifies, or enlarges, an input signal by a factor called gain which is represented by the letter A[1] When A is fixed, the amplifier serves to multiply the input signal by A. By default, A is ideally infinite[2], but can be set to a fixed value using resistors.
  • Division (not pictured) can be performed using a resistor divider network. No op amp is required, though using one might be desirable to avoid loading effects.
  • “Subtract then multiply by A > 0″: An op amp’s natural mode (in my opinion, anyway) is the differential amplifier. Two voltages are input; V2 is subtracted from V1, and then the difference is multiplied by A, which is based on the ratios of resistors used. All other linear modes pictured can be described in terms of this one:
    • “Multiply by A >= 1″: If you take the divider off of V2 and replace it with Vin, it’s equivalent to setting V2 to Vin(1 + Rin/RF). Then, just set V1 to 0. The result is A(V2V1) = (RF/Rin)Vin(1 + Rin/RF) = Vin(1 + RF/Rin).
    • “Multiply by A < 0″: Let V2 = 0, V1 = Vin. The result is A(V2V1) = (RF/Rin)(0 – Vin) = Vin(-RF/Rin).
    • “Add then multiply by A < 0″: Exactly the same as the previous case, except that the input is from a resistor divider network. If V1 and V2 are fed to the same point through equal resistances as shown, it’s equivalent to a single input of VX = (V1 + V2)/2 through a resistance of RX = Rin/2. With the other input at 0, the result is A(0 – VX) = (RF/RX)(-VX) = -(2RF/Rin)((V1 + V2)/2) = -(RF/Rin)(V1 + V2).
  • “Logarithmic” and “Exponential”: An NPN transistor with a grounded base can be placed either at the feedback or the input to implement a logarithmic or exponential output, respectively. The coefficients of the expression are not well defined, since they vary from instance to instance, but if the transistors used are closely matched and thermally linked (for example, in a single-chip transistor array) then with some tuning their outputs will be consistent. A log transform can be used to easily multiply (by adding logarithms) or divide (by subtracting logarithms) two voltages.
  • Comparison (not pictured): As discussed here previously, a comparator is a sort of op amp specialized to output only logical high/low depending on whether the value at the non-inverting input is higher than the value of the inverting input, outputting high and low, respectively. In a pinch, an op amp will do the same thing in the open-loop configuration—direct inputs and no feedback, resulting in the extremely high gain mentioned earlier. If the non-inverting input is even slightly different than the inverting input, that small difference is amplified so immensely that the output will be either the highest possible or the lowest possible, meaning around the positive and negative supply voltages, respectively. Use of an actual comparator is preferred though, since it is better tuned for digital use.
  1. [1] The “A” presumably refers to “amplification”.
  2. [2] But in practice is just a very high number.