Forum URL: http://www.eyrie-productions.com/Forum/dcboard.cgi
Forum Name: General
Topic ID: 1361
Message ID: 16
#16, RE: Text Transformation Language
Posted by zwol on Jun-26-15 at 12:21 PM
In response to message #13
>I know you say that you loathe BASIC, but I'd like to ask you to take
>a look at one sub and one sub only... "Populate_Command." I really
>hate the horribly inelegant way "Populate_Command" recognizes TTL
>syntax. It's a bunch of "If" statements encapsulated in a massive
>"Select" statement. Any advice there, even just general advice
>unrelated to BASIC, would be greatly appreciated.

I'm not the person this was addressed to, but I am a computer language nerd ;-)

Parsing a "formal language", which is what you're trying to do, is a really well-understood problem in theoretical computer science, and one that comes up all the damn time in industry, but almost no general-purpose programming languages are any good at it. (One of the design goals of Perl 6 was to become good at this. Perl 6 has been vaporware for ~15 years now.) If you look at, say, the handwritten code in GCC that parses C, you'll see that it has the same kind of awkward structure as your parser - lots of functions full of big hairy if-else chains conditioned on what word comes next.

However, there are special-purpose programming languages that are good at this task. The thing you want is called a parser generator. This is a tool that will take a short, readable specification that looks something like your documentation for TTL, and generate a blob of completely unreadable code that does the job of parsing it, which you then combine with the rest of your program. Unfortunately I don't know if there are any parser generators that work with this dialect of BASIC. This one works with VB.NET, though, and it looks like its documentation is written for people who don't already know from formal languages. If that one doesn't suit you, Wikipedia has a long list of more.

(GCC doesn't use a parser generator because it would get in the way of going as fast as possible and generating diagnostics that are as clear as possible. This is a major concern for a production C compiler but not so much for you.)
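To give a rough feel for the "short, readable specification" idea without pulling in a real parser generator, here is a minimal Python sketch of a table-driven recognizer. It is not a parser generator, only the same idea in miniature: the syntax lives in one table instead of being spread across nested conditionals. The three command forms are invented for illustration and are not real TTL syntax, and `COMMAND_TABLE` and `parse_command` are hypothetical names.

```python
import re

# Hypothetical command forms, invented for illustration; NOT real TTL syntax.
# Each entry pairs a result name with a pattern for one whole command line.
COMMAND_TABLE = [
    ("replace", re.compile(r'REPLACE\s+"([^"]*)"\s+WITH\s+"([^"]*)"', re.IGNORECASE)),
    ("delete", re.compile(r'DELETE\s+"([^"]*)"', re.IGNORECASE)),
    ("uppercase", re.compile(r'UPPERCASE', re.IGNORECASE)),
]


def parse_command(line):
    """Return (command_name, argument_tuple) for one command line.

    Raises ValueError when no table entry matches the whole line.
    """
    stripped = line.strip()
    for name, pattern in COMMAND_TABLE:
        match = pattern.fullmatch(stripped)
        if match:
            return name, match.groups()
    raise ValueError(f"unrecognized command: {line!r}")


print(parse_command('replace "cat" with "dog"'))
print(parse_command("UPPERCASE"))
```

Adding a new command under this scheme means adding one row to the table rather than another branch to the if-else chain. A real parser generator goes further and handles nesting and recursion, which a flat table of regular expressions cannot.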

>Will do! I only learned about "sed" the other day on Reddit! I suspect
>if I'd know about "sed" back in '11, TTL might have never come to be.
>I'll check out "AWK" as well!

sed and awk are certainly good things to crib from, but please do adopt the PCRE regular expression syntax if you're going to have regular expressions at all. Yes, it is cryptic and hard to learn, but basically every popular programming language nowadays supports something close to it, which means everyone only has to learn that cryptic syntax once.

(In particular please don't copy the regular expression syntax from sed -- sed's regex syntax is closely related to PCRE's, but simultaneously harder to read and less powerful.)
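To make the readability difference concrete, here is a small Python sketch with the same date pattern written both ways. Python's `re` module is one of the "something close to PCRE" dialects, not PCRE itself. The sed-style string uses POSIX basic regular expression notation and is shown only for comparison; Python never runs it. The function name `reorder_date` is made up for this example.

```python
import re

# The same "YYYY-MM-DD" pattern in two notations.
# sed (POSIX basic regex): groups and repeat counts need backslashes, and \d is unavailable.
SED_BRE_STYLE = r"\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)"  # comparison only, unused
# PCRE-style, as accepted by Python's re module.
PCRE_STYLE = r"(\d{4})-(\d{2})-(\d{2})"


def reorder_date(text):
    """Rewrite every YYYY-MM-DD date in text as DD/MM/YYYY."""
    return re.sub(PCRE_STYLE, r"\3/\2/\1", text)


print(reorder_date("Posted on 2015-06-26."))  # prints: Posted on 26/06/2015.
```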

>I know, I know, BASIC is more abstracted from the binary executable,
>which means the compiler can do some really inefficient things on your
>behalf. But starting with a Commodore64, BASIC has been my programming
>language literally since I was an 8-year-old desperately trying to
>re-type the BASICA code that "3-2-1 Contact" magazine published
>without creating a typo.

You shouldn't feel bad about preferring BASIC. BASIC got a lot of flak back in the C64 days because it was impossible to write anything in it that wasn't a pile of spaghetti. But modern BASIC has all the same structured-programming conveniences you'll find in other languages, and is reasonably efficient as well.

If you want to learn another programming language, I would recommend not picking C. You will learn more about how to program well by learning Haskell or Scheme, and you will gain more practical ability to make computers do what you want by learning Python, PHP, and/or JavaScript. C nowadays is actually bad at teaching you how the machine works at the low level, for reasons too tedious to get into here; if you want to learn that, have a look at Rust.