Context-Free Grammars

In this section we will review some "well-known" results about the relationship between context-free grammars and algebras. These results underlie the algebraic approach to the semantics of programming languages. It is "well-known" that every context-free grammar, G,

corresponds to a many-sorted signature, Sig(G), with a sort for each non-terminal and an operator for each production in G,
that the free algebra for Sig(G) corresponds to the abstract syntax of G, and
that the concrete syntax is a Sig(G)-algebra.

We will show how the programs given in the section Signatures, Algebras and Homomorphisms can be used to explore these results.

Outline:

Introduction
Definition of context-free grammar and context-free language
Examples
The signature of a context-free grammar
The algebras of a context-free grammar

Context-free grammars were originally developed by Noam Chomsky as a means for describing the underlying syntax (grammar) of natural languages. They were first used in computer science to describe the grammar of the programming language ALGOL 60. For a while computer scientists used the term "ALGOL-like language" rather than Chomsky's term, "context-free language". However, it was eventually realized that ALGOL 60 was not actually an "ALGOL-like language", and so the term ceased to be used. We will later give an example of a programming language, LANG1, that does have a context-free grammar. However, most programming languages (and all natural languages) do not have context-free grammars. Despite this fact, context-free grammars are still very useful and reflect the underlying algebraic structure of both kinds of languages. Later we will look further into the relationship between the syntax of programming languages and context-free grammars.

Definition: A context-free grammar G consists of

T, a set (of terminal symbols)
N, a set (of non-terminal symbols) where N ∩ T = ∅.
P ⊆ N × (N ∪ T)* (a set of productions)
S ∈ N (the start symbol)

We can write a production as a pair p = <A₀, a₀A₁a₁...A_na_n> where 0 ≤ n, each A_i ∈ N, and each a_i ∈ T*. Sometimes a production is written in the form p: A₀ → a₀A₁a₁...A_na_n.

A context-free grammar G = <T, N, P, S> defines a subset L(G) of T* called the context-free language generated by G. The set L(G) is defined using the relation ⇒_G ⊆ (N ∪ T)* × (N ∪ T)*, where w ⇒_G u iff there exist a production p = <L, R> ∈ P and a, b ∈ (N ∪ T)* such that w = aLb and u = aRb.

Let ⇒⁺_G be the transitive closure of ⇒_G. Then L(G) is defined to be { u ∈ T* | S ⇒⁺_G u }.
end
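The derivation relation can be tried out directly. The following is a minimal sketch, not part of the programs referred to in this section: it assumes that every symbol is a single character (so sentential forms are ordinary strings) and that no production has an empty right-hand side, and it uses the grammar with productions <A, 0> and <A, 1A1> as sample data. The names `sampleGrammar`, `step`, and `language` are made up for this illustration.

```javascript
// A grammar <T, N, P, S> as plain data. Assumption: each symbol is one
// character, so a sentential form is just a string over N ∪ T.
const sampleGrammar = {
  T: ["0", "1"],
  N: ["A"],
  P: [["A", "0"], ["A", "1A1"]], // each production is a pair <L, R>
  S: "A",
};

// All u with w =>_G u: choose a production <L, R> and an occurrence of L
// in w, so that w = aLb and u = aRb.
function step(grammar, w) {
  const result = new Set();
  for (const [L, R] of grammar.P) {
    for (let i = w.indexOf(L); i !== -1; i = w.indexOf(L, i + 1)) {
      result.add(w.slice(0, i) + R + w.slice(i + L.length));
    }
  }
  return [...result];
}

// The strings of L(G) of length at most maxLen, found by breadth-first
// search from S. Forms longer than maxLen are discarded; this is only safe
// under the assumption that no production shortens a sentential form.
function language(grammar, maxLen) {
  const isTerminalString = (w) => [...w].every((c) => grammar.T.includes(c));
  const seen = new Set([grammar.S]);
  const queue = [grammar.S];
  const words = [];
  while (queue.length > 0) {
    const w = queue.shift();
    for (const u of step(grammar, w)) {
      if (u.length > maxLen || seen.has(u)) continue;
      seen.add(u);
      if (isTerminalString(u)) words.push(u);
      else queue.push(u);
    }
  }
  return words.sort((x, y) => x.length - y.length);
}

console.log(language(sampleGrammar, 7)); // the strings 0, 101, 11011, 1110111
```

The search keeps a set of sentential forms already seen, so it terminates even when a grammar allows the same form to be reached along several derivations.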

Examples:

Example: Let the grammar G be such that T = {0, 1}, let N = {A}, let P = {<A, 0>, <A, 1A1>}, and let A also be the start symbol. Then L(G) = {0, 101, 11011, ..., 1ⁿ01ⁿ, ...}.
end

Example: Logical Expressions:

T = {true, false, and, or, not, (, ) },
N = {E},
P = {<E, true>, <E, false>, <E, (E and E)>, <E, (E or E)>, <E, not(E) > },
and the start symbol is E.

The resulting context-free language, L(G), will be the set of all well-formed logical expressions without variables. Examples of elements of L(G) include

true
(true or false)
(true and (false or not(false)))

end.
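Membership in this language can be checked mechanically. The sketch below is one possible recognizer, written as a recursive-descent procedure with one case per production; it is an illustration rather than the method used elsewhere in these notes. It assumes that terminal symbols may be separated by optional whitespace, and the function names `tokenize`, `parseE`, and `isLogicalExpression` are hypothetical.

```javascript
// Split the input into terminal symbols. Anything that is not a terminal
// becomes a junk token, which the parser below will reject.
function tokenize(text) {
  return text.match(/true|false|and|or|not|\(|\)|\S+/g) || [];
}

// Try to read one E starting at position i. Returns the position just
// after the expression, or -1 if no production of E matches there.
function parseE(tokens, i) {
  const t = tokens[i];
  if (t === "true" || t === "false") return i + 1; // <E, true>, <E, false>
  if (t === "not") {                               // <E, not(E)>
    if (tokens[i + 1] !== "(") return -1;
    const j = parseE(tokens, i + 2);
    return j !== -1 && tokens[j] === ")" ? j + 1 : -1;
  }
  if (t === "(") {                                 // <E, (E and E)>, <E, (E or E)>
    const j = parseE(tokens, i + 1);
    if (j === -1 || (tokens[j] !== "and" && tokens[j] !== "or")) return -1;
    const k = parseE(tokens, j + 1);
    return k !== -1 && tokens[k] === ")" ? k + 1 : -1;
  }
  return -1;
}

// A string is in the language iff one E consumes every token.
function isLogicalExpression(text) {
  const tokens = tokenize(text);
  return parseE(tokens, 0) === tokens.length;
}

console.log(isLogicalExpression("(true and (false or not(false)))")); // true
console.log(isLogicalExpression("true or false")); // false: parentheses are required
```

This direct approach works here because the first terminal symbol of each right-hand side, together with the connective that follows the first subexpression, determines which production applies.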

Example: Arithmetic Expressions I:

T = { 0,1,2,3,4,5,6,7,8,9,+,*,-,/,(,) }
N = { Digit, Pint, Int, Exp }
P = { <Digit, 0>, <Digit, 1>, <Digit, 2>, <Digit, 3>, <Digit, 4>, <Digit, 5>, <Digit, 6>, <Digit, 7>, <Digit, 8>, <Digit, 9>,

end.

Example: A simple sub-grammar of English.-- yet to come

Example: LANG1 -- the context-free grammar of a simple programming language (given on a separate page).
end.

The Signature of a Context-free Grammar

Given a context-free grammar G = <T, N, P, S> define Sig(G), the signature of G, to be the signature with

a sort n for each nonterminal n ∈ N.
an operator for each production in P: the production

p = <A₀, a₀A₁a₁...A_na_n>

gives the operator

p: A₁ × ... × A_n → A₀
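The construction of the signature is simple enough to sketch in code. The fragment below assumes a hypothetical encoding in which each production carries a name and its right-hand side is an array of symbols; it is not the representation used by the programs from the section Signatures, Algebras and Homomorphisms. The operator for a production keeps only the nonterminals of the right-hand side, in order, as its argument sorts.

```javascript
// Hypothetical encoding of a grammar: productions are objects
// { name, lhs, rhs }, with rhs an array of terminal and nonterminal symbols.
const gram1Grammar = {
  T: ["0", "1"],
  N: ["A"],
  P: [
    { name: "p0", lhs: "A", rhs: ["0"] },
    { name: "p1", lhs: "A", rhs: ["1", "A", "1"] },
  ],
  S: "A",
};

// Sig(G): one sort per nonterminal, one operator per production.
// For p = <A0, a0 A1 a1 ... An an> the operator has type A1 x ... x An -> A0,
// so the terminal strings a0, ..., an are simply dropped.
function signatureOf(grammar) {
  const operators = grammar.P.map(({ name, lhs, rhs }) => ({
    name,
    args: rhs.filter((sym) => grammar.N.includes(sym)),
    result: lhs,
  }));
  return { sorts: [...grammar.N], operators };
}

console.log(JSON.stringify(signatureOf(gram1Grammar)));
// p0 has no arguments and result sort A; p1 has one argument of sort A and result sort A
```

Note that the start symbol and the terminal symbols do not appear anywhere in the result.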

Example: Let the grammar G be such that T = {0, 1}, let N = {A}, let P = {<A, 0>, <A, 1A1>}, and let A also be the start symbol. Then the corresponding signature, Gram1, is:

The Signature Gram1 in mathematical form:

name: Gram1

sorts: {A}

operators:
Gram1(A) = {p0}
Gram1(A,A) = {p1}

End of signature Gram1.

Note that, with this signature, the only Gram1-terms that we can generate are of the form p1ⁿ(p0( )) (where p1⁰(p0( )) = p0( ) and p1ⁿ⁺¹(p0( )) = p1( p1ⁿ(p0( )) ) ).
end.

Example: The signature corresponding to the example of a context-free grammar for arithmetic expressions is given on a separate page.
end.

The Algebras of a Context-free Grammar

Information is lost in the process of going from a grammar to its signature. That is, you cannot reconstruct the complete grammar from the signature, because the operators lack the strings of terminal symbols found in the productions and the signature does not contain any information about the start symbol. The missing information given by the strings of terminal symbols corresponds to a Sig(G)-algebra; the choice of the start symbol is just the designation of one of its sorts. The grammar as a whole then corresponds to the unique homomorphism h_G: T_Sig(G) → Alg(G) from the initial Sig(G)-algebra T_Sig(G) to the algebra Alg(G). Our claim is that, if s is the sort corresponding to the start symbol S of G, then the image of (T_Sig(G))_s under the homomorphism h_G is exactly the set L(G), i.e., the context-free language generated by G.

In terms more familiar to computer scientists, a Sig(G)-term, t, is the abstract syntax of the string h_G(t) which, in turn, is a concrete syntax for t.

If there are two terms t, t' ∈ (T_Sig(G))_s such that t ≠ t' but h_G(t) = h_G(t'), then we say that the grammar G is ambiguous.
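A small illustration of ambiguity, using a hypothetical grammar that does not appear elsewhere in this section: N = {E}, T = {a, +}, and productions <E, a> and <E, E+E>. The operator names `atom` and `plus`, the nested-array encoding of terms, and the function `h` are all assumptions made for this sketch.

```javascript
// String-building operations for the hypothetical productions
// atom = <E, a> and plus = <E, E+E>.
const ambiguousOps = {
  atom: () => "a",
  plus: (x, y) => x + "+" + y,
};

// A term is a nested array [operatorName, ...subterms]. The map h sends a
// term to its string by first mapping the subterms, then applying the
// operation named at the root.
function h([op, ...subterms]) {
  return ambiguousOps[op](...subterms.map(h));
}

const a = ["atom"];
const leftNested = ["plus", ["plus", a, a], a];  // the tree for (a+a)+a
const rightNested = ["plus", a, ["plus", a, a]]; // the tree for a+(a+a)

console.log(h(leftNested));  // a+a+a
console.log(h(rightNested)); // a+a+a
```

The two terms are different, yet both are mapped to the string a+a+a, so this grammar is ambiguous in the sense just defined.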

Since the real interest is in the image of (T_Sig(G))_s under h_G, we have some freedom in the definition of Alg(G):

there is no harm in adding "extraneous elements" to the carriers of Alg(G). Indeed there is real benefit if the "extraneous elements" simplify the presentation of the algebra.
it doesn't matter what the operations do when the "extraneous elements" appear as arguments

This being the case, we can give the following definition for Alg(G):

For each n ∈ N let Alg(G)_n = T*, the set of all strings on the set T of terminal symbols from G.
For each p ∈ P, if p = <A₀, a₀A₁a₁...A_na_n>, then let p_Alg(G): (T*)ⁿ → T* be such that, for all <t₁,...,t_n> ∈ (T*)ⁿ, we have p_Alg(G)(t₁,...,t_n) = a₀t₁a₁...t_na_n, the concatenation of the strings in the indicated order.

This definition for p_Alg(G) can be directly translated into a JavaScript expression suitable for use with our JavaScript representation of algebras, as

p_Alg(G)(t₁,...,t_n) = a₀ + arg[1] + a₁ + ... + arg[n] + a_n,

where arg[i] holds the argument t_i.
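As a rough sketch of how Alg(G) and the homomorphism h_G might be computed for an arbitrary grammar, the code below builds one string-concatenating function per production and evaluates terms bottom-up. It assumes a hypothetical encoding (named productions whose right-hand sides are arrays of symbols, terms as nested arrays) and ordinary zero-based JavaScript argument lists; it is not the representation used by the programs this section refers to.

```javascript
// Hypothetical encoding: a production is { name, lhs, rhs } with rhs an
// array of symbols; a term is a nested array [operatorName, ...subterms].
const gram1 = {
  T: ["0", "1"],
  N: ["A"],
  P: [
    { name: "p0", lhs: "A", rhs: ["0"] },
    { name: "p1", lhs: "A", rhs: ["1", "A", "1"] },
  ],
  S: "A",
};

// The operation for p = <A0, a0 A1 a1 ... An an>: walk the right-hand side,
// copying terminals and substituting the next argument for each nonterminal.
function operationOf(production, nonterminals) {
  return (...args) => {
    let next = 0;
    return production.rhs
      .map((sym) => (nonterminals.includes(sym) ? args[next++] : sym))
      .join("");
  };
}

// Alg(G): every carrier is the set of strings, one operation per production.
function algebraOf(grammar) {
  const ops = {};
  for (const p of grammar.P) ops[p.name] = operationOf(p, grammar.N);
  return ops;
}

// The homomorphism from terms to strings: evaluate the subterms, then apply
// the operation named at the root.
function evalTerm(ops, [name, ...subterms]) {
  return ops[name](...subterms.map((t) => evalTerm(ops, t)));
}

const gram1Algebra = algebraOf(gram1);
console.log(evalTerm(gram1Algebra, ["p1", ["p1", ["p0"]]])); // 11011
```

Here the term p1(p1(p0( ))) plays the role of abstract syntax, and the string 11011 is its concrete syntax.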

Example: Let the grammar G be such that T = {0, 1}, let N = {A}, let P = {<A, 0>, <A, 1A1>}, and let A also be the start symbol. Then the corresponding algebra, alg_Gram1, is the Gram1-algebra with

carrier {0, 1}*
operations

p0_{alg_Gram1}: → {0, 1}*, p0_{alg_Gram1}( ) = 0

p1_{alg_Gram1}: {0, 1}* → {0, 1}*, p1_{alg_Gram1}( t ) = 1t1

The corresponding algebra in JavaScript is given by alg_Gram1:

The algebra alg_Gram1
The signature of alg_Gram1 is: Gram1
The carrier of alg_Gram1 is the set {1, 0}* of all strings of zeros and ones (but it is not actually specified in the UA-Calculator version).
The operations of the algebra alg_Gram1 are:
["p0", "'0'"]
["p1", "'1'+arg[1]+'1'"]
End of algebra alg_Gram1

end
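The operation listings above are JavaScript expressions stored as strings. How the actual programs interpret them is not shown here, so the following is only a guess at one workable reading: each expression is turned into a function of an array `arg` in which `arg[1]` is the first argument and `arg[0]` is left unused. The names `algGram1Listing`, `compileOperation`, and `evaluate` are hypothetical.

```javascript
// The operations of alg_Gram1, copied from the listing as
// [operatorName, expressionText] pairs.
const algGram1Listing = [
  ["p0", "'0'"],
  ["p1", "'1'+arg[1]+'1'"],
];

// Turn an expression string into a function of its arguments.
// Assumption: arg[1] is the first argument, so arg[0] is padded with undefined.
function compileOperation(expr) {
  const body = new Function("arg", "return " + expr + ";");
  return (...args) => body([undefined, ...args]);
}

const algGram1Ops = {};
for (const [name, expr] of algGram1Listing) {
  algGram1Ops[name] = compileOperation(expr);
}

// Evaluate a term, written as a nested array [operatorName, ...subterms].
function evaluate([name, ...subterms]) {
  return algGram1Ops[name](...subterms.map(evaluate));
}

console.log(evaluate(["p0"]));                 // 0
console.log(evaluate(["p1", ["p1", ["p0"]]])); // 11011
```

Building functions from strings with the `Function` constructor is convenient for a small experiment like this one, but it executes whatever text it is given, so it should only be applied to expressions from a trusted source.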