上一篇写了大步语义。大步语义像一次函数调用:给它程序和环境,它直接返回最终环境。小步语义换了观察角度:程序运行时会经历一串中间配置,每一步只做一个局部变化。

这个视角很适合 Java 程序员理解调试器、解释器、状态机和工作流引擎。断点、单步执行、重试、恢复、超时保护,都依赖“当前程序状态可以被保存,下一步可以被明确计算”。

本文继续使用上一篇的小语言,补上小步语义(small-step semantics)。核心目标很直接:把一个完整程序改写成一台可推进的机器。

程序状态变成机器配置

小步语义关心的是配置之间的变化。

1
(statement, environment) -> (next_statement, next_environment)

statement 是剩余要执行的程序,environment 是当前变量绑定。每执行一步,语句可能变小,环境也可能变化。

例如:

1
2
(x = 1 + 2, {}) -> (x = 3, {})
(x = 3, {}) -> (do-nothing, {x = 3})

第一步只把表达式 1 + 2 规约成 3。第二步才完成赋值。大步语义会直接得到 {x = 3},小步语义会把中间状态暴露出来。

这套模型里有两个关键词:

  • 可规约(reducible):还能继续向前走一步。
  • 不可规约(irreducible):已经是值,或者已经完成执行。

数字和布尔值是不可规约表达式。do-nothing 是不可规约语句。其他结构基本都能继续向前推进。

模式提炼:配置转移系统

state -> next_state

把程序执行看成配置转移以后,很多工程问题会变得统一。

场景 配置 单步转移
解释器 AST + 环境 规约一个节点
调试器 栈帧 + 变量表 执行下一行
工作流引擎 流程节点 + 上下文 推进到下个节点
订单状态机 订单状态 + 事件 得到新状态
任务调度器 队列 + worker 状态 消费一个任务

小步语义训练的是同一种眼光:先把“正在运行”保存成数据,再写出“下一步”。

AST 仍然沿用上一篇

表达式节点保持不变。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
from dataclasses import dataclass


@dataclass(frozen=True)
class Number:
value: int


@dataclass(frozen=True)
class Boolean:
value: bool


@dataclass(frozen=True)
class Variable:
name: str


@dataclass(frozen=True)
class Add:
left: "Expression"
right: "Expression"


@dataclass(frozen=True)
class Multiply:
left: "Expression"
right: "Expression"


@dataclass(frozen=True)
class LessThan:
left: "Expression"
right: "Expression"


Value = int | bool
Expression = Number | Boolean | Variable | Add | Multiply | LessThan
Environment = dict[str, Value]

语句节点也保持上一篇的五种。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
@dataclass(frozen=True)
class DoNothing:
pass


@dataclass(frozen=True)
class Assign:
name: str
expression: Expression


@dataclass(frozen=True)
class If:
condition: Expression
consequence: "Statement"
alternative: "Statement"


@dataclass(frozen=True)
class Sequence:
first: "Statement"
second: "Statement"


@dataclass(frozen=True)
class While:
condition: Expression
body: "Statement"


Statement = DoNothing | Assign | If | Sequence | While

AST 没有变化,语义规则换了。这个分离很重要:同一份程序结构,可以交给大步语义直接求最终结果,也可以交给小步语义观察中间过程。

表达式先规约成值

表达式的小步规约会遵守一个简单顺序:先规约左边,再规约右边,最后组合。

先写几个辅助函数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def to_literal(value: Value) -> Expression:
if isinstance(value, bool):
return Boolean(value)
return Number(value)


def from_literal(expression: Expression) -> Value:
match expression:
case Number(value):
return value
case Boolean(value):
return value
case _:
raise TypeError(f"not a literal: {expression!r}")


def is_reducible_expression(expression: Expression) -> bool:
return not isinstance(expression, (Number, Boolean))

to_literal 把环境里的 Python 值包回 AST 字面量。from_literal 从 AST 字面量取出 Python 值。这里要先判断 bool,因为 Python 里 boolint 的子类。

表达式规约函数每次只走一步。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def reduce_expression(expression: Expression, environment: Environment) -> Expression:
match expression:
case Number() | Boolean():
return expression

case Variable(name):
return to_literal(environment[name])

case Add(left, right):
if is_reducible_expression(left):
return Add(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return Add(left, reduce_expression(right, environment))
return Number(from_literal(left) + from_literal(right))

case Multiply(left, right):
if is_reducible_expression(left):
return Multiply(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return Multiply(left, reduce_expression(right, environment))
return Number(from_literal(left) * from_literal(right))

case LessThan(left, right):
if is_reducible_expression(left):
return LessThan(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return LessThan(left, reduce_expression(right, environment))
return Boolean(from_literal(left) < from_literal(right))

case _:
raise TypeError(f"unknown expression: {expression!r}")

测试一下:

1
2
3
4
5
6
7
8
expression = Add(Variable("x"), Multiply(Number(2), Number(3)))
environment = {"x": 4}

while is_reducible_expression(expression):
print(expression)
expression = reduce_expression(expression, environment)

print(expression)

输出会看到表达式慢慢变小:

1
2
3
4
Add(left=Variable(name='x'), right=Multiply(left=Number(value=2), right=Number(value=3)))
Add(left=Number(value=4), right=Multiply(left=Number(value=2), right=Number(value=3)))
Add(left=Number(value=4), right=Number(value=6))
Number(value=10)

这里没有递归求出最终结果。每次调用 reduce_expression,只消掉一个局部结构。

语句规约负责推进程序

语句的小步规约返回一对结果:

1
next_statement, next_environment

先定义语句是否可规约。

1
2
def is_reducible_statement(statement: Statement) -> bool:
return not isinstance(statement, DoNothing)

接着写语句规约函数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
def reduce_statement(
statement: Statement,
environment: Environment,
) -> tuple[Statement, Environment]:
match statement:
case DoNothing():
return statement, environment

case Assign(name, expression):
if is_reducible_expression(expression):
return Assign(name, reduce_expression(expression, environment)), environment
return DoNothing(), environment | {name: from_literal(expression)}

case If(condition, consequence, alternative):
if is_reducible_expression(condition):
return (
If(reduce_expression(condition, environment), consequence, alternative),
environment,
)
if from_literal(condition):
return consequence, environment
return alternative, environment

case Sequence(first, second):
if isinstance(first, DoNothing):
return second, environment
reduced_first, reduced_environment = reduce_statement(first, environment)
return Sequence(reduced_first, second), reduced_environment

case While(condition, body):
return (
If(
condition,
Sequence(body, statement),
DoNothing(),
),
environment,
)

case _:
raise TypeError(f"unknown statement: {statement!r}")

这段代码是本文的核心。几个规则值得慢慢看。

Assign 会先规约右侧表达式。表达式已经变成字面量后,赋值才会更新环境,并把语句变成 DoNothing()

If 会先规约条件。条件变成布尔字面量后,再选择分支。

Sequence 会先推进第一条语句。第一条语句完成以后,第二条语句接替当前位置。

While 的处理最有意思。它不会直接循环,而是把自己展开成一条 If

1
2
3
4
5
6
7
8
9
while condition do body

==>

if condition then
body;
while condition do body
else
do-nothing

循环由更小的语义结构表达出来。解释器只要懂 IfSequence,就能获得 While 的行为。

模式提炼:复杂结构降解成核心结构

syntactic sugar -> core language

很多语言功能都可以翻译成更小的核心语言。while 可以翻译成 if + sequence + 自身递归。增强 for 可以翻译成迭代器循环。try-with-resources 可以翻译成 try/finally。lambda 可以翻译成对象或函数值。

学习语义时,先找核心结构。核心结构少,规则就少;规则少,语言行为更容易验证。

机器把一步一步串起来

小步语义天然适合封装成一台机器。机器内部保存当前语句和当前环境,step() 推进一步,run() 持续推进直到程序完成。

为了让输出更好读,下面补一组格式化函数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def format_expression(expression: Expression) -> str:
match expression:
case Number(value):
return str(value)
case Boolean(value):
return str(value).lower()
case Variable(name):
return name
case Add(left, right):
return f"({format_expression(left)} + {format_expression(right)})"
case Multiply(left, right):
return f"({format_expression(left)} * {format_expression(right)})"
case LessThan(left, right):
return f"({format_expression(left)} < {format_expression(right)})"
case _:
raise TypeError(f"unknown expression: {expression!r}")


def format_statement(statement: Statement) -> str:
match statement:
case DoNothing():
return "do-nothing"
case Assign(name, expression):
return f"{name} = {format_expression(expression)}"
case If(condition, consequence, alternative):
return (
f"if {format_expression(condition)} "
f"then {{ {format_statement(consequence)} }} "
f"else {{ {format_statement(alternative)} }}"
)
case Sequence(first, second):
return f"{format_statement(first)}; {format_statement(second)}"
case While(condition, body):
return f"while {format_expression(condition)} do {{ {format_statement(body)} }}"
case _:
raise TypeError(f"unknown statement: {statement!r}")

机器本身很短。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@dataclass
class Machine:
statement: Statement
environment: Environment

def step(self) -> None:
self.statement, self.environment = reduce_statement(
self.statement,
self.environment,
)

def run(self, step_limit: int = 100) -> None:
step = 0

while is_reducible_statement(self.statement):
print(f"{step:02d}: {format_statement(self.statement)} | {self.environment}")
self.step()
step += 1

if step > step_limit:
raise RuntimeError("step limit exceeded")

print(f"{step:02d}: {format_statement(self.statement)} | {self.environment}")

step_limit 是工程保护。小步语义可以清楚描述死循环,但机器一直跑下去会卡住运行环境。给实验程序加一个步数上限,调试体验会好很多。

跑一个 while 程序

用同一个例子观察循环。初始 x = 1,只要 x < 5,就执行 x = x * 3

1
2
3
4
5
6
program = While(
LessThan(Variable("x"), Number(5)),
Assign("x", Multiply(Variable("x"), Number(3))),
)

Machine(program, {"x": 1}).run()

输出会比较长,关键片段如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
00: while (x < 5) do { x = (x * 3) } | {'x': 1}
01: if (x < 5) then { x = (x * 3); while (x < 5) do { x = (x * 3) } } else { do-nothing } | {'x': 1}
02: if (1 < 5) then { x = (x * 3); while (x < 5) do { x = (x * 3) } } else { do-nothing } | {'x': 1}
03: if true then { x = (x * 3); while (x < 5) do { x = (x * 3) } } else { do-nothing } | {'x': 1}
04: x = (x * 3); while (x < 5) do { x = (x * 3) } | {'x': 1}
05: x = (1 * 3); while (x < 5) do { x = (x * 3) } | {'x': 1}
06: x = 3; while (x < 5) do { x = (x * 3) } | {'x': 1}
07: do-nothing; while (x < 5) do { x = (x * 3) } | {'x': 3}
08: while (x < 5) do { x = (x * 3) } | {'x': 3}
...
18: if (9 < 5) then { x = (x * 3); while (x < 5) do { x = (x * 3) } } else { do-nothing } | {'x': 9}
19: if false then { x = (x * 3); while (x < 5) do { x = (x * 3) } } else { do-nothing } | {'x': 9}
20: do-nothing | {'x': 9}

这个输出把循环拆开了:

  • 00 步,while 还是原始形态。
  • 01 步,while 展开成 if
  • 0203 步,条件表达式规约成 true
  • 0407 步,执行一次赋值,把 x 更新成 3
  • 后面重复同一套规则,直到 x 变成 9,条件变成 false

大步语义只会给出最后的 {'x': 9}。小步语义给出的是完整轨迹。

完整代码

下面是完整文件,可以保存为 small_step.py 直接运行。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
from dataclasses import dataclass


@dataclass(frozen=True)
class Number:
value: int


@dataclass(frozen=True)
class Boolean:
value: bool


@dataclass(frozen=True)
class Variable:
name: str


@dataclass(frozen=True)
class Add:
left: "Expression"
right: "Expression"


@dataclass(frozen=True)
class Multiply:
left: "Expression"
right: "Expression"


@dataclass(frozen=True)
class LessThan:
left: "Expression"
right: "Expression"


Value = int | bool
Expression = Number | Boolean | Variable | Add | Multiply | LessThan
Environment = dict[str, Value]


@dataclass(frozen=True)
class DoNothing:
pass


@dataclass(frozen=True)
class Assign:
name: str
expression: Expression


@dataclass(frozen=True)
class If:
condition: Expression
consequence: "Statement"
alternative: "Statement"


@dataclass(frozen=True)
class Sequence:
first: "Statement"
second: "Statement"


@dataclass(frozen=True)
class While:
condition: Expression
body: "Statement"


Statement = DoNothing | Assign | If | Sequence | While


def to_literal(value: Value) -> Expression:
if isinstance(value, bool):
return Boolean(value)
return Number(value)


def from_literal(expression: Expression) -> Value:
match expression:
case Number(value):
return value
case Boolean(value):
return value
case _:
raise TypeError(f"not a literal: {expression!r}")


def is_reducible_expression(expression: Expression) -> bool:
return not isinstance(expression, (Number, Boolean))


def reduce_expression(expression: Expression, environment: Environment) -> Expression:
match expression:
case Number() | Boolean():
return expression

case Variable(name):
return to_literal(environment[name])

case Add(left, right):
if is_reducible_expression(left):
return Add(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return Add(left, reduce_expression(right, environment))
return Number(from_literal(left) + from_literal(right))

case Multiply(left, right):
if is_reducible_expression(left):
return Multiply(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return Multiply(left, reduce_expression(right, environment))
return Number(from_literal(left) * from_literal(right))

case LessThan(left, right):
if is_reducible_expression(left):
return LessThan(reduce_expression(left, environment), right)
if is_reducible_expression(right):
return LessThan(left, reduce_expression(right, environment))
return Boolean(from_literal(left) < from_literal(right))

case _:
raise TypeError(f"unknown expression: {expression!r}")


def is_reducible_statement(statement: Statement) -> bool:
return not isinstance(statement, DoNothing)


def reduce_statement(
statement: Statement,
environment: Environment,
) -> tuple[Statement, Environment]:
match statement:
case DoNothing():
return statement, environment

case Assign(name, expression):
if is_reducible_expression(expression):
return Assign(name, reduce_expression(expression, environment)), environment
return DoNothing(), environment | {name: from_literal(expression)}

case If(condition, consequence, alternative):
if is_reducible_expression(condition):
return (
If(reduce_expression(condition, environment), consequence, alternative),
environment,
)
if from_literal(condition):
return consequence, environment
return alternative, environment

case Sequence(first, second):
if isinstance(first, DoNothing):
return second, environment
reduced_first, reduced_environment = reduce_statement(first, environment)
return Sequence(reduced_first, second), reduced_environment

case While(condition, body):
return (
If(
condition,
Sequence(body, statement),
DoNothing(),
),
environment,
)

case _:
raise TypeError(f"unknown statement: {statement!r}")


def format_expression(expression: Expression) -> str:
match expression:
case Number(value):
return str(value)
case Boolean(value):
return str(value).lower()
case Variable(name):
return name
case Add(left, right):
return f"({format_expression(left)} + {format_expression(right)})"
case Multiply(left, right):
return f"({format_expression(left)} * {format_expression(right)})"
case LessThan(left, right):
return f"({format_expression(left)} < {format_expression(right)})"
case _:
raise TypeError(f"unknown expression: {expression!r}")


def format_statement(statement: Statement) -> str:
match statement:
case DoNothing():
return "do-nothing"
case Assign(name, expression):
return f"{name} = {format_expression(expression)}"
case If(condition, consequence, alternative):
return (
f"if {format_expression(condition)} "
f"then {{ {format_statement(consequence)} }} "
f"else {{ {format_statement(alternative)} }}"
)
case Sequence(first, second):
return f"{format_statement(first)}; {format_statement(second)}"
case While(condition, body):
return f"while {format_expression(condition)} do {{ {format_statement(body)} }}"
case _:
raise TypeError(f"unknown statement: {statement!r}")


@dataclass
class Machine:
statement: Statement
environment: Environment

def step(self) -> None:
self.statement, self.environment = reduce_statement(
self.statement,
self.environment,
)

def run(self, step_limit: int = 100) -> None:
step = 0

while is_reducible_statement(self.statement):
print(f"{step:02d}: {format_statement(self.statement)} | {self.environment}")
self.step()
step += 1

if step > step_limit:
raise RuntimeError("step limit exceeded")

print(f"{step:02d}: {format_statement(self.statement)} | {self.environment}")


if __name__ == "__main__":
program = While(
LessThan(Variable("x"), Number(5)),
Assign("x", Multiply(Variable("x"), Number(3))),
)

Machine(program, {"x": 1}).run()

运行:

1
python small_step.py

最后一行会停在:

1
20: do-nothing | {'x': 9}

大步语义和小步语义的差异

两种语义都在解释同一份程序,但它们回答的问题不同。

问题 大步语义 小步语义
运行完成后环境是什么 擅长 需要跑完整轨迹
中间状态是什么 信息较少 擅长
死循环怎样表现 没有最终结果 状态持续转移
调试器/单步执行 表达不方便 表达自然
解释器实现 递归求值 机器推进
工程风险 递归深度 步数和调度控制

Java 程序员可以把小步语义理解成一种通用执行框架。业务代码里常见的 execute(context) 往往一口气跑完;工作流引擎、规则引擎、调度器更接近 step(context),每一步都能保存、观察、重试。

Java 程序员的迁移表

Python 组件 Java 对照 作用
Machine InterpreterMachine / WorkflowRuntime 保存当前配置
step() advance() / tick() 推进一次
reduce_statement Reducer<Stmt, Env> 单步语义规则
DoNothing Completed / NoOp 终止标记
Sequence 组合命令 控制执行顺序
While -> If 语法糖降解 把复杂结构翻译成核心结构

如果用 Java 写,核心接口大概长这样:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
record Configuration(Stmt statement, Map<String, Value> environment) {}

interface Reducer {
Configuration reduce(Configuration configuration);
}

final class Machine {
private Configuration configuration;

void step() {
configuration = reducer.reduce(configuration);
}

boolean running() {
return !(configuration.statement() instanceof DoNothing);
}
}

这类接口在生产系统里经常出现:流程运行时、状态机、任务调度、规则执行器、游戏主循环,本质上都需要把“当前状态”和“下一步规则”分开管理。

小步语义带来的工程直觉

第一,单步模型更容易调试。每一步都有输入配置和输出配置,可以打印日志,可以做断点,也可以写成测试用例。

第二,单步模型更容易暂停和恢复。只要能序列化当前语句和环境,运行时就能把程序停在某一步,稍后继续。

第三,单步模型更容易做资源控制。运行 N 步后还没有完成,就可以判定超出预算。这个想法会在后面讲不可判定性时再次出现:无法保证所有程序都停机,但可以给实际运行设置边界。

第四,语法糖可以降解。While 展开成 If 后,解释器核心变小。语言设计里经常靠这招减少语义规则数量。

练习

第一,给表达式增加 Subtract,检查 x = 10 - 3 的规约轨迹。

第二,给语句增加 Print。小步语义下,配置可以从 (statement, environment) 扩展成 (statement, environment, output)

第三,给 Machine.run() 增加 trace=False 参数。关闭 trace 时只返回最终环境,打开 trace 时打印每一步。

第四,写一个死循环:

1
While(Boolean(True), DoNothing())

观察它怎样触发 step_limit。这会帮后面的停机问题打底。

参考资料