StarLark的使用

StarLark是Python的一种方言(dialect)，主要用作配置语言。StarLark解释器通常是会被嵌入到另一个更大型的应用程序中，除了使用其基本语言核心功能之外，还可能会在应用程序中定义一些额外的领域特定函数和数据类型。比如，StarLark就被嵌入到了Bazel构建工具当中。

本文大部分内容翻译自: StarLark Specification

Reference:

1. 总览

StarLark是一种无类型的动态语言，其具有高级的数据类型、具有词法作用域的一级函数以及自动内存管理或垃圾回收。

StarLark深受Python的影响，其语法是Python的一个子集，语义也几乎是Python的一个子集。作为一个Python程序员来说，会十分熟悉StarLark的数据类型和表达式语法。然而设计StarLark并不是用在写应用程序方面，而是用于配置领域。

StarLark足够简单，其并没有用户定义的类型，也不支持继承、反射、异常(exception)和显式的内存管理。StarLark执行也是有限的，其并不支持递归和死循环。

本文我们主要介绍如下几大部分：

Lexical Elements(词汇元素)
Data types
Name binding and variables
Value concepts
Expressions
Statements
Module execution
Built-in constants and functions
Built-in methods

2. Lexical Elements

StarLark的语法（注：不包括语义)是一个严格的Python子集。在实践中这就意味着，Python相关工具也可用于StarLark。

一个StarLark程序由一个或多个module组成，每一个module都是一个单独的UTF8编码的文本文件。

Punctuation: 如下的标点符号和字符序列都是StarLark的token

+    -    *    /    //   %    **
~    &    |    ^    <<   >>
.    ,    =    ;    :
(    )    [    ]    {    }
<    >    >=   <=   ==   !=
+=   -=   *=   /=   //=  %=
&=   |=   ^=   <<=  >>=

Keywords

and            else           load
break          for            not
continue       if             or
def            in             pass
elif           lambda         return

如下的一些token被保留，未来可能会被作为Keywords使用：

as             global
assert         import
async          is
await          nonlocal
class          raise
del            try
except         while
finally        with
from           yield

Literals(字面量): StarLark支持整数,浮点数，字符串，字节literal

0                               # int
123                             # decimal int
0x7f                            # hexadecimal int
0o755                           # octal int

0.0     0.       .0             # float
1e10    1e+10    1e-10
1.1e10  1.1e+10  1.1e-10

"hello"      'hello'            # string
'''hello'''  """hello"""        # triple-quoted string
r'hello'     r"hello"           # raw string literal

b"hello"     b'hello'           # bytes
b'''hello''' b"""hello"""       # triple-quoted bytes
rb'hello'    br"hello"          # raw bytes literal

3. 数据类型

如下是StarLark解释器内置的主要数据类型：

NoneType                     # the type of None
bool                         # True or False
int                          # a signed integer of arbitrary magnitude
float                        # an IEEE 754 double-precision floating-point number
string                       # a text string, with Unicode encoded as UTF-8 or UTF-16
bytes                        # a byte string
list                         # a fixed-length sequence of values
tuple                        # a fixed-length sequence of values, unmodifiable
dict                         # a mapping from values to values
function                     # a function

有一些函数，比如range，其返回的实例类型并不在上述所列当中。此外，应用程序还可以自定义类型。

有一些操作可以应用于所有类型的值之上。例如，我们可以对任何一个value求type(x)，这样我们可以获取到该value的具体类型；还可以对任何一个值执行str(x)以将其转化成字符串。

下面我们对其中的一些类型挑选几个来介绍：

1） None

None用于指示某个不存在的值。例如，调用一个无返回值的函数时其结果就为None。

None只于其本身相等，其类型为’NoneType’。None的Bool结果为False。

2） Booleans

有两个boolean值：True或False, 其类型为bool

Boolean值通常用在if条件语句中，StarLark的任何其他值被用作条件判断时都会默认的解释为一个Boolean值。例如，None、0、空字符串""、()、[]、{}用作条件判断时都会被解释为False；而非零的数字、非空字符串都会被解释为True。

True或False可以使用int()函数转换为1或0，但是Booleans本身并非数值。

3） Integers

StarLark的整数类型其type为int。

4） String

string是一个不可变的元素序列，用于表示一个Unicode编码的文本。

内置的len()函数可以求一个字符串中元素的个数。我们可以使用+运算符来连接两个字符串。

下标表达式s[i:j]返回字符串s中从[i,j)之间的字符。

string具有如下内置函数：

capitalize
count
elems
endswith
find
format
index
isalnum
isalpha
isdigit
islower
isspace
istitle
isupper
join
lower
lstrip
partition
removeprefix
removesuffix
replace
rfind
rindex
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
title
upper

5） Bytes

bytes是一个不可变的值序列，每一个值的范围是[0,255]。对应的类型为bytes。

string用于表示一个文本字符串，而bytes用于表示一段二进制数据。

我们可以使用+操作符来连接两个bytes序列。

6） Lists

list代表着一个可变的值序列。其type类型为list。

List是可索引的：可以使用for循环来遍历里面的每一个元素。

我们可以使用如下方式来构建List:

[]              # an empty list
[1]             # a 1-element list
[1, 2]          # a 2-element list

也可以使用内置的list函数来从一个可迭代的序列中来创建List。

list comprehension会创建一个新的List元素，例如：

[x*x for x in [1, 2, 3, 4]]      # [1, 4, 9, 16]

list具有如下内置函数：

append
clear
extend
index
insert
pop
remove

7） Tuples

与List相对应，Tuples代表着一个不可变的值序列。其type类型为tuple。

可以使用如下方式来创建tuple:

()                      # the empty tuple
(1,)                    # a 1-tuple
(1, 2)                  # a 2-tuple ("pair")
(1, 2, 3)               # a 3-tuple

ps: 上面第二行中的1-tuple，其末尾的逗号不能省略，否则无法与带括号的表达式(1)区分。一般来说1-tuple很少会用到。

与python不同的是，StarLark不允许在一个不带括号的tuple表达式中出现末尾的逗号，例如：

for k, v, in dict.items(): pass                 # syntax error at 'in'
_ = [(v, k) for k, v, in dict.items()]          # syntax error at 'in'

sorted(3, 1, 4, 1,)                             # ok
[1, 2, 3, ]                                     # ok
{1: 2, 3:4, }                                   # ok

对一个可迭代的序列，我们可以使用内置的tuple函数将其转化为一个tuple。

8） Dictionaries

dictionary是一个可变的kv map。对type类型为dict。

dictionary的内部实现是hash表，因此key必须是可哈希的。可哈希的值包括None、Booleans、numbers、strings、bytes、由可哈希值组成的tuple。而大多数可变的值都不是可哈希的，例如lists、dictionary。

我们可以通过如下方式来定义dictionary:

coins = {
  "penny": 1,
  "nickel": 5,
  "dime": 10,
  "quarter": 25,
}

也可以通过dictionary comprehension来创建dictionary，例如：

words = ["able", "baker", "charlie"]
{x: len(x) for x in words}	# {"charlie": 7, "baker": 5, "able": 4}

我们可以使用如下方式来遍历dict:

for key, value in coins.items():
    print("Key:", key, "Value:", value)

一个dictionary支持如下内置方法：

clear
get
items
keys
pop
popitem
setdefault
update
values

9） Functions

在StarLark中可以使用def语句来定义一个命名函数；可以使用lambda表达式来顶一个匿名函数。例如：

def idiv(x, y):
  return x // y

idiv(6, 3)

4. Name binding and variables

在StarLark文件被解析之后执行之前，StarLark解释器会静态检查程序是否是结构完整的。比如，break和continue是否只出现在for循环语句中； if、for、return是否只出现在函数中；load语句是否只出现在函数外。

Name resolution是解决名称到变量绑定的静态检查程序。

5. Value concepts

StarLark内置一系列核心数据类型，集成StarLark解释器的应用程序也可以自定义数据类型。无论是内核支持的，还是应用程序自定义的都会实现一些基本的操作：

str(x)		-- return a string representation of x
type(x)		-- return a string describing the type of x
bool(x)		-- convert x to a Boolean truth value
hash(x)		-- return a hash code for x

1） Identity and mutation

StarLark中有些数据类型的值是不可变的，比如：NoneType、bool, int, float, string, bytes。对于不可变值其并没有identity概念，因此StarLark程序也就无法确定这些数据类型的两个值(比如整数值）是否代表同一个对象，只能判断两个值是否相等。

其他的数据类型值则是可变的，比如list、dict。它们可以通过类似a[i]=0或items.clear()等语句来进行修改。尽管tuple及function值不能被直接修改，但是它们本身可间接引用mutable值，因此我们认为其也是mutable的。

2） Freezing a value

与其他编程语言相比，StarLark有一个不太一致的地方：mutalbe值可以被frozen，这样后续如果尝试修改其值的话则会在运行的时候报错。

3） Hashing

dict数据类型的内部实现是hash table，因此这要求其key是可哈希的。

对于NoneType, bool, int, float, string, bytes等数据类型的值，是可hash的；对于list、dict数据类型的值，由于其是可变的，因此是不可hash的（ps: 除非该数据类型的值被frozen了）.

4） Sequence types

许多StarLark数据类型代表着一个值序列: list、tuple代表这一系列随机值，而dict在很多上下文当中表现的像是keys的一个序列。

我们可以根据这些sequence类型所支持的操作划分成如下几类：

Iterable: 对于一个iterable值序列，我们可以按固定的顺序处理其每一个元素。比如: list、tuple、dict（ps: string及bytes不是iterable的）
Sequence: 具有内置的长度，这样使得我们可以直接获取到序列中的元素个数。比如: list、tuple、dict(ps: string及bytes不是Sequence)
Indexable: 一个可索引的类型具有固定的长度，并提供高效的方式(ps: 通过索引下标)来访问其每一个元素。比如：string, bytes, tuple, and list
SetIndexable: 可以通过索引下标的方式修改对应位置上的值。比如：list
mapping: 通过关联数组的方式访问对应的值。

ps: It is a dynamic error to mutate a sequence such as a list or a dictionary while iterating over it.

def increment_values(dict):
  for k in dict:
    dict[k] += 1			# error: cannot insert into hash table during iteration

dict = {"one": 1, "two": 2}
increment_values(dict)

5） Indexing

StarLark的很多操作符和函数都需要传递一个索引值，比如a[i]、list.insert(i, x)。

Ivanzz

发上等愿，结中等缘，享下等福；择高处立，寻平处住，向宽处行