Python-gRPC Practice (8): Protobuf Plugins

November 22, 2022

Preface

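I have recently been improving protobuf_to_pydantic, a library that converts a Protobuf Message into a pydantic.BaseModel object, and I wanted it to generate the source code of the corresponding pydantic.BaseModel objects directly from raw Protobuf files. After some research I found that this can be done in the form of a Protobuf plugin.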

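However, after searching through a lot of material, I found that most Protobuf plugins are written in Go, that there are no (or very few) tutorials on writing plugins in Python, and that the official Python Protobuf documentation says nothing about plugins. I ran into many pitfalls as a result, and this article is a summary of what I learned while writing a Protobuf plugin.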

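If you do not know how to write a Protobuf file or how to generate the corresponding Python code, read Python-gRPC Practice (3): Implementing a gRPC Service with Python first.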

1. What is a Protobuf plugin

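According to the official description, a Protobuf plugin is just an ordinary program: it reads a serialized CodeGeneratorRequest (itself a Protocol Buffers message) from standard input and writes a serialized CodeGeneratorResponse to standard output. Both message types are defined in plugin.proto.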

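Through the CodeGeneratorRequest, a plugin can obtain the object that describes each Protobuf file (called FileDescriptorProto in Protobuf), and from this FileDescriptorProto object it can read everything in the file. mypy-protobuf, for example, uses the CodeGeneratorRequest object to generate the content of the corresponding pyi files, and then writes that content to the corresponding files through the CodeGeneratorResponse object.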
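
The exchange described above can be tried out without running protoc at all. The following is only a minimal sketch: it plays both roles in one process, and the file name demo.proto, the package demo and the generated .txt output are made up for illustration.

```python
from google.protobuf.compiler.plugin_pb2 import CodeGeneratorRequest, CodeGeneratorResponse

# Role of protoc: describe the parsed files and serialize the request.
# "demo.proto" and the package "demo" are made-up values for illustration.
outgoing = CodeGeneratorRequest()
outgoing.file_to_generate.append("demo.proto")
outgoing.proto_file.add(name="demo.proto", package="demo")
wire_bytes = outgoing.SerializeToString()  # this is what arrives on the plugin's stdin

# Role of the plugin: parse the request and fill in a response.
request = CodeGeneratorRequest.FromString(wire_bytes)
response = CodeGeneratorResponse()
for proto_file in request.proto_file:
    generated = response.file.add()           # one entry per output file
    generated.name = proto_file.name + ".txt"
    generated.content = f"package={proto_file.package}\n"
reply_bytes = response.SerializeToString()    # this is what the plugin writes to stdout

# Role of protoc again: parse the reply and see which files to write.
reply = CodeGeneratorResponse.FromString(reply_bytes)
summary = [(f.name, f.content) for f in reply.file]
print(summary)
```

In a real plugin only the middle part exists; the bytes come from sys.stdin.buffer and go to sys.stdout.buffer.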

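If you are familiar with Linux pipes, the way a Protobuf plugin works will look familiar. Take the following example: there is a text file named demo.txt with the following content: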

This is line 1.
This is line 2.

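After running the command

cat demo.txt | sed -e '2a wahaha' > new_demo.txt

a new file named new_demo.txt appears with the following content:

This is line 1.
This is line 2.
wahaha

In this example, demo.txt plays the role of the original Protobuf file, and cat is the protoc command that loads it. | is the pipe that passes the data stream on to the next command. The sed command acts as the plugin, and '2a wahaha' is the change the plugin makes: append the given text after the second line (this one-line form of the a command is GNU sed syntax). Finally, > writes the piped data to the specified file, just as the CodeGeneratorResponse object does.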
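
The same filter shape can be written in Python. This is a rough sketch of the analogy only, not of a real plugin: the function name append_after_line is made up, and in-memory streams stand in for stdin and stdout.

```python
import io
from typing import TextIO


def append_after_line(src: TextIO, dst: TextIO, line_no: int, extra: str) -> None:
    """Copy src to dst, inserting `extra` as a new line after line `line_no`."""
    for index, line in enumerate(src, start=1):
        dst.write(line)
        if index == line_no:
            # Make sure the copied line is terminated before appending.
            if not line.endswith("\n"):
                dst.write("\n")
            dst.write(extra + "\n")


# In-memory streams stand in for the two ends of the pipe.
source = io.StringIO("This is line 1.\nThis is line 2.\n")
target = io.StringIO()
append_after_line(source, target, line_no=2, extra="wahaha")
result = target.getvalue()
print(result, end="")
```

A Protobuf plugin has the same outline, except that the data on both streams is a serialized message instead of text.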

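The analogy is not exact: a Linux pipe carries only one output stream (leaving the error stream aside), whereas the code generated by the protoc command itself is not affected by plugins, and the outputs of different plugins do not affect each other either.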

With this basic picture of Protobuf plugins in mind, the next section uses the grpc-example-common project to show how to build one.

2. Building a Protobuf plugin

First make sure the gRPC and Protobuf dependencies are installed, then create a file named example_plugin.py in the project root. Its code and comments are as follows:

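#!/usr/bin/env python
# protoc launches this file as a program, so on Linux/macOS it needs the shebang line above
# and the executable bit (chmod +x example_plugin.py)
import logging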
import sys
from typing import Set, Iterator, Tuple
from contextlib import contextmanager

from google.protobuf.compiler.plugin_pb2 import CodeGeneratorRequest, CodeGeneratorResponse

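# Initialize the logger. logging writes to stderr by default, so log lines do not
# pollute stdout, which is reserved for the serialized response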
logger = logging.getLogger(__name__)
logging.basicConfig(
    format="[%(asctime)s %(levelname)s] %(message)s", datefmt="%y-%m-%d %H:%M:%S", level=logging.INFO
)

@contextmanager
def code_generation() -> Iterator[Tuple[CodeGeneratorRequest, CodeGeneratorResponse]]:
    """Modeled on the code in mypy-protobuf"""
    # Read the serialized bytes from standard input and parse them into a CodeGeneratorRequest
    request: CodeGeneratorRequest = CodeGeneratorRequest.FromString(sys.stdin.buffer.read())
    # Create an empty CodeGeneratorResponse
    response: CodeGeneratorResponse = CodeGeneratorResponse()

    # Declare that the plugin supports the `optional` keyword in proto3 files.
    # protoc supports it by default but plugins do not, so it must be declared here to avoid errors.
    response.supported_features |= CodeGeneratorResponse.FEATURE_PROTO3_OPTIONAL

    yield request, response

    # Serialize the response and write it to standard output
    sys.stdout.buffer.write(response.SerializeToString())


def main() -> None:
    with code_generation() as (request, response):
        # The proto files named on the protoc command line, i.e. the files written by the developer
        file_name_set: Set[str] = {i for i in request.file_to_generate}
        for proto_file in request.proto_file:
            if proto_file.name not in file_name_set:
                # Skip proto files not written by the developer (imported dependencies); no need to parse them
                continue
            # Log the name of the Protobuf file
            logger.info(proto_file.name)


if __name__ == "__main__":
    main()

As the code shows, this plugin is only a skeleton. It is very simple: it just uses the logger to print the names of the Protobuf files it was given.
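
The filtering step matters because request.proto_file contains not only the files named on the command line but also every file they import. A small sketch with a hand-built request (the helper name files_to_process is made up; the file names follow the project layout used in this article):

```python
from typing import List

from google.protobuf.compiler.plugin_pb2 import CodeGeneratorRequest


def files_to_process(request: CodeGeneratorRequest) -> List[str]:
    """Return the names of the files the plugin should generate output for."""
    wanted = set(request.file_to_generate)
    return [f.name for f in request.proto_file if f.name in wanted]


# Hand-built request: user.proto was named on the command line,
# empty.proto is only present because user.proto imports it.
request = CodeGeneratorRequest()
request.file_to_generate.append("grpc_example_common/protos/user/user.proto")
request.proto_file.add(name="google/protobuf/empty.proto", package="google.protobuf")
request.proto_file.add(
    name="grpc_example_common/protos/user/user.proto",
    package="user",
    dependency=["google/protobuf/empty.proto"],
)

selected = files_to_process(request)
print(selected)
```

Without the check, the plugin would also try to generate output for google/protobuf/empty.proto.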

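Once the plugin is written, it can be run. Protobuf plugins are run through the protoc command, so before using the plugin, look at the command that generates the Python files: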

python -m grpc_tools.protoc \
  --python_out=./ \
  --grpc_python_out=./ \
  -I protos $(find ./protos -name '*.proto')

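protoc uses the directory given by -I as the search path for Protobuf files and compiles the files listed at the end of the command, here every file ending in .proto under the protos directory in the current path. python_out and grpc_python_out specify where the generated Python code is written. Since both are set to ./, the command generates the Python code under a matching path. For example, suppose the Protobuf files are laid out as follows: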

.  # the project root, grpc-example-common
└── protos
    └── grpc_example_common
        └── protos
            ├── book
            ├── common
            └── user

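The Protobuf files live in the three directories book, common and user. The command generates the corresponding Python code files under the project root, and the project tree afterwards looks like this: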

.  # the project root, grpc-example-common
├── grpc_example_common # originally grpc-example-common, but it is converted to grpc_example_common automatically during generation
│   └── protos
│       └── grpc_example_common
│           └── protos
│               ├── book   # <--- Python code generated from the book Protobuf files goes here
│               ├── common # <--- Python code generated from the common Protobuf files goes here
│               └── user   # <--- Python code generated from the user Protobuf files goes here
└── protos
    └── grpc_example_common
        └── protos
            ├── book
            ├── common
            └── user

To make protoc use the plugin we just wrote, the command needs to be changed as follows:

python -m grpc_tools.protoc \
  --plugin=protoc-gen-custom-plugin=./example_plugin.py --custom-plugin_out=. \
  --mypy_grpc_out=./ \
  --mypy_out=./ \
  --python_out=./ \
  --grpc_python_out=./ \
  -I protos $(find ./protos -name '*.proto')

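The command has one extra line: --plugin=protoc-gen-custom-plugin=./example_plugin.py --custom-plugin_out=. The name given to --plugin must always start with protoc-gen-; the custom-plugin that follows is the name of this plugin, and =./example_plugin.py is the path to the custom-plugin plugin. The --custom-plugin_out=. that comes after it sets the output path of the custom-plugin plugin to ., so the files the plugin produces for each Protobuf file land in the same directory that protoc itself writes to.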

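For the plugin to load correctly, the custom-plugin in --plugin=protoc-gen-custom-plugin must match the custom-plugin in --custom-plugin_out.
Also note that the line --plugin=protoc-gen-custom-plugin=./example_plugin.py --custom-plugin_out=. \ ends with . \ (a space before the backslash) and not with .\; writing .\ makes the protoc command fail.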
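
The naming rule can be captured in a few lines. This is an illustrative helper only (plugin_flags is a made-up name, not part of protoc or grpc_tools); it builds the two flags from a single plugin name so that they cannot drift apart.

```python
from typing import List


def plugin_flags(plugin_name: str, plugin_path: str, out_dir: str) -> List[str]:
    """Build the pair of protoc flags that register a plugin and set its output directory."""
    return [
        # The executable is registered under "protoc-gen-" + plugin name ...
        f"--plugin=protoc-gen-{plugin_name}={plugin_path}",
        # ... and the same plugin name prefixes the "_out" flag.
        f"--{plugin_name}_out={out_dir}",
    ]


flags = plugin_flags("custom-plugin", "./example_plugin.py", ".")
print(" ".join(flags))
```

Passing the arguments as a list (for example to subprocess.run) also sidesteps the `. \` versus `.\` line-continuation problem, since no shell line continuation is involved.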

After running this command, the terminal shows the following output:

[22-11-22 20:39:25 INFO] grpc_example_common/protos/book/manager.proto
[22-11-22 20:39:25 INFO] grpc_example_common/protos/book/social.proto
[22-11-22 20:39:25 INFO] grpc_example_common/protos/common/p2p_validate.proto
[22-11-22 20:39:25 INFO] grpc_example_common/protos/common/exce.proto

Apart from the generated Python code, however, no other files are produced. That is because the plugin does not yet write anything into the CodeGeneratorResponse.

To make the plugin produce output, first write a handler function process_file that receives a FileDescriptorProto file object and generates a corresponding JSON file:

import json

from google.protobuf.compiler.plugin_pb2 import CodeGeneratorResponse
from google.protobuf.descriptor_pb2 import FileDescriptorProto
from google.protobuf.json_format import MessageToDict


def process_file(
    proto_file: FileDescriptorProto, response: CodeGeneratorResponse
) -> None:
    # Flatten the text form of the file options into "key: value, key: value"
    options = str(proto_file.options).strip().replace("\n", ", ").replace('"', "")
    file = response.file.add()  # add an output file entry to the response and return it
    file.name = proto_file.name + ".json"  # name of the output file
    # content of the output file
    file.content = json.dumps(
        {
            "package": f"{proto_file.package}",  # Protobuf package name
            "filename": f"{proto_file.name}",    # Protobuf file name
            "dependencies": list(proto_file.dependency),  # files imported by this file
            "message_type": [MessageToDict(i) for i in proto_file.message_type],  # messages defined in the file
            "service": [MessageToDict(i) for i in proto_file.service],  # services defined in the file
            "public_dependency": list(proto_file.public_dependency),    # indexes into dependency of the public imports
            "enum_type": [MessageToDict(i) for i in proto_file.enum_type],  # enums defined in the file
            "extension": [MessageToDict(i) for i in proto_file.extension],  # extensions defined in the file
            "options": dict(item.split(": ") for item in options.split(", ") if options),  # file-level options
        },
        indent=2
    ) + "\r\n"
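
The options handling in process_file relies on the text form that str() produces for a message, one `key: "value"` pair per line. The sketch below isolates that step so it can be tried on a plain string; the function name parse_options_text and the sample option values are made up, and unlike the code above it splits each item on the first ": " only, which is a slightly more forgiving variant rather than the original behaviour.

```python
from typing import Dict


def parse_options_text(text: str) -> Dict[str, str]:
    """Turn 'key: "value"' lines (protobuf text format) into a flat dict of strings."""
    flat = text.strip().replace("\n", ", ").replace('"', "")
    if not flat:
        return {}
    # Split on the first ": " only, so a value that itself contains ": " stays intact.
    return dict(item.split(": ", 1) for item in flat.split(", "))


# Made-up sample resembling str(proto_file.options) for a file with two options set.
sample = 'java_package: "com.example.user"\ngo_package: "example.com/user;userpb"\n'
parsed = parse_options_text(sample)
print(parsed)
print(parse_options_text(""))
```

This string-based approach only suits simple scalar options; a value containing ", " would still be split incorrectly.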

Next, change the main function of the plugin:

def main() -> None:
    with code_generation() as (request, response):
        # The proto files named on the protoc command line
        file_name_set: Set[str] = {i for i in request.file_to_generate}
        for proto_file in request.proto_file:
            if proto_file.name not in file_name_set:
                # Skip proto files not written by the developer (imported dependencies); no need to parse them
                continue
            process_file(proto_file, response)  # <---- changed here

Run the protoc command again and the output appears. For user.proto, for example, the generated JSON looks like this:

{
  "package": "user",
  "filename": "grpc_example_common/protos/user/user.proto",
  "dependencies": ["google/protobuf/empty.proto"],
  "message_type": [
    {
      "name": "CreateUserRequest",
      "field": [
        { "name": "uid", "number": 1, "label": "LABEL_OPTIONAL", "type": "TYPE_STRING", "jsonName": "uid" },
        { "name": "user_name", "number": 2, "label": "LABEL_OPTIONAL", "type": "TYPE_STRING", "jsonName": "userName" },
        { "name": "password", "number": 3, "label": "LABEL_OPTIONAL", "type": "TYPE_STRING", "jsonName": "password" }
      ]
    },
  ],
  "service": [
    {
      "name": "User",
      "method": [
        { "name": "get_uid_by_token", "inputType": ".user.GetUidByTokenRequest", "outputType": ".user.GetUidByTokenResult" },
        { "name": "logout_user", "inputType": ".user.LogoutUserRequest", "outputType": ".google.protobuf.Empty" },
        { "name": "login_user", "inputType": ".user.LoginUserRequest", "outputType": ".user.LoginUserResult" },
        { "name": "create_user", "inputType": ".user.CreateUserRequest", "outputType": ".google.protobuf.Empty" },
        { "name": "delete_user", "inputType": ".user.DeleteUserRequest", "outputType": ".google.protobuf.Empty" }
      ]
    }
  ],
  "public_dependency": [],
  "enum_type": [],
  "extension": [],
  "options": {}
}
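
The camelCase keys such as jsonName and the enum names such as LABEL_OPTIONAL in this output come from MessageToDict. A minimal sketch with a hand-built message descriptor (the field mirrors user_name from the output above; nothing here is read from a real .proto file):

```python
from google.protobuf.descriptor_pb2 import DescriptorProto, FieldDescriptorProto
from google.protobuf.json_format import MessageToDict

# Hand-built descriptor that mirrors one field of CreateUserRequest.
message = DescriptorProto(name="CreateUserRequest")
message.field.add(
    name="user_name",
    number=2,
    label=FieldDescriptorProto.LABEL_OPTIONAL,
    type=FieldDescriptorProto.TYPE_STRING,
    json_name="userName",
)

# MessageToDict renders field names in lowerCamelCase and enum values by name.
as_dict = MessageToDict(message)
print(as_dict)
```

Passing preserving_proto_field_name=True to MessageToDict keeps the original snake_case field names instead, if that is preferred.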

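The output shows that a plugin has access to a great deal of information about a Protobuf file. Beyond this data, it can also read the Option data of each Message and obtain the full source information through proto_file.source_code_info.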
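
As one illustration of source_code_info, the comment written above a message can be looked up through its location path. This is a sketch under assumptions: the descriptor and the comment text are built by hand here, and message_comments is a made-up helper that handles top-level messages only.

```python
from typing import Dict

from google.protobuf.descriptor_pb2 import FileDescriptorProto

# Field number of `message_type` inside FileDescriptorProto (see descriptor.proto).
MESSAGE_TYPE_FIELD_NUMBER = 4


def message_comments(proto_file: FileDescriptorProto) -> Dict[str, str]:
    """Map each top-level message name to its leading comment."""
    comments: Dict[str, str] = {}
    for location in proto_file.source_code_info.location:
        path = list(location.path)
        # A path of [4, i] points at the i-th top-level message of the file.
        if len(path) == 2 and path[0] == MESSAGE_TYPE_FIELD_NUMBER:
            message = proto_file.message_type[path[1]]
            comments[message.name] = location.leading_comments.strip()
    return comments


# Hand-built stand-in for what protoc would pass to the plugin.
proto_file = FileDescriptorProto(name="demo.proto", package="demo")
proto_file.message_type.add(name="CreateUserRequest")
proto_file.source_code_info.location.add(
    path=[4, 0], leading_comments=" Payload for creating a user.\n"
)

found = message_comments(proto_file)
print(found)
```

Longer paths address nested elements such as fields, following the field numbers defined in descriptor.proto.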

The message_type content in the JSON file is long, so part of the output is omitted here. The full output for every Protobuf file can be found under grpc_example_common/protos.

Author: So1n
Link to this article: http://so1n.me/2022/11/22/Python-gRPC%E5%AE%9E%E8%B7%B5(8)--Protobuf%E6%8F%92%E4%BB%B6%20copy/index.html
Copyright notice: all articles on this blog are licensed under BY-NC-SA; please credit the source when reposting!
