Python-gRPC in Practice (7)--Error Propagation in gRPC

June 23, 2022

Preface

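The gRPC service built in the earlier article "Python-gRPC in Practice (3)--Implementing a gRPC Service in Python" used a home-grown protocol to propagate errors. That is not an elegant solution, because it interoperates poorly with anything that does not know the protocol. Fortunately, gRPC defines an official convention for this, and using it lets different services pass errors to each other.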

1. Custom error propagation

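When writing ordinary HTTP/1.1 APIs, we usually define a set of business-level errors that are kept separate from the standard HTTP errors. A typical response body looks like this: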

{
    "code": "0",
    "msg": "success",
    "data": {}
}

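This structure has three fields, code, msg, and data: the error code, the error message, and the payload to return.
After receiving a response, the client checks code. If it is one of the defined success codes, the client reads the payload from data; otherwise it raises an exception carrying msg.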
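
The client-side check described above can be sketched in a few lines. This is only an illustration: `BusinessError`, `unwrap`, and the choice of "0" as the single success code are assumptions, not part of the original service.

```python
from typing import Any, Dict

SUCCESS_CODE = "0"  # assumption: "0" is the only success code


class BusinessError(Exception):
    """Hypothetical exception raised for a non-success envelope."""

    def __init__(self, code: str, msg: str) -> None:
        super().__init__(f"[{code}] {msg}")
        self.code = code
        self.msg = msg


def unwrap(envelope: Dict[str, Any]) -> Any:
    """Return `data` on success; otherwise raise BusinessError with `msg`."""
    code = str(envelope.get("code", ""))
    if code == SUCCESS_CODE:
        return envelope.get("data")
    raise BusinessError(code, str(envelope.get("msg", "")))
```

The caller only ever sees the payload or an exception, which is the behaviour the rest of this article tries to reproduce for gRPC.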

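gRPC is no exception. Calling a gRPC method feels like calling an ordinary function, but gRPC services interact by exchanging messages, and the request and response message types of each method are fixed in advance. An error generally does not have the same shape as the normal response, so either the error information fits into the response message or we have to find another way, for example by embedding error fields in every response message: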

message Demo {
    string a=1;
    int32 b=2;

    int32 err_code=3;
    string err_msg=4;
}

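When a call fails, the server converts the error into err_code and err_msg and puts them into the message sent to the client. The client checks err_code on every response: if it is set, the call failed, and the client builds an exception from err_code and err_msg alone and raises it to the caller; otherwise it returns the data as usual.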
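
A minimal sketch of both sides of this embedded-field scheme follows. It uses a plain dataclass as a stand-in for the generated Demo message, and `handle`, `check`, `RemoteCallError`, and the fallback code 500 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Demo:
    """Plain-Python stand-in for the generated Demo message."""
    a: str = ""
    b: int = 0
    err_code: int = 0  # 0 (the proto3 default) means "no error"
    err_msg: str = ""


class RemoteCallError(Exception):
    """Hypothetical client-side exception rebuilt from the error fields."""

    def __init__(self, err_code: int, err_msg: str) -> None:
        super().__init__(f"{err_code}: {err_msg}")
        self.err_code = err_code
        self.err_msg = err_msg


def handle(do_work: Callable[[], Demo]) -> Demo:
    """Server side: turn any failure into err_code/err_msg fields."""
    try:
        return do_work()
    except Exception as exc:
        # 500 is an arbitrary fallback code used only in this sketch.
        return Demo(err_code=getattr(exc, "err_code", 500), err_msg=str(exc))


def check(response: Demo) -> Demo:
    """Client side: raise if err_code is set, otherwise pass the data on."""
    if response.err_code:
        raise RemoteCallError(response.err_code, response.err_msg)
    return response
```

Every response message needs the two extra fields and every call site needs the `check` step, which is the repetition the metadata-based approach avoids.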

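This approach works for every kind of call, but it is not very elegant. It would be better to carry the error to the client in a separate container and have the client parse it and generate the exception. The gRPC service described in the earlier article does this with gRPC metadata. To catch server-side exceptions and recreate them on the client automatically, a top-level interceptor is installed on each side. The server-side top-level interceptor is shown below (other interceptors may raise too, so the interceptor that catches errors must be the outermost one):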

# code url: https://github.com/so1n/grpc-example-common/blob/v0.1.5/grpc_example_common/interceptor/server_interceptor/customer_top.py
import logging
import time
from typing import Any, Callable, List, Tuple

import grpc

from grpc_example_common.helper.context import context_proxy

from .base import BaseInterceptor


class CustomerTopInterceptor(BaseInterceptor):
    def intercept(
        self,
        next_handler_method: Callable,
        request_proto_message: Any,
        context: grpc.ServicerContext,
    ) -> Any:
        return_initial_metadata: List[Tuple] = [("customer-user-agent", "Python3")]
        try:
            # Run the gRPC call
            return next_handler_method(request_proto_message, context)
        except Exception as e:
            # Only attach error information when the client sent this key-value pair
            if self.metadata_dict.get("customer-user-agent", "") == "Python3":
                return_initial_metadata.append(("exc_name", e.__class__.__name__))
                return_initial_metadata.append(("exc_info", str(e)))
            # Re-raise so that the gRPC server still sees the exception and can do its own handling
            raise e
        finally:
            # Send the collected entries to the client as initial metadata
            context.send_initial_metadata(return_initial_metadata)

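This interceptor catches exceptions raised by the call and stores the exception class name and message in the metadata. There are reasons for putting the values in metadata rather than setting the status code and message through context.set_code and context.set_details.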

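First, the code: gRPC only accepts its own predefined status codes, which rules out custom codes, and a business error code should not be written into the status code of the response anyway, so context.set_code is not used here. As for set_details, when the gRPC server catches an exception it formats the exception and writes the result into details itself, as shown below: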

def _call_behavior(rpc_event,
                   state,
                   behavior,
                   argument,
                   request_deserializer,
                   send_response_callback=None):
    from grpc import _create_servicer_context
    with _create_servicer_context(rpc_event, state,
                                  request_deserializer) as context:
        try:
            response_or_iterator = None
            # Invoke the handler
            if send_response_callback is not None:
                response_or_iterator = behavior(argument, context,
                                                send_response_callback)
            else:
                response_or_iterator = behavior(argument, context)
            return response_or_iterator, True
        except Exception as exception:  # pylint: disable=broad-except
            with state.condition:
                if state.aborted:
                    _abort(state, rpc_event.call, cygrpc.StatusCode.unknown,
                           b'RPC Aborted')
                elif exception not in state.rpc_errors:
                    # Not one of gRPC's own errors, so the exception text is written to details
                    details = 'Exception calling application: {}'.format(
                        exception)
                    _LOGGER.exception(details)
                    _abort(state, rpc_event.call, cygrpc.StatusCode.unknown,
                           _common.encode(details))
            return None, False

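This means that even if the interceptor sets details, the exception it re-raises is not a gRPC exception, so details ends up being overwritten with the exception text.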

With the server-side interceptor covered, here is the client-side interceptor:

# code url: https://github.com/so1n/grpc-example-common/blob/v0.1.5/grpc_example_common/interceptor/client_interceptor/customer_top.py
import inspect
import logging
from typing import Any, Callable, Dict, List, Optional, Type


from .base import GRPC_RESPONSE, BaseInterceptor, ClientCallDetailsType


class CustomerTopInterceptor(BaseInterceptor):
    def __init__(self, exc_list: Optional[List[Type[Exception]]] = None):
        self.exc_dict: Dict[str, Type[Exception]] = {}
        for key, exc in globals()["__builtins__"].items():
            # Register Python's built-in exceptions
            if inspect.isclass(exc) and issubclass(exc, Exception):
                self.exc_dict[key] = exc

        if exc_list:
            # Register user-supplied exceptions
            for exc in exc_list:
                if issubclass(exc, Exception):
                    self.exc_dict[exc.__name__] = exc

    def intercept(
        self,
        method: Callable,
        request_or_iterator: Any,
        call_details: ClientCallDetailsType,
    ) -> GRPC_RESPONSE:
        if call_details.metadata is not None:
            # Add the agreed-upon marker
            call_details.metadata.append(("customer-user-agent", "Python3"))  # type: ignore
        response: GRPC_RESPONSE = method(call_details, request_or_iterator)
        metadata_dict: dict = {item.key: item.value for item in response.initial_metadata()}
        if metadata_dict.get("customer-user-agent") == "Python3":
            # Extract the exception information
            exc_name: str = metadata_dict.get("exc_name", "")
            exc_info: str = metadata_dict.get("exc_info", "")
            # Look up the exception class by exc_name
            exc: Optional[Type[Exception]] = self.exc_dict.get(exc_name)
            if exc:
                # Raise the exception
                raise exc(exc_info)
        return response

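As shown, the client interceptor inspects the metadata returned by the server for exception information. If it is present, the interceptor extracts it and raises the corresponding exception; otherwise it returns the response as usual. As long as both client and server install the right interceptors, the client receives the server's error information and raises it as an exception. However, this implementation relies on gRPC metadata, and metadata values must be ASCII strings or properly formed bytes; otherwise the value is not transmitted and the request may even hang. The error information therefore needs to be serialized first.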
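
One way to make arbitrary error text safe for metadata is to percent-encode it before sending and decode it on arrival. The sketch below uses only the standard library; `encode_exc_metadata` and `decode_exc_metadata` are hypothetical helpers, not functions from grpc_example_common, and the key names simply mirror the interceptors above.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote, unquote


def encode_exc_metadata(exc: Exception) -> List[Tuple[str, str]]:
    """Server side: build ASCII-only metadata entries for an exception."""
    return [
        ("exc_name", quote(type(exc).__name__)),
        ("exc_info", quote(str(exc))),  # non-ASCII text becomes %XX escapes
    ]


def decode_exc_metadata(metadata: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Client side: recover the exception name and message."""
    values = {key: unquote(value) for key, value in metadata}
    return values.get("exc_name", ""), values.get("exc_info", "")
```

An alternative, assuming the receiving side expects it, is to send raw bytes under a key ending in `-bin`, which gRPC treats as binary metadata.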

2. Error propagation based on the official convention

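Since the implementation above is not very elegant, I went looking online for an official one and eventually found the official error-handling example on GitHub. The server side of that example is as follows: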

def create_greet_limit_exceed_error_status(name):
    # Create an Any message
    detail = any_pb2.Any()
    # Pack the custom error into the Any object, so that it passes type validation when sent and received
    detail.Pack(
        error_details_pb2.QuotaFailure(violations=[
            error_details_pb2.QuotaFailure.Violation(
                subject="name: %s" % name,
                description="Limit one greeting per person",
            )
        ],))
    # Build a Status object, which has three fields: code, message, and details
    return status_pb2.Status(
        code=code_pb2.RESOURCE_EXHAUSTED,
        message='Request limit exceeded.',
        # List of error detail objects
        details=[detail],
    )


class LimitedGreeter(helloworld_pb2_grpc.GreeterServicer):

    def __init__(self):
        self._lock = threading.RLock()
        self._greeted = set()

    def SayHello(self, request, context):
        # The gRPC method implementation
        with self._lock:
            if request.name in self._greeted:
                rich_status = create_greet_limit_exceed_error_status(
                    request.name)
                context.abort_with_status(rpc_status.to_status(rich_status))
            else:
                self._greeted.add(request.name)
        return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)

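The SayHello logic in this example is simple. If name has not been seen before, it is added to the set and the method returns normally. If it has, the method builds a Status object, converts it into a _Status object with to_status, and passes that to abort_with_status, which delivers the error data to the client.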

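abort_with_status raises an exception inside the handler, terminates the call with a non-OK status, and sends the Status supplied by the user to the client. The source of to_status is: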

def to_status(status):
    return _Status(code=code_to_grpc_status_code(status.code),
                   details=status.message,
                   trailing_metadata=((GRPC_DETAILS_METADATA_KEY,
                                       status.SerializeToString()),))

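The source shows that the function maps status.code to the gRPC status code, uses status.message as the gRPC details, and finally serializes the whole status to bytes and places it in the trailing metadata under GRPC_DETAILS_METADATA_KEY.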

The client side is simpler:

def process(stub):
    try:
        response = stub.SayHello(helloworld_pb2.HelloRequest(name='Alice'))
        _LOGGER.info('Call success: %s', response.message)
    except grpc.RpcError as rpc_error:
        _LOGGER.error('Call failure: %s', rpc_error)
        # Extract the Status object from the `grpc.RpcError`
        status = rpc_status.from_call(rpc_error)
        for detail in status.details:
            # Check whether the detail holds the expected message type; if so, log an error, otherwise raise
            if detail.Is(error_details_pb2.QuotaFailure.DESCRIPTOR):
                info = error_details_pb2.QuotaFailure()
                detail.Unpack(info)
                _LOGGER.error('Quota failure: %s', info)
            else:
                raise RuntimeError('Unexpected failure: %s' % detail)

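In this code, a normal response is simply logged. On failure, the client sees that the status code of the response is not OK and raises grpc.RpcError; the Status is then extracted from it with rpc_status.from_call. That function is also simple: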

def from_call(call):
    # No trailing metadata: return None
    if call.trailing_metadata() is None:
        return None
    # Otherwise iterate over the entries
    for key, value in call.trailing_metadata():
        # If the key is the officially designated one, extract the data
        if key == GRPC_DETAILS_METADATA_KEY:
            # Deserialize the value into a Status message
            rich_status = status_pb2.Status.FromString(value)
            # Check that the Status agrees with the code and details of the call
            if call.code().value[0] != rich_status.code:
                raise ValueError(
                    'Code in Status proto (%s) doesn\'t match status code (%s)'
                    % (code_to_grpc_status_code(rich_status.code), call.code()))
            if call.details() != rich_status.message:
                raise ValueError(
                    'Message in Status proto (%s) doesn\'t match status details (%s)'
                    % (rich_status.message, call.details()))
            return rich_status
    return None

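The source shows the same logic as the custom approach: data is pulled from metadata and assembled into an error object. Note that the call parameter of from_call accepts not only the grpc.RpcError raised by a failed call but also the response object obtained in a client interceptor, because from_call only uses the trailing_metadata, code, and details methods, which both objects provide.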
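
The duck typing mentioned here can be illustrated without gRPC installed. In the sketch below, `FakeCall` and `find_status_bytes` are hypothetical names, and the function reproduces only the metadata lookup of from_call, not its consistency checks on code and details. The key string is the value of GRPC_DETAILS_METADATA_KEY in the grpcio-status package.

```python
from typing import Optional, Sequence, Tuple

# Value of GRPC_DETAILS_METADATA_KEY in the grpcio-status package.
DETAILS_KEY = "grpc-status-details-bin"

Metadata = Optional[Sequence[Tuple[str, bytes]]]


class FakeCall:
    """Hypothetical stand-in for an RpcError or an interceptor response."""

    def __init__(self, code: int, details: str, trailing: Metadata) -> None:
        self._code, self._details, self._trailing = code, details, trailing

    def code(self) -> int:
        return self._code

    def details(self) -> str:
        return self._details

    def trailing_metadata(self) -> Metadata:
        return self._trailing


def find_status_bytes(call) -> Optional[bytes]:
    """Return the serialized Status carried by `call`, or None if absent."""
    metadata = call.trailing_metadata()
    if metadata is None:
        return None
    for key, value in metadata:
        if key == DETAILS_KEY:
            return value
    return None
```

Any object exposing these three methods can be passed in, which is why the same extraction code can serve both an except block and a client interceptor.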

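After this brief look at the official example, it is clear that the official method closely resembles the custom one. The difference is that it defines a standard key, so everyone agrees that the value under this key is a serialized Status object (and because it is serialized, non-ASCII characters are no longer a concern). The Status object has three fields, code, message, and details, which correspond to the error structure described earlier: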

{
    "code": "0",
    "msg": "success",
    "data": {}
}

namely its code, msg, and data fields. Note, however, that details is a list and can hold several custom message objects.

3. Redesigning the error propagation

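The official example requires the server's business logic to call context.abort_with_status explicitly to put the error into metadata, and requires the client to catch grpc.RpcError and unpack it. That is very verbose for business code, so I tried combining the official convention with the custom error propagation.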

First, define a single message for internal use:

message Exec{
  string name = 1; // exception name
  string msg = 2;  // exception message
}

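This message is only used among internal services. If the service is also exposed to other teams that have not adopted this message, they can still tell roughly what went wrong from the status code and details.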

Next comes the server-side top-level interceptor. Only the exception-handling part needs to change:

# code url: https://github.com/so1n/grpc-example-common/blob/v0.1.7/grpc_example_common/interceptor/server_interceptor/customer_top.py
class CustomerTopInterceptor(BaseInterceptor):
    def intercept(
        self,
        next_handler_method: Callable,
        request_proto_message: Any,
        context: grpc.ServicerContext,
    ) -> Any:
        try:
            # Invoke the service method
            return next_handler_method(request_proto_message, context)
        except Exception as e:
            # Create an Any message
            detail = any_pb2.Any()
            # Pack the custom error into the Any object, so that it passes type validation when sent and received
            # Note that this is Exec, the message we defined ourselves
            detail.Pack(
                Exec(
                    name=e.__class__.__name__,
                    msg=str(e)
                )
            )
            # Use abort_with_status to send the data to the client through metadata
            context.abort_with_status(
                rpc_status.to_status(
                    status_pb2.Status(
                        code=code_pb2.RESOURCE_EXHAUSTED,  # Only gRPC status codes are allowed here, just as the HTTP status code stays 200 even when the business error code is 2001
                        message=str(e),
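                        details=[detail],  # This is a list, so several kinds of error objects can be included for compatibility with different systems; for internal calls, prefer sticking to a single one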
                    )
                )
            )
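            # Re-raise the exception. The gRPC server sees that the call is already marked as aborted and does no further handling,
            # but a propagating exception is still useful to other components. For example, the official OpenTelemetry implementation wraps the channel in another channel, so it needs to catch the exception to generate the corresponding Event.
            # Note: with the stock grpc.ServicerContext, abort_with_status itself raises to terminate the call, so this line acts only as a safeguard.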
            raise e

Then the client-side top-level interceptor. Likewise, only the way the data is retrieved needs to change:

# code url: https://github.com/so1n/grpc-example-common/blob/v0.1.7/grpc_example_common/interceptor/client_interceptor/customer_top.py
class CustomerTopInterceptor(BaseInterceptor):

    # Exception registration code omitted
    ...

    def intercept(
        self,
        method: Callable,
        request_or_iterator: Any,
        call_details: ClientCallDetailsType,
    ) -> GRPC_RESPONSE:
        response: GRPC_RESPONSE = method(call_details, request_or_iterator)
        # As noted earlier, `from_call` also accepts the response object returned by `method` in a client interceptor
        status: Optional[status_pb2.Status] = rpc_status.from_call(response)
        # A non-None result means error data was received
        if status:
            for detail in status.details:
                # Check whether this detail is the message we expect
                if detail.Is(Exec.DESCRIPTOR):
                    # Unpack the detail to get the data
                    exec_instance: Exec = Exec()
                    detail.Unpack(exec_instance)
                    # Build the exception and raise it
                    exec_class: Type[Exception] = self.exc_dict.get(exec_instance.name) or RuntimeError
                    raise exec_class(exec_instance.msg)
                else:
                    raise RuntimeError('Unexpected failure: %s' % detail)
        return response
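
The exception registration elided above, together with the RuntimeError fallback, could be sketched as follows. This is an assumption-laden illustration rather than the library's code: `build_exc_registry` and `rebuild_exception` are hypothetical names, and the sketch reads the built-ins through the `builtins` module instead of `globals()["__builtins__"]`, since the latter is a module rather than a dict when the code runs as `__main__`.

```python
import builtins
import inspect
from typing import Dict, Iterable, Optional, Type


def build_exc_registry(
    extra: Optional[Iterable[Type[Exception]]] = None,
) -> Dict[str, Type[Exception]]:
    """Map exception class names to classes: built-ins plus user-supplied ones."""
    registry: Dict[str, Type[Exception]] = {
        name: obj
        for name, obj in vars(builtins).items()
        if inspect.isclass(obj) and issubclass(obj, Exception)
    }
    for exc in extra or ():
        if inspect.isclass(exc) and issubclass(exc, Exception):
            registry[exc.__name__] = exc
    return registry


def rebuild_exception(
    registry: Dict[str, Type[Exception]], name: str, msg: str
) -> Exception:
    """Recreate the remote exception, falling back to RuntimeError."""
    return registry.get(name, RuntimeError)(msg)
```

Classes that derive only from BaseException, such as KeyboardInterrupt, are deliberately left out, so an unknown or unsafe name always degrades to RuntimeError.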

With that, the new error propagation is complete. A simple demo verifies the result:

# grpc_example_common url:https://github.com/so1n/grpc-example-common/tree/v0.1.7
# Server code
from concurrent import futures
from typing import List

import grpc
from grpc_example_common.interceptor.server_interceptor.base import BaseInterceptor
from google.protobuf.empty_pb2 import Empty  # type: ignore
from grpc_example_common.protos.user import user_pb2 as user_message
from grpc_example_common.interceptor.server_interceptor.customer_top import CustomerTopInterceptor

from grpc_example_common.protos.user import user_pb2_grpc as user_service


class UserService(user_service.UserServicer):

    def delete_user(self, request: user_message.DeleteUserRequest,
                    context: grpc.ServicerContext) -> Empty:
        uid: str = request.uid
        if uid == "123":
            return Empty()
        else:
            raise ValueError(f"Not found user:{uid}")


def main(host: str = "127.0.0.1", port: str = "9000") -> None:
    interceptor_list: List[BaseInterceptor] = [CustomerTopInterceptor()]
    server: grpc.Server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=10),
        interceptors=interceptor_list,
    )
    user_service.add_UserServicer_to_server(UserService(), server)
    server.add_insecure_port(f"{host}:{port}")
    server.start()
    try:
        server.wait_for_termination()
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == "__main__":
    main()

# Client code
import grpc
from grpc_example_common.protos.user import user_pb2 as user_message
from grpc_example_common.protos.user import user_pb2_grpc as user_service
from grpc_example_common.interceptor.client_interceptor.customer_top import CustomerTopInterceptor

channel: grpc.Channel = grpc.intercept_channel(
    grpc.insecure_channel("127.0.0.1:9000"), CustomerTopInterceptor()
)
user_stub: user_service.UserStub = user_service.UserStub(channel)
user_stub.delete_user(user_message.DeleteUserRequest(uid="123"))
user_stub.delete_user(user_message.DeleteUserRequest(uid="456"))

Running the demo, the client raises the following error:

Traceback (most recent call last):
  File "/home/so1n/github/grpc-example-project/grpc-example-api-backend-service/demo.py", line 11, in <module>
    user_stub.delete_user(user_message.DeleteUserRequest(uid="456"))
  File "/home/so1n/github/grpc-example-project/grpc-example-api-backend-service/.venv/lib/python3.8/site-packages/grpc/_interceptor.py", line 216, in __call__
    response, ignored_call = self._with_call(request,
  File "/home/so1n/github/grpc-example-project/grpc-example-api-backend-service/.venv/lib/python3.8/site-packages/grpc/_interceptor.py", line 254, in _with_call
    call = self._interceptor.intercept_unary_unary(continuation,
  File "/home/so1n/github/grpc-example-project/grpc-example-api-backend-service/.venv/lib/python3.8/site-packages/grpc_example_common/interceptor/client_interceptor/base.py", line 74, in intercept_unary_unary
    return self.intercept(continuation, request, call_details)
  File "/home/so1n/github/grpc-example-project/grpc-example-api-backend-service/.venv/lib/python3.8/site-packages/grpc_example_common/interceptor/client_interceptor/customer_top.py", line 44, in intercept
    raise exec_class(exec_instance.msg)
ValueError: Not found user:456

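The traceback shows that the redesigned error propagation works as intended: the ValueError raised on the server is raised again on the client with the same message.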

Author: So1n
Permalink: http://so1n.me/2022/06/23/Python-gRPC%E5%AE%9E%E8%B7%B5(7)--gRPC%E9%94%99%E8%AF%AF%E4%BC%A0%E9%80%92%20copy/index.html
License: All articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.
