2026/2/21 14:03:10
Preface

One day a developer from a business team came over with a question.

Developer: My service is returning 504, but I can't tell which hop is failing. Every request touches 4 microservices, 2 databases, 1 redis, 1 message queue...

Long-suffering ops: Stop, stop, say no more. We don't have distributed tracing yet, so all I can do is check the services by hand, one at a time.

First I asked him to describe the business logic and the access path, and 10 minutes went by. Then I checked each service and the resources behind it, level by level, and only located the fault half an hour later. That is far too slow, so a tracing project went onto the agenda with a single goal: locate problems quickly and improve stability. OpenTelemetry is the current industry favorite for tracing, so this ops engineer decided to dig into it.

Environment

Component            Version
Operating system     Ubuntu 22.04.4 LTS
opentelemetry-sdk    1.35.0

Installation

A quick word on the OpenTelemetry data flow first; we will get something running and discuss the details afterwards. Collection points are embedded in the code to gather data (opentelemetry-sdk), and the data is then uploaded over a fixed protocol to a backend that displays it (Jaeger UI).

Install the OpenTelemetry SDK:

```
pip3 install opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-api
```

Install the display backend, Jaeger UI:

```
docker pull docker.m.daocloud.io/jaegertracing/all-in-one:latest
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  docker.m.daocloud.io/jaegertracing/all-in-one:latest
```

Once the container is up, open http://127.0.0.1:16686 to reach the Jaeger UI.

First example

Web service

Start with a simple web service, implemented here with tornado.

Install tornado:

```
pip3 install tornado
```

```python
import tornado.httpserver as httpserver
import tornado.web
from tornado.ioloop import IOLoop


class TestFlow(tornado.web.RequestHandler):
    def get(self):
        self.finish("hello world")


def applications():
    # Route "/" to the test handler
    urls = []
    urls.append((r"/", TestFlow))
    return tornado.web.Application(urls)


def main():
    app = applications()
    server = httpserver.HTTPServer(app)
    server.bind(10000, "0.0.0.0")
    server.start(1)
    IOLoop.current().start()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt as e:
        IOLoop.current().stop()
    finally:
        IOLoop.current().close()
```

Check that the service responds, for example with curl http://localhost:10000.

Adding instrumentation

```python
import tornado.httpserver as httpserver
import tornado.web
from tornado.ioloop import IOLoop
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Register a tracer provider that reports under the service name "s1"
trace.set_tracer_provider(TracerProvider(resource=Resource.create({SERVICE_NAME: "s1"})))
tracer = trace.get_tracer(__name__)

# Batch finished spans and send them to Jaeger over OTLP/HTTP
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
trace.get_tracer_provider().add_span_processor(span_processor)


class TestFlow(tornado.web.RequestHandler):
    def get(self):
        views()
        self.finish("hello world")


def views():
    # One span per request; it is exported after end() is called
    span = tracer.start_span("s1-span")
    span.end()


def applications():
    urls = []
    urls.append((r"/", TestFlow))
    return tornado.web.Application(urls)


def main():
    app = applications()
    server = httpserver.HTTPServer(app)
    server.bind(10000, "0.0.0.0")
    server.start(1)
    IOLoop.current().start()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt as e:
        IOLoop.current().stop()
    finally:
        IOLoop.current().close()
```

Request the service again with curl http://localhost:10000, then open the Jaeger UI. The data is there: the span we just added has been reported to Jaeger.

Span attributes

Enrich the span with some attributes:

```python
def views():
    span = tracer.start_span("s1-span")
    span.set_attribute("name", "wilson")
    span.set_attribute("addr", "cd")
    span.end()
```

Adding database access tracing

```python
def views():
    span = tracer.start_span("s1-span")
    span.set_attribute("name", "wilson")
    span.set_attribute("addr", "cd")
    # Build a context that carries s1-span so the child can attach to it
    ctx = trace.set_span_in_context(span)
    get_db(ctx)
    span.end()


def get_db(parent_ctx):
    # Child span; the parent is taken from parent_ctx
    span = tracer.start_span("s1-span-db", context=parent_ctx)
    span.end()
```

Adding cross-service tracing

Add a second web service, s2.py:

```python
import tornado.httpserver as httpserver
import tornado.web
from tornado.ioloop import IOLoop
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

trace.set_tracer_provider(TracerProvider(resource=Resource.create({SERVICE_NAME: "s2"})))
tracer = trace.get_tracer(__name__)
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
trace.get_tracer_provider().add_span_processor(span_processor)


class TestFlow(tornado.web.RequestHandler):
    def get(self):
        # Rebuild the caller's trace context from the incoming request headers
        ctx = TraceContextTextMapPropagator().extract(self.request.headers)
        span = tracer.start_span("s2-span", context=ctx)
        span.end()
        self.finish("hello world")


def applications():
    urls = []
    urls.append((r"/", TestFlow))
    return tornado.web.Application(urls)


def main():
    app = applications()
    server = httpserver.HTTPServer(app)
    server.bind(20000, "0.0.0.0")
    server.start(1)
    IOLoop.current().start()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt as e:
        IOLoop.current().stop()
    finally:
        IOLoop.current().close()
```

Modify s1.py (this needs the requests package: pip3 install requests):

```python
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
import requests


def views():
    span = tracer.start_span("s1-span")
    span.set_attribute("name", "wilson")
    span.set_attribute("addr", "cd")
    ctx = trace.set_span_in_context(span)
    get_db(ctx)
    # Write the current trace context into the outgoing HTTP headers
    headers = {}
    TraceContextTextMapPropagator().inject(headers, context=ctx)
    requests.get("http://localhost:20000", headers=headers)
    span.end()
```

Moving into k8s

jaeger

Manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: jaeger
  name: jaeger
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
      - image: docker.m.daocloud.io/jaegertracing/all-in-one:latest
        imagePullPolicy: Always
        name: jaeger
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: jaeger-service
  name: jaeger-service
  namespace: default
spec:
  ports:
  - name: port-4317
    port: 4317
    protocol: TCP
    targetPort: 4317
  - name: port-4318
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: port-16686
    port: 16686
    protocol: TCP
    targetPort: 16686
  selector:
    app: jaeger
  type: NodePort
```

s2

1. Build the image

Inside the k8s cluster, Jaeger is reached through its Service, so s2.py needs a small change.

s2.py:

```python
...
import os

# Jaeger OTLP endpoint, injected through the environment by the Deployment
JAEGER_ADDR = os.environ.get("JAEGER_ADDR")
...
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=JAEGER_ADDR))
...
```

Dockerfile:

```dockerfile
FROM python:3.8
WORKDIR /opt
RUN pip3 install tornado opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp -i https://pypi.tuna.tsinghua.edu.cn/simple
ADD s2.py /opt
CMD python3 s2.py
```

2. Manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: s2
  name: s2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s2
  template:
    metadata:
      labels:
        app: s2
    spec:
      containers:
      - env:
        - name: JAEGER_ADDR
          value: http://jaeger-service:4318/v1/traces
        image: s2:v1
        imagePullPolicy: Always
        name: s2
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: s2-service
  name: s2-service
  namespace: default
spec:
  ports:
  - name: s2-port
    port: 20000
    protocol: TCP
    targetPort: 20000
  selector:
    app: s2
  type: NodePort
```

s1

1. Build the image

Inside the k8s cluster, both s2 and Jaeger are reached through their Services, so s1.py needs a small change.

s1.py:

```python
...
import os

# Addresses of s2 and Jaeger, injected through the environment by the Deployment
S2_ADDR = os.environ.get("S2_ADDR")
JAEGER_ADDR = os.environ.get("JAEGER_ADDR")
...
span_processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=JAEGER_ADDR))
...
def views():
    span = tracer.start_span("s1-span")
    span.set_attribute("name", "wilson")
    span.set_attribute("addr", "cd")
    ctx = trace.set_span_in_context(span)
    get_db(ctx)
    headers = {}
    TraceContextTextMapPropagator().inject(headers, context=ctx)
    requests.get(S2_ADDR, headers=headers)
    span.end()
...
```

Dockerfile:

```dockerfile
FROM python:3.8
WORKDIR /opt
# requests is added here because s1.py imports it to call s2
RUN pip3 install tornado requests opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp -i https://pypi.tuna.tsinghua.edu.cn/simple
ADD s1.py /opt
CMD python3 s1.py
```

2. Manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: s1
  name: s1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s1
  template:
    metadata:
      labels:
        app: s1
    spec:
      containers:
      - env:
        - name: S2_ADDR
          value: http://s2-service:20000
        - name: JAEGER_ADDR
          value: http://jaeger-service:4318/v1/traces
        image: s1:v1
        imagePullPolicy: Always
        name: s1
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: s1-service
  name: s1-service
  namespace: default
spec:
  ports:
  - name: s1-port
    port: 10000
    protocol: TCP
    targetPort: 10000
  selector:
    app: s1
  type: NodePort
```

Checking the result

```
▶ kubectl get pod -owide
NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
jaeger-6669cd7c4-4pl5j   1/1     Running   0          7m31s   10.244.0.236   minikube   <none>           <none>
s1-5c569c5b4b-lctzq      1/1     Running   0          73s     10.244.0.237   minikube   <none>           <none>
s2-5bb648dcdf-mlnbj      1/1     Running   0          61s     10.244.0.238   minikube   <none>           <none>

▶ kubectl get svc
NAME             TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
jaeger-service   NodePort   10.106.13.217    <none>        4317:31891/TCP,4318:31997/TCP,16686:31002/TCP   5m49s
s1-service       NodePort   10.102.25.195    <none>        10000:32376/TCP                                 4m23s
s2-service       NodePort   10.103.114.198   <none>        20000:30032/TCP                                 3m40s
```

Run a test by calling the s1 service:

```
▶ curl http://192.168.49.2:32376
hello world%
```

Then check the trace in the Jaeger UI at http://192.168.49.2:31002/.

Summary

In this first example we collected the trace records of a business service, that is, the full path a single request travels, including database reads, cross-service calls, and so on. Two identifiers do the essential work throughout: trace_id uniquely identifies the request and ties all of its steps together, while span_id identifies each individual operation along the way.

Collect: instrument the code to capture the flows that matter most, such as database read/write latency or downstream service latency.
Process: opentelemetry-sdk processes the data (filtering, buffering, batching).
Export: the processed data is sent over a fixed protocol (OTLP, carried over gRPC or HTTP) to a backend system such as Jaeger.
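To make the trace_id / span_id relationship concrete, here is a minimal standard-library sketch of the W3C `traceparent` header, which is what `TraceContextTextMapPropagator` writes into the HTTP headers between s1 and s2. The helper names `new_traceparent`, `parse_traceparent`, and `child_traceparent` are hypothetical and exist only for illustration; real services should keep using the OpenTelemetry propagator rather than hand-rolling this.

```python
import re
import secrets

# W3C Trace Context layout: version-trace_id-span_id-flags, all lowercase hex
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent():
    """Start a new trace: fresh 16-byte trace_id and 8-byte span_id (hypothetical helper)."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def parse_traceparent(header):
    """Split a traceparent header into its four fields, rejecting malformed values."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # All-zero ids are defined as invalid by the specification
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError("trace_id and span_id must not be all zeros")
    return {"version": version, "trace_id": trace_id, "span_id": span_id, "flags": flags}


def child_traceparent(parent_header):
    """Continue the same trace in a downstream service: keep trace_id, mint a new span_id."""
    parent = parse_traceparent(parent_header)
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"


if __name__ == "__main__":
    upstream = new_traceparent()              # what s1 would send
    downstream = child_traceparent(upstream)  # what s2 would record for its own span
    print(upstream)
    print(downstream)
```

The trace_id is identical in both headers, which is how Jaeger groups s1-span, s1-span-db, and s2-span into one trace; only the span_id changes from one operation to the next.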