Linux - memory not released, used does not match actual usage

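On one server, free -g reported more than 200 GB used, yet according to ps the processes together took up a few tens of GB at most. What was going on?

Check the huge page settings on the host: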

sysctl -a | grep nr_hugepages

vm.nr_hugepages = 300
vm.nr_hugepages_mempolicy = 300

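It turned out that huge pages had been configured. Memory reserved for the huge page pool is counted as used by free even when no process has it mapped, which is why ps could not account for it.

So I set both values to 0: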

#vi /etc/sysctl.conf

vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0

After the setting took effect (for example via sysctl -p or a reboot), memory usage was back to normal:

              total        used        free      shared  buff/cache   available
Mem:            509          37         467           0           4         431
Swap:             3           0           3
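
To see how much memory the huge page pool accounts for, the page count can be multiplied by the page size reported in /proc/meminfo. The following is a minimal sketch, assuming the usual HugePages_Total and Hugepagesize fields. The function name and the sample values are illustrative and are not taken from the server above.

```python
import re


def hugepage_reserved_gib(meminfo_text):
    """Return the memory reserved for the huge page pool, in GiB.

    Assumes the text follows the /proc/meminfo layout, with a
    HugePages_Total line (page count) and a Hugepagesize line (in kB).
    """
    total = re.search(r"^HugePages_Total:\s+(\d+)", meminfo_text, re.M)
    size_kb = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.M)
    if total is None or size_kb is None:
        raise ValueError("huge page fields not found")
    return int(total.group(1)) * int(size_kb.group(1)) / (1024 * 1024)


# Illustrative sample, not the output of the server discussed above.
SAMPLE = """\
MemTotal:       527951104 kB
HugePages_Total:     300
HugePages_Free:      300
Hugepagesize:    1048576 kB
"""

if __name__ == "__main__":
    print(hugepage_reserved_gib(SAMPLE))
```

On a live host the same function can be fed the real file, for example hugepage_reserved_gib(open("/proc/meminfo").read()).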

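OpenStack - services report errors when connecting to MySQL

The nova, neutron and other components of an OpenStack cluster frequently report MySQL "lost connection" errors, which is very annoying. The full error is: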

(Background on this error at: http://sqlalche.me/e/e3q8)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines connection.scalar(select([1]))
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 920, in scalar
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines return self.execute(object_, *multiparams, **params).scalar()
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines return meth(self, multiparams, params)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines return connection._execute_clauseelement(self, multiparams, params)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines distilled_params,
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines e, statement, parameters, cursor, context
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1464, in _handle_dbapi_exception
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines util.raise_from_cause(newraise, exc_info)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines reraise(type(exception), exception, tb=exc_tb, cause=cause)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines cursor, statement, parameters, context
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines cursor.execute(statement, parameters)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines result = self._query(query)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines conn.query(q)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/connections.py", line 517, in query
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/connections.py", line 732, in _read_query_result
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines result.read()
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/connections.py", line 1075, in read
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines first_packet = self.connection._read_packet()
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/connections.py", line 657, in _read_packet
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines packet_header = self._read_bytes(4)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines File "/var/lib/kolla/venv/lib/python2.7/site-packages/pymysql/connections.py", line 707, in _read_bytes
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines [SQL: SELECT 1]
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines (Background on this error at: http://sqlalche.me/e/e3q8)
2021-09-08 18:28:05.866 132 ERROR oslo_db.sqlalchemy.engines
2021-09-08 18:28:06.035 132 ERROR oslo_db.sqlalchemy.engines [req-a50d7dba-7060-4983-932f-9a6fd1105b3a - - - - -] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

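What causes this? The analysis:

The error says the connection to MySQL was lost during a query. Why?

The OpenStack components connect to the database through oslo_db, which wraps a SQLAlchemy-based connection pool. So I suspected that connections in the pool were not being recycled in time and that stale connections were being handed out.

The relevant code: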

#vi /var/lib/kolla/venv/lib/python2.7/site-packages/oslo_db/options.py

cfg.IntOpt(
    'connection_recycle_time',
    default=3600,
    deprecated_opts=[
        cfg.DeprecatedOpt('idle_timeout',
                          group="DATABASE"),
        cfg.DeprecatedOpt('idle_timeout',
                          group="database"),
        cfg.DeprecatedOpt('sql_idle_timeout',
                          group='DEFAULT'),
        cfg.DeprecatedOpt('sql_idle_timeout',
                          group='DATABASE'),
        cfg.DeprecatedOpt('idle_timeout',
                          group='sql')
    ],
    help='Connections which have been present in the connection '
         'pool longer than this number of seconds will be replaced '
         'with a new one the next time they are checked out from '
         'the pool.'),

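The default recycle time is 3600 s, but wait_timeout on my MySQL server was 1800 s, so the server closed idle connections before the pool replaced them. Making MySQL's wait_timeout larger than the recycle time is enough to fix it (lowering connection_recycle_time below wait_timeout should work as well).

Done.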
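
The mismatch between the two timeouts can be illustrated with a small model. This is only a sketch under a simplifying assumption: the connection has been sitting idle in the pool since it was created, so its age equals its idle time. The function name is hypothetical and is not part of oslo_db or SQLAlchemy.

```python
def stale_on_checkout(idle_seconds, recycle_time, wait_timeout):
    """Return True if the pool would hand out a connection that the
    server has already closed.

    Simplified model: the connection has been idle in the pool since it
    was created, so its age equals its idle time.
    """
    closed_by_server = idle_seconds >= wait_timeout
    recycled_by_pool = idle_seconds > recycle_time
    return closed_by_server and not recycled_by_pool


if __name__ == "__main__":
    # Values from the text: recycle time 3600 s, wait_timeout 1800 s.
    print(stale_on_checkout(2000, recycle_time=3600, wait_timeout=1800))
    # After raising wait_timeout above the recycle time.
    print(stale_on_checkout(2000, recycle_time=3600, wait_timeout=7200))
```

In this model any idle time between wait_timeout and the recycle time yields a dead connection. Once wait_timeout exceeds the recycle time, that window is empty.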

Elasticsearch - Error: all shards failed

This was my first try of the Go SDK for Elasticsearch, "github.com/olivere/elastic". It is great that a Go client exists, but the first run failed with an error:

panic: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

Here is the code:

package main

import (
    "context"
    "fmt"

    "elastic-analysis-go/es"

    "github.com/olivere/elastic"
)

var (
    IndexName = "openstack_error_logs"
)

func main() {
    ctx := context.Background()
    // es is a local helper package; CreateClient presumably initialises
    // the package-level es.EsClient used below.
    _ = es.CreateClient("http://10.224.144.15:9011", "USERNAME", "PASSWORD")

    // value_count aggregation on the programname field
    aggs := elastic.NewValueCountAggregation().Field("programname")

    searchResult, err := es.EsClient.Search().
        Index(IndexName).
        Query(elastic.NewMatchAllQuery()). // query condition
        Aggregation("total", aggs).        // register the aggregation under the name "total"
        Size(0).
        Do(ctx)
    if err != nil {
        panic(err)
    }
    // Look up the result by the aggregation name defined above.
    agg, found := searchResult.Aggregations.ValueCount("total")
    if found && agg.Value != nil {
        // agg.Value is a pointer, so dereference it before printing.
        fmt.Println(*agg.Value)
    }
}


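Root cause:

An aggregation, like any exact-match operation that goes through Field, needs the field to be of type keyword in Elasticsearch rather than text. For example, to match "programname": "neutron" exactly, programname must be mapped as keyword. Aggregating on a text field fails because fielddata is disabled on text fields by default, and the search comes back as "all shards failed".
Changing the aggregation to elastic.NewValueCountAggregation().Field("programname.keyword") fixes it, provided the mapping has a keyword sub-field (the default dynamic mapping creates one for strings).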
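
For reference, the search request that the corrected aggregation corresponds to can be written out as a plain request body. This is a sketch of the equivalent Elasticsearch query DSL, not the exact bytes the Go client sends. The helper name is hypothetical, and it assumes the index mapping has a programname.keyword sub-field.

```python
import json


def value_count_body(field, agg_name="total"):
    """Build a search body that counts the values of `field`.

    size=0 suppresses the hits so that only the aggregation is returned.
    """
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {agg_name: {"value_count": {"field": field}}},
    }


if __name__ == "__main__":
    # Assumes the keyword sub-field exists in the mapping.
    print(json.dumps(value_count_body("programname.keyword"), indent=2))
```

Sending the same body with "programname" instead of "programname.keyword" is what triggers the 400 error when the field is mapped as text.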

SPDK - booting a VM with QEMU over SPDK vhost

QEMU boots the VM with the SPDK-based vhost-user-scsi driver; the bdev backend is Ceph RBD.

Create the vhost device (the RBD bdev created in the previous post is named Ceph0)

virtio-blk
#scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Ceph1
#scripts/rpc.py bdev_virtio_attach_controller --dev-type blk --trtype user --traddr /opt/tmp/vhost.1 --vq-count 2 --vq-size 512 VirtioBlk1
virtio-scsi
#scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
#scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Ceph0

Start the VM
qemu-system-x86_64 \
--enable-kvm \
-cpu host -smp 2 \
-m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
-drive file=/opt/a0c4f799-aecf-4ebd-af51-03b2f2f14c5a.qcow2,if=none,id=disk \
-device ide-hd,drive=disk,bootindex=0 \
-chardev socket,id=spdk_vhost_scsi0,path=/opt/tmp/vhost.0 \
-device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0 \
-vnc 10.224.130.52:3
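
As far as I understand it, vhost-user needs the guest RAM to be shared with the SPDK vhost process. That is why the command above backs all guest memory with a hugepage file using share=on, and why the backend size matches -m. The sketch below assembles just those arguments so the two sizes cannot drift apart. The function name and defaults are hypothetical, and it only builds an argument list without launching QEMU.

```python
def vhost_scsi_qemu_args(mem, socket_path, chardev_id="spdk_vhost_scsi0"):
    """Return QEMU arguments for a vhost-user-scsi controller.

    The same `mem` value is used for -m and for the memory backend, so
    the whole guest RAM lives in the shared, hugepage-backed file.
    """
    backend = ("memory-backend-file,id=mem0,size={0},"
               "mem-path=/dev/hugepages,share=on").format(mem)
    return [
        "-m", mem,
        "-object", backend,
        "-numa", "node,memdev=mem0",
        "-chardev", "socket,id={0},path={1}".format(chardev_id, socket_path),
        "-device",
        "vhost-user-scsi-pci,id=scsi0,chardev={0}".format(chardev_id),
    ]


if __name__ == "__main__":
    # Values taken from the command above.
    print(" ".join(vhost_scsi_qemu_args("1G", "/opt/tmp/vhost.0")))
```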

Troubleshooting

The QEMU version used is 4.2.0.

SPDK - SPDK bdev (Ceph RBD) block device operations

Because the VMs use RBD as their backend storage, the SPDK bdevs below are based on Ceph RBD.

Import the Ceph cluster configuration

#vim /etc/ceph/ceph.conf

Create a volume

#rbd create volumes/volume-4431ef41-381e-490d-9976-adba14d2c05b --size 102400
#scripts/rpc.py bdev_rbd_create volumes volume-4431ef41-381e-490d-9976-adba14d2c05b 512

Delete the volume

#rpc.py bdev_rbd_delete Ceph0

Resize the volume

#rpc.py bdev_rbd_resize Ceph0 102400
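
rpc.py is a thin client that sends JSON-RPC requests to the SPDK application. The sketch below shows roughly what the create and delete calls above look like on the wire. The method names come from the commands above. The parameter names (pool_name, rbd_name, block_size, name) are my assumption based on the SPDK JSON-RPC documentation and should be checked against the SPDK version in use. The helper only builds the request text and does not open a socket.

```python
import json


def rpc_request(method, params, request_id=1):
    """Return a JSON-RPC 2.0 request as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })


if __name__ == "__main__":
    # Parameter names are assumptions; verify them for your SPDK version.
    print(rpc_request("bdev_rbd_create", {
        "pool_name": "volumes",
        "rbd_name": "volume-4431ef41-381e-490d-9976-adba14d2c05b",
        "block_size": 512,
    }))
    print(rpc_request("bdev_rbd_delete", {"name": "Ceph0"}))
```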

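SPDK - SPDK installation guide

Install an SPDK environment on CentOS 7 x86_64 with RBD and RDMA support.

Environment: gcc-4.8.5

  1. Download the source (if the network is unstable and git fails partway, rerun the command until it succeeds)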
#git clone https://github.com/spdk/spdk
#cd spdk
#git checkout v21.04
#git submodule update --init

  2. Install the required dependencies
sudo scripts/pkgdep.sh --all
  3. Build
./configure --with-rbd --with-rdma --with-ocf
make

Start the vhost target

#HUGEMEM=4096 scripts/setup.sh
#build/bin/vhost -S /var/tmp -m 0x3
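
The two numbers in these commands are easy to misread: HUGEMEM is a size in MB, and -m is a hexadecimal core mask in which bit N selects core N. The helpers below are a small sketch for decoding both. The function names are hypothetical, and the 2 MB page size is an assumption (the x86_64 default).

```python
def cores_in_mask(mask):
    """Return the CPU core indices selected by a hexadecimal core mask."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if value >> bit & 1]


def hugepages_for(hugemem_mb, page_size_mb=2):
    """Number of huge pages needed to cover `hugemem_mb` megabytes."""
    return -(-hugemem_mb // page_size_mb)  # ceiling division


if __name__ == "__main__":
    print(cores_in_mask("0x3"))   # mask passed to the vhost target
    print(cores_in_mask("0x1"))   # mask used for the vhost controllers
    print(hugepages_for(4096))    # HUGEMEM=4096 with 2 MB pages
```

With these values, the vhost target runs on cores 0 and 1, and the 4096 MB of huge page memory corresponds to 2048 pages of 2 MB.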

Troubleshooting

  1. librbd is installed, but SPDK does not detect it
./configure --with-rbd --with-rdma
Notice: ISA-L, compression & crypto require NASM version 2.14 or newer. Turning off default ISA-L and crypto features.
Using default SPDK env in /root/spdk/lib/env_dpdk
Using default DPDK in /root/spdk/dpdk/build
Using 'verbs' RDMA provider
--with-rbd requires librados and librbd.
Please install then re-run this script.

Fix: install the librbd1-devel package, then rerun ./configure.

Logstash - deploying with Docker

Pull the Docker image

#docker pull docker.elastic.co/logstash/logstash:7.6.2

Create a local config directory

#mkdir -p /usr/local/logstash/config

Configure logstash.yml

#vim /usr/local/logstash/config/logstash.yml

config:
  reload:
    automatic: true
    interval: 3s
xpack:
  management.enabled: false
  monitoring.enabled: false

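OpenStack hardware accelerator management - a survey of Cyborg

Introduction

With the growth of 5G, the Internet, the Internet of Things and related technologies, there is increasing demand for accelerating and offloading compute workloads. The Cyborg project was created in response.

Cyborg (formerly named Nomad) is OpenStack's framework for managing hardware and software acceleration resources, including GPUs, FPGAs, encryption/decryption cards, NVMe/NOF SSDs, ODP, DPDK/SPDK and so on. Cyborg is Acceleration as a Service for OpenStack. The project was officially released in the OpenStack Q release (Queens) and is led by Huawei, Lenovo and Red Hat. It is a very young project and its functionality is not yet complete.

By managing and using the accelerator hardware on compute nodes, Cyborg can provide telecom operators with acceleration services for NFV and edge computing scenarios, improving user experience and lowering CPU load. Operators can use Cyborg to list, identify and discover accelerators, attach and detach accelerator instances, and install and uninstall drivers. It can be used on its own or together with Nova or Ironic.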

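Architecture

Cyborg uses a classic architecture made up of the modules cyborg-api, cyborg-conductor, cyborg-agent and cyborg-db. cyborg-agent runs on the compute nodes and monitors the accelerators; cyborg-conductor runs on the controller node, manages the whole system and handles database operations. cyborg-api and cyborg-db are the API and the database respectively, and both run on the controller node.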

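Workflow

Judging from Cyborg's architecture diagram and API, the main workflow is as follows:

  1. cyborg-api receives the user's API call to create an accelerator and saves it to cyborg-db through cyborg-conductor.

  2. cyborg-conductor calls cyborg-agent on the compute node over RPC; cyborg-agent calls the corresponding vendor driver, and the vendor driver configures the underlying acceleration resources on the compute node.

  3. cyborg-agent on the compute node periodically queries the usage of the acceleration resources and reports it to the placement service through its API; Nova eventually uses this information when scheduling.

Note: acceleration devices may form a hierarchy of related devices, for example the relationship between an SR-IOV PF and its VFs. In the O release (Ocata), Nova extended the resource-providers database to support such hierarchical relationships between devices.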

http://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/nested-resource-providers.html

  4. Nova performs scheduling and attaches the virtual acceleration device to the VM for the user.