openstack-nova: Source Code Analysis of the VM Live Migration Process

Live migration consists of three main phases:

  • pre_live_migration phase: preparation before the migration. Resources for the instance are set up on the destination compute node in advance, including network resources — for example, the instance's vNIC is created and plugged into the OvS br-int bridge. If this phase fails, a rollback is performed.
  • Memory migration phase: this phase transfers the instance's memory data; if the instance's root disk lives on the compute node's local storage, that disk data is migrated here as well. If this phase fails, a rollback is performed, following the same flow as a pre_live_migration failure.
  • post_live_migration phase: resource cleanup after the migration completes. On the source compute node this mainly means disconnecting the instance's volume connections and removing its vNIC resources; on the destination node it mainly means calling neutronclient to update the port's Host attribute to the destination compute node. (NOTE: this phase needs no rollback — the instance has already migrated successfully, so rolling back would be pointless.)
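The phase structure and its rollback rules can be sketched as follows (a minimal illustration only — all function names here are hypothetical, not Nova's):

```python
class MigrationError(Exception):
    pass

def live_migrate(pre, copy_memory, post, rollback):
    """Drive the three phases; roll back if phase 1 or 2 fails.

    Phase 3 (post-migration cleanup) never rolls back: by then the
    instance is already running on the destination.
    """
    try:
        pre()          # prepare resources on the destination
        copy_memory()  # transfer memory (and local disks, if any)
    except MigrationError:
        rollback()     # undo the destination-side preparation
        raise
    post()             # cleanup on source/destination; no rollback here

# Simulate a failure during the memory-copy phase:
calls = []
def fail_copy():
    calls.append('copy')
    raise MigrationError('memory copy failed')

try:
    live_migrate(lambda: calls.append('pre'), fail_copy,
                 lambda: calls.append('post'),
                 lambda: calls.append('rollback'))
except MigrationError:
    pass
```

Here the rollback runs because the failure happened before the post phase; a failure inside `post()` would propagate without any rollback, matching the NOTE above.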

The live migration flow is as follows:

```shell
# nova live-migration XXXXXXXXXXXXXXXXXXX
```

When a live migration is requested, the call first reaches nova-api. The entry point in Nova is the _migrate_live function in migrate_server.py under nova/api/openstack/compute:

```python
@wsgi.action('os-migrateLive')
@validation.schema(migrate_server.migrate_live, "2.0", "2.24")
@validation.schema(migrate_server.migrate_live_v2_25, "2.25", "2.29")
@validation.schema(migrate_server.migrate_live_v2_30, "2.30", "2.67")
@validation.schema(migrate_server.migrate_live_v2_68, "2.68")
def _migrate_live(self, req, id, body):
    try:
        self.compute_api.live_migrate(context, instance, block_migration,
                                      disk_over_commit, host, force,
                                      async_)
    ...
```
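The stacked @validation.schema decorators bind a different request body schema to each API microversion range. Conceptually the selection works like this (a simplified sketch of the idea, not Nova's actual validation machinery):

```python
def _ver(s):
    # "2.25" -> (2, 25) so ranges compare numerically, not lexically
    return tuple(int(x) for x in s.split('.'))

# (min, max, schema) ranges mirroring the decorators above;
# None means "no upper bound".
SCHEMAS = [
    ("2.0",  "2.24", "migrate_live"),
    ("2.25", "2.29", "migrate_live_v2_25"),
    ("2.30", "2.67", "migrate_live_v2_30"),
    ("2.68", None,   "migrate_live_v2_68"),
]

def schema_for(version):
    """Pick the request schema matching the caller's microversion."""
    for lo, hi, schema in SCHEMAS:
        if _ver(lo) <= _ver(version) and (hi is None or
                                          _ver(version) <= _ver(hi)):
            return schema
    raise ValueError(version)
```

Tuple comparison matters here: the string "2.9" sorts after "2.24" lexically, but (2, 9) < (2, 24) numerically, which is why `_ver` is needed.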

This calls into live_migrate in nova/compute/api.py:

```python
@check_instance_lock
@check_instance_cell
@check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.PAUSED])
def live_migrate(self, context, instance, block_migration,
                 disk_over_commit, host_name, force=None, async_=False):
    # update the instance state and record the action in the database
    # build the request_spec
    # delete the console auth tokens
    # validate the requested destination host
    try:
        self.compute_task_api.live_migrate_instance(context, instance,
            host_name, block_migration=block_migration,
            disk_over_commit=disk_over_commit,
            request_spec=request_spec, async_=async_)
    ...
```
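The @check_instance_state decorator is what rejects the request unless the instance is ACTIVE or PAUSED. A stripped-down illustration of how such a decorator works (hypothetical names, not Nova's actual implementation):

```python
import functools

class InstanceInvalidState(Exception):
    pass

def check_instance_state(vm_state):
    """Allow the decorated call only when the instance's vm_state
    is in the permitted set; otherwise raise."""
    def decorator(f):
        @functools.wraps(f)
        def inner(self, context, instance, *args, **kwargs):
            if instance['vm_state'] not in vm_state:
                raise InstanceInvalidState(instance['vm_state'])
            return f(self, context, instance, *args, **kwargs)
        return inner
    return decorator

class API:
    @check_instance_state(vm_state=['active', 'paused'])
    def live_migrate(self, context, instance, host_name):
        return 'migrating to %s' % host_name

result = API().live_migrate(None, {'vm_state': 'active'}, 'dest-node')
```

A stopped or errored instance never reaches the method body — the decorator raises before any migration work starts.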

live_migrate_instance then calls into the function of the same name in nova/conductor/api.py:

```python
def live_migrate_instance(self, context, instance, host_name,
                          block_migration, disk_over_commit,
                          request_spec=None, async_=False):
    scheduler_hint = {'host': host_name}
    if async_:
        self.conductor_compute_rpcapi.live_migrate_instance(
            context, instance, scheduler_hint, block_migration,
            disk_over_commit, request_spec)
    ...
```

which in turn calls live_migrate_instance in nova/conductor/rpcapi.py, in the same package:

```python
kw = {'instance': instance, 'scheduler_hint': scheduler_hint,
      'block_migration': block_migration,
      'disk_over_commit': disk_over_commit,
      'request_spec': request_spec,
      }
version = '1.15'
cctxt = self.client.prepare(version=version)
# asynchronous RPC cast, handing the request off to nova-conductor
cctxt.cast(context, 'live_migrate_instance', **kw)
```
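Note that cast is one-way and asynchronous: the caller returns immediately and nova-conductor processes the request in the background, whereas call (used later for pre_live_migration) blocks until a result comes back. A toy model of the two semantics (standard library only, not oslo.messaging):

```python
import queue
import threading

class ToyRPC:
    """One server thread consuming requests from an in-memory queue."""

    def __init__(self, handlers):
        self._q = queue.Queue()
        self._handlers = handlers
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            method, kw, reply = self._q.get()
            result = self._handlers[method](**kw)
            if reply is not None:       # only 'call' wants an answer
                reply.put(result)

    def cast(self, method, **kw):
        """Fire-and-forget: enqueue the request and return at once."""
        self._q.put((method, kw, None))

    def call(self, method, **kw):
        """Blocking: enqueue, then wait for the handler's result."""
        reply = queue.Queue()
        self._q.put((method, kw, reply))
        return reply.get(timeout=5)

rpc = ToyRPC({'live_migrate_instance':
              lambda host: 'scheduled on %s' % host})
rpc.cast('live_migrate_instance', host='node-2')        # returns None now
out = rpc.call('live_migrate_instance', host='node-2')  # waits for result
```

This is why the API can answer the user before the migration has even been scheduled: the cast only guarantees delivery to the queue, not completion.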

live_migrate_instance in nova/conductor/manager.py receives the request:

```python
@wrap_instance_event(prefix='conductor')
def live_migrate_instance(self, context, instance, scheduler_hint,
                          block_migration, disk_over_commit, request_spec):
    self._live_migrate(context, instance, scheduler_hint,
                       block_migration, disk_over_commit, request_spec)
```
_live_migrate records the operation and launches the migration task:

```python
# initialize a Migration object to track this live migration
migration = objects.Migration(context=context.elevated())
migration.dest_compute = destination
migration.status = 'accepted'
migration.instance_uuid = instance.uuid
migration.source_compute = instance.host
migration.migration_type = 'live-migration'
if instance.obj_attr_is_set('flavor'):
    migration.old_instance_type_id = instance.flavor.id
    migration.new_instance_type_id = instance.flavor.id
else:
    migration.old_instance_type_id = instance.instance_type_id
    migration.new_instance_type_id = instance.instance_type_id
migration.create()

# build the live migration task and execute it
task = self._build_live_migrate_task(context, instance, destination,
                                     block_migration, disk_over_commit,
                                     migration, request_spec)
try:
    task.execute()
...
```
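The Migration record's status field starts at 'accepted' and is updated as the operation progresses (the snippets later in this article show it being set to 'error' on failure). A toy state tracker conveys the idea — the transition table here is illustrative, not Nova's exact status set:

```python
# Illustrative status flow for the Migration record; Nova's real set of
# statuses is larger — only 'accepted' and 'error' appear verbatim in
# the code excerpts in this article.
ALLOWED = {
    'accepted':  {'preparing', 'error'},
    'preparing': {'running', 'error'},
    'running':   {'completed', 'error'},
}

class Migration:
    def __init__(self):
        self.status = 'accepted'

    def set_status(self, new):
        if new not in ALLOWED.get(self.status, set()):
            raise ValueError('%s -> %s' % (self.status, new))
        self.status = new

m = Migration()
m.set_status('preparing')
m.set_status('running')
m.set_status('completed')
```

Tracking the status in the database is what lets operators query in-flight migrations and what _set_migration_status updates when the migration fails.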
```python
def _build_live_migrate_task(self, context, instance, destination,
                             block_migration, disk_over_commit, migration,
                             request_spec=None):
    return live_migrate.LiveMigrationTask(context, instance,
                                          destination, block_migration,
                                          disk_over_commit, migration,
                                          self.compute_rpcapi,
                                          self.servicegroup_api,
                                          self.query_client,
                                          self.report_client,
                                          request_spec)
```

Execution then moves to _execute in nova/conductor/tasks/live_migrate.py:

```python
def _execute(self):
    # check that the instance is in the ACTIVE state
    self._check_instance_is_active()
    # check that the instance has no NUMA topology
    self._check_instance_has_no_numa()
    # check that the source host's compute service is up
    self._check_host_is_up(self.source)
    # validate the source and destination nodes
    source_node, dest_node = self._check_requested_destination()

    return self.compute_rpcapi.live_migration(self.context,
                                              host=self.source,
                                              instance=self.instance,
                                              dest=self.destination,
                                              block_migration=self.block_migration,
                                              migration=self.migration,
                                              migrate_data=self.migrate_data)
```
_check_requested_destination runs the pre-migration checks:

```python
def _check_requested_destination(self):
    """Performs basic pre-live migration checks for the forced host.

    :returns: tuple of (source ComputeNode, destination ComputeNode)
    """
    # make sure source and destination are not the same physical host
    self._check_destination_is_not_source()
    # check that the destination host is up
    self._check_host_is_up(self.destination)
    # check that the destination host has enough free memory
    self._check_destination_has_enough_memory()
    # check hypervisor compatibility between source and destination
    source_node, dest_node = self._check_compatible_with_source_hypervisor(
        self.destination)
    # check whether a live migration to the destination host is feasible
    self._call_livem_checks_on_host(self.destination)
    # Make sure the forced destination host is in the same cell that the
    # instance currently lives in.
    # NOTE(mriedem): This can go away if/when the forced destination host
    # case calls select_destinations.
    source_cell_mapping = self._get_source_cell_mapping()
    dest_cell_mapping = self._get_destination_cell_mapping()
    if source_cell_mapping.uuid != dest_cell_mapping.uuid:
        raise exception.MigrationPreCheckError(
            reason=(_('Unable to force live migrate instance %s '
                      'across cells.') % self.instance.uuid))
    return source_node, dest_node
```
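_check_destination_has_enough_memory boils down to comparing the destination's free RAM with the instance's memory size. In spirit (a simplified sketch — Nova reads these numbers from the ComputeNode inventory):

```python
class MigrationPreCheckError(Exception):
    pass

def check_destination_has_enough_memory(instance_ram_mb, dest_free_ram_mb):
    """Refuse the migration when the destination cannot hold the guest."""
    if dest_free_ram_mb <= instance_ram_mb:
        raise MigrationPreCheckError(
            'Unable to migrate: lack of memory on destination '
            '(free: %d MB, instance needs: %d MB)'
            % (dest_free_ram_mb, instance_ram_mb))

# A 4 GB guest onto a host with 16 GB free passes the check:
check_destination_has_enough_memory(4096, 16384)
```

Failing any of these checks raises MigrationPreCheckError in the conductor, so the migration is refused before any resources are touched on either host.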

Inside _call_livem_checks_on_host, an RPC call goes to the destination host to run check_can_live_migrate_destination in nova/compute/manager.py, verifying that the destination can accept the live migration; the destination in turn makes a remote call to check_can_live_migrate_source to verify that the source supports it as well.

```python
def check_can_live_migrate_destination(self, ctxt, instance,
                                       block_migration, disk_over_commit,
                                       migration=None, limits=None):
    """Check if it is possible to execute live migration.

    This runs checks on the destination host, and then calls
    back to the source host to check the results.

    :param context: security context
    :param instance: dict of instance data
    :param block_migration: if true, prepare for block migration
                            if None, calculate it in driver
    :param disk_over_commit: if true, allow disk over commit
                             if None, ignore disk usage checking
    :param migration: objects.Migration object for this live migration.
    :param limits: objects.SchedulerLimits object for this live migration.
    :returns: a LiveMigrateData object (hypervisor-dependent)
    """
    src_compute_info = obj_base.obj_to_primitive(
        self._get_compute_info(ctxt, instance.host))
    dst_compute_info = obj_base.obj_to_primitive(
        self._get_compute_info(ctxt, CONF.host))
    dest_check_data = self.driver.check_can_live_migrate_destination(ctxt,
        instance, src_compute_info, dst_compute_info,
        block_migration, disk_over_commit)
    dest_check_data = self._dest_can_numa_live_migrate(dest_check_data,
                                                       migration)
    LOG.debug('destination check data is %s', dest_check_data)
    try:
        migrate_data = self.compute_rpcapi.check_can_live_migrate_source(
            ctxt, instance, dest_check_data)
        LOG.debug('migrate_data:%s' % migrate_data)
        if ('src_supports_numa_live_migration' in migrate_data and
                migrate_data.src_supports_numa_live_migration):
            migrate_data = self._live_migration_claim(
                ctxt, instance, migrate_data, migration, limits)
            LOG.debug('migrate_data:%s' % migrate_data)
        elif 'dst_supports_numa_live_migration' in dest_check_data:
            LOG.info('Destination was ready for NUMA live migration, '
                     'but source is either too old, or is set to an '
                     'older upgrade level.', instance=instance)
        '''
        # Create migrate_data vifs
        migrate_data.vifs = \
            migrate_data_obj.VIFMigrateData.create_skeleton_migrate_vifs(
                instance.get_network_info())
        # Claim PCI devices for VIFs on destination (if needed)
        port_id_to_pci = self._claim_pci_for_instance_vifs(ctxt, instance)
        # Update migrate VIFs with the newly claimed PCI devices
        self._update_migrate_vifs_profile_with_pci(migrate_data.vifs,
                                                   port_id_to_pci)
        '''
    finally:
        self.driver.cleanup_live_migration_destination_check(ctxt,
                                                             dest_check_data)
    return migrate_data
```

Next, execution reaches live_migration in nova/compute/rpcapi.py, which makes a remote call to the nova-compute service's live_migration method, handing processing over to nova-compute:

```python
def live_migration(self, ctxt, instance, dest, block_migration, host,
                   migration, migrate_data=None):
    version = '5.0'
    client = self.router.client(ctxt)
    cctxt = client.prepare(server=host, version=version)
    cctxt.cast(ctxt, 'live_migration', instance=instance,
               dest=dest, block_migration=block_migration,
               migrate_data=migrate_data, migration=migration)
```

live_migration in nova/compute/manager.py picks up the cast and submits the work to an executor:

```python
def live_migration(self, context, dest, instance, block_migration,
                   migration, migrate_data):
    try:
        future = self._live_migration_executor.submit(
            self._do_live_migration, context, dest, instance,
            block_migration, migration, migrate_data)
        self._waiting_live_migrations[instance.uuid] = (migration, future)
```

```python
def _do_live_migration(self, context, dest, instance, block_migration,
                       migration, migrate_data):
    with self.virtapi.wait_for_instance_event(
            instance, events, deadline=deadline,
            error_callback=error_cb):
        with timeutils.StopWatch() as timer:
            migrate_data = self.compute_rpcapi.pre_live_migration(
                context, instance,
                block_migration, disk, dest, migrate_data)
```

The heart of _do_live_migration is the pair of calls to pre_live_migration and live_migration. First, self.compute_rpcapi.pre_live_migration makes a remote call that runs pre_live_migration on the destination host:
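The wait_for_instance_event context manager seen above holds the migration until Neutron reports network-vif-plugged events for the VIFs plugged on the destination, or a deadline expires. The pattern is essentially an event barrier with a timeout; a toy version (hypothetical class, standard library only):

```python
import threading

class EventTimeout(Exception):
    pass

class InstanceEvents:
    """Wait until every expected event has been reported, or time out."""

    def __init__(self, expected):
        self._expected = set(expected)
        self._done = threading.Event()

    def notify(self, name):
        self._expected.discard(name)
        if not self._expected:
            self._done.set()

    def wait(self, deadline):
        if not self._done.wait(timeout=deadline):
            raise EventTimeout('vif plug events not received')

events = InstanceEvents({'network-vif-plugged-port1',
                         'network-vif-plugged-port2'})

# In Nova these notifications arrive from Neutron through the external
# events API; here another thread stands in for that.
t = threading.Thread(target=lambda: (
    events.notify('network-vif-plugged-port1'),
    events.notify('network-vif-plugged-port2')))
t.start()
events.wait(deadline=5)   # returns once both VIFs report plugged
t.join()
```

Waiting for the plug events before starting the memory copy avoids flipping the guest over to a destination whose networking is not yet wired up.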

```python
def pre_live_migration(self, ctxt, instance, block_migration, disk,
                       host, migrate_data):
    version = '5.0'
    client = self.router.client(ctxt)
    cctxt = client.prepare(server=host, version=version,
                           timeout=CONF.long_rpc_timeout,
                           call_monitor_timeout=CONF.rpc_response_timeout)
    return cctxt.call(ctxt, 'pre_live_migration',
                      instance=instance,
                      block_migration=block_migration,
                      disk=disk, migrate_data=migrate_data)
```

This lands back in pre_live_migration in compute/manager.py, now running on the destination host:

```python
def pre_live_migration(self, context, instance, block_migration, disk,
                       migrate_data):
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)

    network_info = self.network_api.get_instance_nw_info(context, instance)
    self._notify_about_instance_usage(
        context, instance, "live_migration.pre.start",
        network_info=network_info)
    compute_utils.notify_about_instance_action(
        context, instance, self.host,
        action=fields.NotificationAction.LIVE_MIGRATION_PRE,
        phase=fields.NotificationPhase.START, bdms=bdms)

    connector = self.driver.get_volume_connector(instance)
    try:
        for bdm in bdms:
            if bdm.is_volume and bdm.attachment_id is not None:
                attach_ref = self.volume_api.attachment_create(
                    context, bdm.volume_id, bdm.instance_uuid,
                    connector=connector, mountpoint=bdm.device_name)
                migrate_data.old_vol_attachment_ids[bdm.volume_id] = \
                    bdm.attachment_id

                # update the bdm with the new attachment_id.
                bdm.attachment_id = attach_ref['id']
                bdm.save()

        block_device_info = self._get_instance_block_device_info(
            context, instance, refresh_conn_info=True,
            bdms=bdms)
        # call into libvirt/driver.py to connect the volumes, plug the
        # network, and so on
        migrate_data = self.driver.pre_live_migration(context,
                                                      instance,
                                                      block_device_info,
                                                      network_info,
                                                      disk,
                                                      migrate_data)
        LOG.debug('driver pre_live_migration data is %s', migrate_data)
        migrate_data.wait_for_vif_plugged = (
            CONF.compute.live_migration_wait_for_vif_plug)

        # set up networking on this (destination) host
        self.network_api.setup_networks_on_host(context, instance,
                                                self.host)

        # create the network filtering rules on the destination host
        # before the migration starts
        self.driver.ensure_filtering_rules_for_instance(instance,
                                                        network_info)
    ...
```
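The BDM loop above creates a new Cinder attachment per volume using the destination host's connector, while remembering the old attachment id so it can be deleted after a successful migration (or the new one rolled back on failure). The bookkeeping reduces to the following sketch, with plain dicts standing in for BDM objects:

```python
def swap_attachments(bdms, create_attachment):
    """Point each volume BDM at a fresh attachment for the destination
    host, returning {volume_id: old_attachment_id} for later cleanup."""
    old_vol_attachment_ids = {}
    for bdm in bdms:
        if bdm['is_volume'] and bdm['attachment_id'] is not None:
            old_vol_attachment_ids[bdm['volume_id']] = bdm['attachment_id']
            # create_attachment stands in for
            # volume_api.attachment_create(...)['id']
            bdm['attachment_id'] = create_attachment(bdm['volume_id'])
    return old_vol_attachment_ids

bdms = [{'is_volume': True, 'volume_id': 'vol-1',
         'attachment_id': 'att-old'},
        {'is_volume': False, 'volume_id': None, 'attachment_id': None}]
old = swap_attachments(bdms, lambda vol: 'att-new')
```

Keeping both ids around is what makes the rollback path possible: on failure the new attachments are deleted and the BDMs are pointed back at the old ones.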

On the destination host, self.driver.pre_live_migration is invoked, which jumps to pre_live_migration in libvirt/driver.py:

```python
def pre_live_migration(self, context, instance, block_device_info,
                       network_info, disk_info, migrate_data):
    """Preparation live migration."""
    ...
    # create the libvirt- and nova-related local directories, and fetch
    # the image if the disk is not volume-backed
    # Establishing connection to volume server.
    block_device_mapping = driver.block_device_info_get_mapping(
        block_device_info)

    if len(block_device_mapping):
        LOG.debug('Connecting volumes before live migration.',
                  instance=instance)
    # plug the VIFs to initialize the network interfaces
    self._pre_live_migration_plug_vifs(
        instance, network_info, migrate_data)
```

Once pre_live_migration has finished on the destination host, the source host calls live_migration to start the actual migration. Looking back at _do_live_migration in compute/manager.py:

```python
def _do_live_migration(self, context, dest, instance, block_migration,
                       migration, migrate_data):
    try:
        self.driver.live_migration(context, instance, dest,
                                   post_live_migration,
                                   rollback_live_migration,
                                   block_migration, migrate_data)
    except Exception:
        LOG.exception('Live migration failed.', instance=instance)
        with excutils.save_and_reraise_exception():
            # Put instance and migration into error state,
            # as its almost certainly too late to rollback
            self._set_migration_status(migration, 'error')
            # first refresh instance as it may have got updated by
            # post_live_migration_at_destination
            instance.refresh()
            self._set_instance_obj_error_state(context, instance,
                                               clean_task_state=True)
```

This calls live_migration in nova/virt/libvirt/driver.py, which in turn calls _live_migration:

```python
def live_migration(self, context, instance, dest,
                   post_method, recover_method, block_migration=False,
                   migrate_data=None):
    # validate the destination hostname, then perform the migration
    self._live_migration(context, instance, dest,
                         post_method, recover_method, block_migration,
                         migrate_data)
```
```python
def _live_migration(self, context, instance, dest, post_method,
                    recover_method, block_migration,
                    migrate_data):
    opthread = utils.spawn(self._live_migration_operation,
                           context, instance, dest,
                           block_migration,
                           migrate_data, guest,
                           device_names)
    try:
        LOG.debug("Starting monitoring of live migration",
                  instance=instance)
        self._live_migration_monitor(context, instance, guest, dest,
                                     post_method, recover_method,
                                     block_migration, migrate_data,
                                     finish_event, disk_paths)
```

There are two core calls here: _live_migration_operation performs the migration itself, while _live_migration_monitor tracks its progress.
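_live_migration_monitor repeatedly samples the job statistics and decides whether to keep waiting, declare success, or give up. Its core loop reduces to something like this (a toy model, with fake job stats standing in for libvirt's job info):

```python
def monitor(job_stats, max_polls):
    """Poll fake job stats until the copy converges or we give up.

    job_stats: iterator of (data_remaining, data_total) samples.
    Returns (status, progress_percent); in Nova the timeout path
    aborts the libvirt job and triggers the rollback callback.
    """
    for polls, (remaining, total) in enumerate(job_stats, start=1):
        progress = 100 * (total - remaining) // total
        if remaining == 0:
            return 'completed', progress
        if polls >= max_polls:
            return 'timed out', progress
    return 'timed out', 0

# Memory remaining shrinks as dirty pages are copied to the destination:
status, progress = monitor(iter([(800, 1000), (300, 1000), (0, 1000)]),
                           max_polls=10)
```

A real monitor also reacts mid-flight — raising downtime limits, pausing the guest, or (where configured) switching to post-copy when progress stalls.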

```python
def _live_migration_operation(self, context, instance, dest,
                              block_migration, migrate_data, guest,
                              device_names):
    guest.migrate(self._live_migration_uri(dest),
                  migrate_uri=migrate_uri,
                  flags=migration_flags,
                  migrate_disks=device_names,
                  destination_xml=new_xml_str,
                  bandwidth=CONF.libvirt.live_migration_bandwidth)
    LOG.debug("Migrate API has completed", instance=instance)
```

This calls migrate in guest.py, which drives libvirt's migration API:

```python
def migrate(self, destination, migrate_uri=None, migrate_disks=None,
            destination_xml=None, flags=0, bandwidth=0):
    self._domain.migrateToURI3(
        destination, params=params, flags=flags)
```
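The flags argument is a bitmask of libvirt migration options; Nova assembles it from its live_migration_* configuration. The flag values below come from libvirt's virDomainMigrateFlags enum, but the particular combination shown is illustrative, not Nova's exact default set:

```python
# Subset of libvirt's virDomainMigrateFlags enum values.
VIR_MIGRATE_LIVE            = 1   # keep the guest running during the copy
VIR_MIGRATE_PEER2PEER       = 2   # source libvirtd talks to dest directly
VIR_MIGRATE_PERSIST_DEST    = 8   # persist the domain on the destination
VIR_MIGRATE_UNDEFINE_SOURCE = 16  # undefine the domain on the source

def build_migration_flags():
    """Combine options typically wanted for a live migration."""
    return (VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER |
            VIR_MIGRATE_PERSIST_DEST | VIR_MIGRATE_UNDEFINE_SOURCE)

flags = build_migration_flags()
```

Because each option is a distinct bit, testing `flags & VIR_MIGRATE_LIVE` tells migrateToURI3 (and anyone reading the code) exactly which behaviours were requested.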

The core of _live_migration_monitor is a loop that calls libvirt's job-info API to track the migration's progress.

A summary of the live migration process:

(1) nova-api receives the live migration request, verifies permissions and quotas, fetches the instance details, and sends the request to nova-conductor over the message queue.

(2) nova-conductor checks that the instance is active and that the source host's compute service is up, then asks nova-scheduler over the message queue to select a destination host.

(3) Once a destination host is determined, it is checked against a series of live migration prerequisites — for example, CPU compatibility between the two hosts and whether each host can participate in the live migration (the two hosts check each other).

(4) nova-conductor instructs the source host over the message queue to carry out the live migration work.

(5) The source host asks the destination host, via a blocking RPC call, to initialize its networking, network filtering rules, and disks; once that preparation completes, the source host proceeds with the migration.

(6) The source host calls libvirt's live migration API to execute the migration.