
# CS Assignment Help | Reinforcement Learning | COMP5328 Weak Convergence of Return Functions


## Weak Convergence of Return Functions

Proposition 4.27 implies that if each distribution $\eta^\pi(x)$ lies in the finite domain $\mathscr{P}_d(\mathbb{R})$ of a given probability metric $d$ that is regular, $c$-homogeneous, and $p$-convex, then $\eta^\pi$ is the unique solution to the equation $$\eta=\mathcal{T}^\pi \eta$$ in the space $\mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$. It does not, however, rule out the existence of solutions outside this space. This concern can be addressed by showing that for any $\eta_0 \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$, the sequence of probability distributions $\left(\eta_k(x)\right)_{k \geq 0}$ defined by
$$\eta_{k+1}=\mathcal{T}^\pi \eta_k$$
converges weakly to the return distribution $\eta^\pi(x)$, for each state $x \in \mathcal{X}$. In addition to giving an alternative perspective on the quantitative convergence results for these iterates, Proposition 4.34 below immediately yields the uniqueness of $\eta^\pi$ as a solution to Equation 4.18 (stated as Proposition 4.9).
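The iteration above can be sketched numerically. The following is a minimal illustration on a hypothetical two-state MDP (not taken from the text), representing each return distribution as a finite-support dictionary `{return_value: probability}` and applying $\mathcal{T}^\pi$ repeatedly; values are rounded to a fixed grid so the support stays bounded across iterations.

```python
# Iterating eta_{k+1} = T^pi eta_k on a hypothetical two-state MDP.
# Return distributions are finite-support: {return_value: probability}.

GAMMA = 0.5

# Dynamics under a fixed policy pi: state -> [(probability, reward, next_state)].
DYNAMICS = {
    "x1": [(0.5, 1.0, "x1"), (0.5, 0.0, "x2")],
    "x2": [(1.0, -1.0, "x1")],
}

def bellman_op(eta):
    """One application of the distributional Bellman operator T^pi."""
    new_eta = {}
    for x, transitions in DYNAMICS.items():
        dist = {}
        for p, r, x_next in transitions:
            for g, q in eta[x_next].items():
                z = round(r + GAMMA * g, 4)       # bootstrap target r + gamma*G'
                dist[z] = dist.get(z, 0.0) + p * q
        new_eta[x] = dist
    return new_eta

def mean(dist):
    return sum(z * p for z, p in dist.items())

# Start from eta_0 = point mass at 0 in every state and iterate.
eta = {x: {0.0: 1.0} for x in DYNAMICS}
for _ in range(30):
    eta = bellman_op(eta)

# The mean of each iterate tracks ordinary value iteration, so it converges
# to V^pi; solving the two linear Bellman equations for this MDP by hand
# gives V^pi(x1) = 0.4 and V^pi(x2) = -0.8.
print(mean(eta["x1"]), mean(eta["x2"]))
```

The means converge at the usual geometric rate $\gamma^k$; the weak convergence of the full distributions is what the proposition in the text establishes, and the sketch only visualizes it for one initial condition.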

## Random Variable Bellman Operators

In this chapter, we defined the distributional Bellman operator $\mathcal{T}^\pi$ as a mapping on the space of return-distribution functions $\mathscr{P}(\mathbb{R})^{\mathcal{X}}$. We also saw that the action of the operator on a return function $\eta \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$ can be understood both through direct manipulation of the probability distributions and through manipulation of a collection of random variables instantiating these distributions.

Viewing the operator through its effect on the distribution of a collection of representative random variables is a useful tool for understanding distributional reinforcement learning, and may prompt the reader to ask whether it is possible to avoid referring to probability distributions at all, working instead directly with random variables. We describe one approach to this below using the tools of probability theory, and then discuss some of its shortcomings.

Let $G_0=\left(G_0(x): x \in \mathcal{X}\right)$ be an initial collection of real-valued random variables, indexed by state, supported on a probability space $\left(\Omega_0, \mathscr{F}_0, \mathbb{P}_0\right)$. For each $k \in \mathbb{N}^{+}$, let $\left(\Omega_k, \mathscr{F}_k, \mathbb{P}_k\right)$ be another probability space, supporting a collection of random variables $\left(\left(A_k(x), R_k(x, a), X_k^{\prime}(x, a)\right): x \in \mathcal{X}, a \in \mathcal{A}\right)$, with $A_k(x) \sim \pi(\cdot \mid x)$, and independently $R_k(x, a) \sim P_{\mathcal{R}}(\cdot \mid x, a)$, $X_k^{\prime}(x, a) \sim P_{\mathcal{X}}(\cdot \mid x, a)$. We then consider the product probability space on $\Omega=\prod_{k \in \mathbb{N}} \Omega_k$. All random variables defined above can naturally be viewed as functions on this joint probability space that depend on $\omega=\left(\omega_0, \omega_1, \omega_2, \ldots\right) \in \Omega$ only through the coordinate $\omega_k$ matching the index $k$ of the random variable. Note that under this construction, all random variables with distinct indices are independent.

Now define $\mathscr{X}_{\mathbb{N}}$ as the set of real-valued random variables on $(\Omega, \mathscr{F}, \mathbb{P})$ (where $\mathscr{F}$ is the product $\sigma$-algebra) that depend on only finitely many coordinates of $\omega \in \Omega$. We can define a Bellman operator $\mathcal{T}^\pi: \mathscr{X}_{\mathbb{N}}^{\mathcal{X}} \rightarrow \mathscr{X}_{\mathbb{N}}^{\mathcal{X}}$ as follows. Given $G=(G(x): x \in \mathcal{X}) \in \mathscr{X}_{\mathbb{N}}^{\mathcal{X}}$, let $K \in \mathbb{N}$ be the smallest integer such that the random variables $(G(x): x \in \mathcal{X})$ depend on $\omega=\left(\omega_0, \omega_1, \omega_2, \ldots\right) \in \Omega$ only through $\omega_0, \ldots, \omega_{K-1}$; such an integer exists due to the definition of $\mathscr{X}_{\mathbb{N}}$ and the finiteness of $\mathcal{X}$. We then define $\mathcal{T}^\pi G \in \mathscr{X}_{\mathbb{N}}^{\mathcal{X}}$ by
$$\left(\mathcal{T}^\pi G\right)(x)=R_K\left(x, A_K(x)\right)+\gamma G\left(X_K^{\prime}\left(x, A_K(x)\right)\right).$$
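As a rough computational picture of this construction, one can model a sample $\omega$ as a finite list of per-coordinate seeds and a random variable $G(x)$ as a function of $(x, \omega)$; each application of the operator then reads only the fresh coordinate $\omega_K$. The two-state MDP, the deterministic policy, and all names below are hypothetical illustrations, not drawn from the text.

```python
import random

# Sketch of (T^pi G)(x) = R_K(x, A_K(x)) + gamma * G(X'_K(x, A_K(x))).
# omega is a list of per-coordinate seeds; a "random variable" is a
# function of (x, omega).

GAMMA = 0.5
POLICY = {"x1": [("a", 1.0)], "x2": [("a", 1.0)]}       # pi(. | x)
TRANSITIONS = {                                          # (prob, reward, next state)
    ("x1", "a"): [(0.5, 1.0, "x1"), (0.5, 0.0, "x2")],
    ("x2", "a"): [(1.0, -1.0, "x1")],
}

def draw(pairs, rng):
    """Draw an outcome from a finite distribution [(outcome, prob), ...]."""
    u, acc = rng.random(), 0.0
    for outcome, p in pairs:
        acc += p
        if u < acc:
            return outcome
    return pairs[-1][0]

def bellman_op(G, K):
    """Return T^pi G; the new variables read only the fresh coordinate
    omega_K, plus whatever coordinates G already depends on."""
    def new_G(x, omega):
        rng = random.Random(hash((omega[K], x)))         # randomness from omega_K
        a = draw(POLICY[x], rng)                         # A_K(x) ~ pi(. | x)
        r, x_next = draw(
            [((r, xn), p) for p, r, xn in TRANSITIONS[(x, a)]], rng
        )                                                # R_K and X'_K jointly
        return r + GAMMA * G(x_next, omega)
    return new_G

# G_0 = 0 depends on no coordinates; k applications use omega_0..omega_{k-1}.
G = lambda x, omega: 0.0
for K in range(20):
    G = bellman_op(G, K)

# Monte-Carlo estimate of E[G(x1)] by resampling omega; for this MDP it
# approximates V^pi(x1) = 0.4.
rng = random.Random(0)
samples = [G("x1", [rng.random() for _ in range(20)]) for _ in range(4000)]
print(sum(samples) / len(samples))
```

Note how the index $K$ advances with each application, mirroring the text's requirement that $\mathcal{T}^\pi G$ consume a coordinate on which $G$ does not depend; this is what keeps the bootstrap term and the fresh transition variables independent.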


