TensorBoard Histogram Dashboard

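The TensorBoard Histogram Dashboard displays how the distribution of some tensor in your TensorFlow graph has changed over time. It does this by showing many histogram visualizations of the tensor at different points in time.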

A Basic Example

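Let's start with a simple case: a normally distributed variable whose mean shifts over time. TensorFlow has an op, tf.random_normal, that is perfect for this purpose. As is usually the case with TensorBoard, we will ingest the data using a summary op; in this case, tf.summary.histogram. For a primer on how summaries work, please see the general TensorBoard tutorial.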

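Here is a code snippet that generates some histogram summaries containing normally distributed data, where the mean of the distribution increases over time.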

# Note: this example uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
import tensorflow as tf

k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram("normal/moving_mean", mean_moving_normal)

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter("/tmp/histogram_example")

summaries = tf.summary.merge_all()

# Run the loop and write the summaries to disk
N = 400
for step in range(N):
  k_val = step/float(N)
  summ = sess.run(summaries, feed_dict={k: k_val})
  writer.add_summary(summ, global_step=step)

# Flush any buffered events and close the log file
writer.close()

Once that code has run, we can load the data into TensorBoard from the command line:

tensorboard --logdir=/tmp/histogram_example

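Once TensorBoard is running, open it in Chrome or Firefox and navigate to the Histogram Dashboard. There we can see a histogram visualization of our normally distributed data.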

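tf.summary.histogram takes a tensor of arbitrary size and shape and compresses it into a histogram data structure consisting of many bins, each with a width and a count. For example, suppose we want to organize the numbers [0.5, 1.1, 1.3, 2.2, 2.9, 2.99] into bins. We could make three bins:

* a bin containing everything from 0 to 1 (it would contain one element, 0.5),
* a bin containing everything from 1 to 2 (it would contain two elements, 1.1 and 1.3),
* a bin containing everything from 2 to 3 (it would contain three elements: 2.2, 2.9 and 2.99).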
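The three-bin example above can be reproduced in a few lines of plain Python. This is only a sketch of the idea of binning; the variable names are illustrative, and it is not how tf.summary.histogram is implemented.

```python
from bisect import bisect_right

values = [0.5, 1.1, 1.3, 2.2, 2.9, 2.99]
edges = [0, 1, 2, 3]  # three bins: [0, 1), [1, 2), [2, 3)

# Count how many values fall into each bin.
counts = [0] * (len(edges) - 1)
for v in values:
    # bisect_right finds the first edge greater than v; the bin index is one less.
    counts[bisect_right(edges, v) - 1] += 1

print(counts)  # [1, 2, 3]
```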

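TensorFlow uses a similar approach to create bins, but unlike in our example, it does not create integer bins. For large, sparse datasets, that could result in many thousands of bins. Instead, the bins are exponentially distributed, with many bins close to 0 and comparatively few bins for very large numbers. However, visualizing exponentially distributed bins is tricky: if height is used to encode count, then wider bins take more space even when they hold the same number of elements. Conversely, encoding count in the area makes height comparisons impossible. To avoid both problems, the histograms resample the data into uniform bins. This can lead to unfortunate artifacts in some cases.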
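To make the idea concrete, here is a minimal sketch of exponentially spaced bins and of resampling them into uniform bins. The smallest edge (1e-3), the growth factor (1.1), the number of uniform bins (30), and all function names are assumptions chosen for illustration; they are not taken from this tutorial and this is not TensorBoard's actual code. The resampling step assumes values are spread evenly inside each exponential bin.

```python
import random
from bisect import bisect_right

def exponential_edges(max_abs, first=1e-3, growth=1.1):
    """Edges that grow geometrically away from 0, mirrored for negative values."""
    positive = [first]
    while positive[-1] < max_abs:
        positive.append(positive[-1] * growth)
    return [-e for e in reversed(positive)] + positive

def bin_counts(values, edges):
    """Count the values that fall into each [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        index = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[index] += 1
    return counts

def resample_uniform(edges, counts, num_bins=30):
    """Spread each bin's count over equal-width bins in proportion to overlap."""
    low, high = edges[0], edges[-1]
    width = (high - low) / num_bins
    uniform = [0.0] * num_bins
    for left, right, count in zip(edges, edges[1:], counts):
        for i in range(num_bins):
            u_left = low + i * width
            u_right = u_left + width
            overlap = min(right, u_right) - max(left, u_left)
            if overlap > 0:
                uniform[i] += count * overlap / (right - left)
    return uniform

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
edges = exponential_edges(max(abs(v) for v in data))
counts = bin_counts(data, edges)
uniform = resample_uniform(edges, counts)
print(len(counts), "exponential bins ->", len(uniform), "uniform bins")
```

Because a narrow bin near 0 and a wide bin far from 0 are both redistributed by overlap, the total count is preserved, but fine detail inside a wide bin is smeared out, which is one way the artifacts mentioned above can arise.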

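Each slice in the histogram visualizer displays a single histogram. The slices are organized by step: older slices (e.g. step 0) are further "back" and darker, while newer slices (e.g. step 400) are closer to the foreground and lighter in color. The y-axis on the right shows the step number.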

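You can mouse over the histogram to see tooltips with more detailed information. For example, in the following image we can see that the histogram at timestep 176 has a bin centered at 2.25 with 177 elements in it.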

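You may also notice that the histogram slices are not always evenly spaced in step count or time. This is because TensorBoard uses reservoir sampling to keep a subset of all the histograms, to save memory. Reservoir sampling guarantees that every sample has an equal likelihood of being included, but because it is a randomized algorithm, the samples chosen do not occur at even steps.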
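The uneven spacing is easy to see with a small sketch of reservoir sampling. The version below is the textbook single-pass algorithm, not TensorBoard's own implementation; the capacity of 10 and the function name are assumptions for illustration.

```python
import random

def reservoir_sample(stream, capacity, seed=0):
    """Keep `capacity` items from a stream so that each item is equally likely to be kept."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < capacity:
            # Fill the reservoir with the first `capacity` items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability capacity / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < capacity:
                reservoir[j] = item
    return reservoir

# Pretend each step of the 400-step loop produced one histogram.
kept_steps = sorted(reservoir_sample(range(400), capacity=10))
print(kept_steps)
```

The kept steps are a random subset of 0..399, so the gaps between neighboring steps vary from run to run unless the seed is fixed.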

Overlay Mode

There is a control on the left of the dashboard that lets you toggle the histogram mode from "offset" to "overlay":

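In "overlay" mode, the visualization rotates 45 degrees, so that the individual histogram slices are no longer spread out in time but are instead all plotted on the same y-axis.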

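Now each slice is a separate line on the chart, and the y-axis shows the item count within each bucket. Darker lines are older, earlier steps, and lighter lines are more recent, later steps. Once again, you can mouse over the chart to see additional information.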

In general, the overlay visualization is useful if you want to directly compare the counts of different histograms.

Multimodal Distributions

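The Histogram Dashboard is great for visualizing multimodal distributions. Let's construct a simple bimodal distribution by concatenating the outputs of two different normal distributions. The code looks like this: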

import tensorflow as tf

k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram("normal/moving_mean", mean_moving_normal)

# Make a normal distribution with shrinking variance
variance_shrinking_normal = tf.random_normal(shape=[1000], mean=0, stddev=1-(k))
# Record that distribution too
tf.summary.histogram("normal/shrinking_variance", variance_shrinking_normal)

# Let's combine both of those distributions into one dataset
normal_combined = tf.concat([mean_moving_normal, variance_shrinking_normal], 0)
# We add another histogram summary to record the combined distribution
tf.summary.histogram("normal/bimodal", normal_combined)

summaries = tf.summary.merge_all()

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter("/tmp/histogram_example")

# Run the loop and write the summaries to disk
N = 400
for step in range(N):
  k_val = step/float(N)
  summ = sess.run(summaries, feed_dict={k: k_val})
  writer.add_summary(summ, global_step=step)

# Flush any buffered events and close the log file
writer.close()

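You will remember the "moving mean" normal distribution from the example above. Now we also have a "shrinking variance" distribution. Side by side, they look like this: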

When we concatenate them, we get a chart that clearly reveals the divergent, bimodal structure:

Some More Distributions

Just for fun, let's generate and visualize a few more distributions, and then combine them all into one chart. Here is the code we will use:

import tensorflow as tf

k = tf.placeholder(tf.float32)

# Make a normal distribution, with a shifting mean
mean_moving_normal = tf.random_normal(shape=[1000], mean=(5*k), stddev=1)
# Record that distribution into a histogram summary
tf.summary.histogram("normal/moving_mean", mean_moving_normal)

# Make a normal distribution with shrinking variance
variance_shrinking_normal = tf.random_normal(shape=[1000], mean=0, stddev=1-(k))
# Record that distribution too
tf.summary.histogram("normal/shrinking_variance", variance_shrinking_normal)

# Let's combine both of those distributions into one dataset
normal_combined = tf.concat([mean_moving_normal, variance_shrinking_normal], 0)
# We add another histogram summary to record the combined distribution
tf.summary.histogram("normal/bimodal", normal_combined)

# Add a gamma distribution
gamma = tf.random_gamma(shape=[1000], alpha=k)
tf.summary.histogram("gamma", gamma)

# And a poisson distribution
poisson = tf.random_poisson(shape=[1000], lam=k)
tf.summary.histogram("poisson", poisson)

# And a uniform distribution
uniform = tf.random_uniform(shape=[1000], maxval=k*10)
tf.summary.histogram("uniform", uniform)

# Finally, combine everything together!
all_distributions = [mean_moving_normal, variance_shrinking_normal,
                     gamma, poisson, uniform]
all_combined = tf.concat(all_distributions, 0)
tf.summary.histogram("all_combined", all_combined)

summaries = tf.summary.merge_all()

# Set up a session and summary writer
sess = tf.Session()
writer = tf.summary.FileWriter("/tmp/histogram_example")

# Run the loop and write the summaries to disk
N = 400
for step in range(N):
  k_val = step/float(N)
  summ = sess.run(summaries, feed_dict={k: k_val})
  writer.add_summary(summ, global_step=step)

# Flush any buffered events and close the log file
writer.close()

Gamma Distribution

Uniform Distribution

Poisson Distribution

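The Poisson distribution is defined over the integers, so all of the generated values are exact integers. The histogram compression moves the data into floating-point bins, causing the visualization to show little bumps over the integer values rather than perfect spikes.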
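The effect can be illustrated with a small standard-library sketch. The Poisson sampler below uses Knuth's multiplication method, and the bin edges (0 followed by a geometric progression starting at 0.1 with factor 1.1) are hypothetical values chosen for illustration, not the edges TensorFlow uses.

```python
import math
import random
from bisect import bisect_right

def sample_poisson(lam, rng):
    """Knuth's method: multiply uniform draws until the product drops to e^-lam or below."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(0)
samples = [sample_poisson(0.9, rng) for _ in range(1000)]

# Hypothetical floating-point bin edges: 0, then a geometric progression.
edges = [0.0] + [0.1 * 1.1 ** n for n in range(60)]
counts = [0] * (len(edges) - 1)
for value in samples:
    counts[bisect_right(edges, value) - 1] += 1

# Every sample is an exact integer, but it is stored in a bin of nonzero width.
for value in sorted(set(samples)):
    i = bisect_right(edges, value) - 1
    print(f"value {value}: {counts[i]} samples in bin [{edges[i]:.3f}, {edges[i + 1]:.3f})")
```

Once a count is attached to an interval rather than to a single point, any later drawing of that interval has some width, which is why the chart shows a small bump around each integer instead of a spike.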

All Together Now

Finally, we can concatenate all of the data into one funny-looking curve.