2022/01/08  阅读:29  主题:红绯


Signal processing (time series analysis) for scientific data analysis with Python: Part 1用 Python 进行科学数据分析的信号处理(时间序列分析) : 第1部分

Running mean filter to a time series对时间序列运行平均滤波器

Nita GhoshNita GhoshFollowJan 27, 2021 · 5 min read

27,20215 min read

![](https://miro.medium.com/max/1400/1*8SIjyJ8SbAFS6k4eWjRmcQ.png) ! [ https://miro.medium.com/max/1400/1*8sijyj8sbafs6k4ewjrmcq.png ]
Image by [Altuna Akalin](https://al2na.co/)
图片来源: Altuna Akalin/ https://al2na.co/

These days internet is flooded with resources that help to make one’s way to leaning data science or/and machine leaning. Oftentimes, you would find a junior scientist like myself immersed in loads of data and trying to make a little sense of it (which you may also call time series analysis in fancy terms). It was only when I was scouring the internet for scientific data analysis guidelines that I noticed the paucity of resources. Thus, I thought it would be fantastic if I get to share some of the techniques that I have learnt/am learning throughout my course of data/signal analysis procedure using Python as a programming language.

如今,互联网上充斥着各种资源,这些资源有助于人们学习数据科学或者/和机器学习。通常情况下,你会发现一个像我这样的初级科学家沉浸在大量的数据中,并试图理解其中的一些意义(你也可以称之为时间序列分析的花哨术语)。只有当我在网上搜索科学数据分析指南时,我才注意到资源的匮乏。因此,我认为如果能够分享我在使用 Python 作为编程语言的数据/信号分析过程中学到的一些技术,那将是非常棒的。

Below are different fields of application for time series/signal analysis that might get benefited from this read:


  1. Scientific data analysis (spectral signal analysis for spectroscopy/bioscience)
  2. Audio signal processing, digital signal processing

3. Speech signal processing,

3. 语音信号处理,

4. Image processing

4. 图像处理

5. Financial data processing

5. 财务数据处理

6. Seismology etc.

6. 地震学等。

Because of my background as well as my future interest my discussion here will focus majorly on scientific data/signal analysis.


In this particular series (stay tuned for future parts, part 2, part 3) I wish to cover topics regarding several tools required for a signal processing. These are listed below.

在这个特殊的系列中(请继续关注后续部分,第2部分,第3部分) ,我希望讨论与信号处理所需的几个工具有关的主题。这些都列在下面。

  • Temporal Filtering 时间过滤
  • Convolution 卷积
  • Wavelet transformation 小波变换
  • Spectral analysis(Fast Fourier Transformation) 光谱分析(快速傅立叶变换)
  • Time-frequency domain analysis 时频域分析
  • Cleaning/denoising data 清理/去噪数据
  • Resampling (up and down) 重新采样(上下)
  • Interpolation and extrapolation 插值和外推
  • Feature detection 特征提取
  • Signal to noise ratio 信噪比

Running mean filter (Theory) 运行平均滤波器(理论)

In the first part today I am going to introduce you to application of a smoothing filter. It’s called running mean filter or mean smoothing filter. As the name suggests this filter works by setting each data point in the filtered signal as an average of a finite number of surrounding signal. The number of data points (k) that you go backward and forward in the is the key point of a mean-smoothing filter. That is called the order of the filter.


! [ https://miro.medium.com/freeze/max/60/1*8tkwwzv3zhexzmm7xvtpzw.gif?q=20] ! [ https://miro.medium.com/max/560/1*8tkwwzv3zhexzmm7xvtpzw.gif ]

This is the underlying equation governing the function of a mean-smoothening process. It might look cryptic but what it does is actually quite straight forward. You set each point in the filtered signal y(t) as the sum over all the data points in the original signal, x(t), going from k order backwards in t to k points forward divided by the number of time points which is 2k+1. The plus one indicates you are goin k backward and k forward from your original data point.

这是一个控制平均光滑化过程函数的基本方程。它可能看起来很神秘,但是它实际上是相当直接的。设置滤波信号 y (t)中的每个点,作为原始信号中所有数据点的和,x (t) ,从 t 的 k 次序向后到 k 次序向前,除以时间点的个数,也就是2k + 1。加1表示从原始数据点开始向后走 k,向前走 k。

! [ https://miro.medium.com/max/60/1*xa6e4mqdlscqfoocxi_bca.png?q=20] ! [ https://miro.medium.com/max/1400/1*xa6e4mqdlscqfoocxi_bca.png ]

The image above shows the function of mean filter smoothening (blue markers) on a randomly generated dataset (red markers). So the mean smoothing filter basically takes away the sharp edges of the original signal to make it more flat around the mean. The larger the order (k) is the smoother is the filtered signal. Now , you can already notice that there’s something funny happening at the edges. These are called ‘Edge Effects’. In general, you always get something bizarre happening at the edges of your time series when you apply a temporal filter. However, there are some well established method which allow you to work around that problem. I will discuss in more detail in later parts of this series.


Coding in Python 用 Python 编码

Now let me show how does the code look in python. Here I am assuming a basic level of familiarity of the readers with python.

现在让我展示一下 python 中的代码是什么样的。这里我假设读者对 python 有一个基本的熟悉程度。

Import libraries 导入库

First, let’s import the libraries that are required to run the codes.


import numpy as np  
import matplotlib.pyplot as plt  
import scipy.io as sio  
import scipy.signal  
from scipy import *  
import copy  
%matplotlib inline

Create artificial data with noise 用噪声创建人工数据

Let’s create an arbitrary time series (time as x-axis) signal. We set the sampling rate of this signal as 2000 Hz. Set time till 3 second with the interval being 1/sigRate. To create the signal we linearly interpolate accross 15 random time points. You can find more detail about the interpolation function in numpy library here%2C%20evaluated%20at%20x.&text=The%20x%2Dcoordinates%20at%20which%20to%20evaluate%20the%20interpolated%20values.).

让我们创建一个任意的时间序列(时间作为 x 轴)信号。我们将这个信号的采样率设置为2000赫兹。将时间设置为3秒,间隔为1/si篦。为了创建信号,我们对15个随机时间点进行线性插值。你可以在这里找到更多关于 numpy 库中的插值函数的细节% 2 c% 20evaluated% 20at% 20x。& text =% 20x% 2d 坐标% 20at% 20which% 20 to% 20evaluate% 20the% 20the% 20x% 2d coordinates% 20at% 20at% 20which% 20 to% 20evaluate% 20the% 20in% 20values.).

plt.rcParams["font.size"] = 16  
plt.rcParams['figure.figsize'] = (20, 10)sigRate = 1000 #Hz  
time = np.arange(0,3, 1/sigRate)  
n = len(time)  
p = 15 #poles for random interpolation  
ampl = np.interp(np.linspace(0,p,n),np.arange(0,p),np.random.rand(p)*30)plt.plot(time, ampl)

Let’s have a look at the plot already.


! [ https://miro.medium.com/max/60/1*umgyljvdhxe5fz404kqeag.png?q=20] ! [ https://miro.medium.com/max/1400/1*umgyljvdhxe5fz404kqeag.png ]

We now add arbirtary noise to the signal.


noiseamp = 5  
noise = noiseamp*np.random.randn(n)  
signal = ampl + noise  

plt.plot(time, ampl)  
plt.plot(time, signal)

This is what final signal with noise looks like:


! [ https://miro.medium.com/max/60/1*-k-bjnb0qcwv6plmarxmfa.png?q=20] ! [ https://miro.medium.com/max/1400/1*-k-bjnb0qcwv6plmarxmfa.png ]

Running mean filter or mean smoothing filter运行平均滤波器或平均平滑滤波器

One can also initialize the variable filtSig by setting zeros in which case the starting and endpoint of the filtered signal changes, the rest remains same. Next we are implementing the running-mean filter in the time domain. Keep in mind that it is also possible to apply this kind of filter in the spectral domain. I intend to discuss this more in one of the future posts. We are looping over i starting from signal[k] to signal[n-k-1]. Notice here that for a running mean function of order k, signal[k] indicates the (k+1) point in the time series. This is because the variable Final_Signal is a numpy array.

还可以通过设置零来初始化变量 filtSig,在这种情况下,滤波信号的起始和终结点发生变化,其余部分保持不变。接下来我们在时域上实现运行均值滤波器。记住,这种滤波器也可以应用于光谱领域。我打算在以后的文章中进一步讨论这个问题。我们从信号[ k ]开始循环到信号[ n-k-1]。注意,对于一个运行中的 k 阶平均值函数,signal [ k ]表示时间序列中的(k + 1)点。这是因为变量 Final _ signal 是 numpy 数组。

filtSig = np.zeros(n)k = 20  
for i in range(k,n-k-1):  
    # each point is the average of k surrounding points  
    filtSig[i] = np.mean(signal[i-k:i+k])

Plot the result


! [ https://miro.medium.com/max/60/1*vm3mcwqu1lecqyshum22_q.png?q=20] ! [ https://miro.medium.com/max/1400/1*vm3mcwqu1lecqyshum22_q.png ]

Compare over order (k= 10 to 20)比较超序(k = 10到20)

#initializingfiltSig = np.zeros(n)  
#filtSig = map(lambda i : np.mean(signal[i-k:i+k], signal))for k in range(10,20):  
    for i in range(k,n-k-1):  
        filtSig[i] = np.mean(signal[i-k:i+k])  

    plt.plot(time, filtSig)   
    plt.plot(time, signal, marker='o', alpha=0.01)
! [ https://miro.medium.com/max/60/1*ojpukeczegm4qxkgmsahdg.png?q=20] ! [ https://miro.medium.com/max/1400/1*ojpukeczegm4qxkgmsahdg.png ]

Please let me know if you have any queries regarding this part in the comment section. In the next part of the series I will talk about application of Gaussian-smooth filter and the we try to understand the differences between them.



2022/01/08  阅读:29  主题:红绯