Creating Sample Data with Python

Sample data

Linear data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n = 20

a = np.arange(n).reshape(4, -1); a  # matrix with 5 columns

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

df = pd.DataFrame(a, columns=list('abcde')); df

    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
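As a small aside on `reshape(4, -1)`: the `-1` asks NumPy to infer that dimension from the array size, so 20 elements in 4 rows give 5 columns. The following is only a minimal sketch; the names `rows`, `inferred`, `explicit` and `df_linear` are illustrative and not from the original article.

```python
import numpy as np
import pandas as pd

n = 20
rows = 4  # illustrative name, not from the article

# -1 lets NumPy infer the number of columns: n / rows = 5
inferred = np.arange(n).reshape(rows, -1)
explicit = np.arange(n).reshape(rows, n // rows)

# Both spellings should produce the same 4 x 5 matrix
print(inferred.shape, np.array_equal(inferred, explicit))

# The number of column labels has to match the inferred column count
df_linear = pd.DataFrame(inferred, columns=list('abcde'))
print(df_linear)
```

If `n` is not divisible by the number of rows, `reshape` raises a `ValueError`, so the label list and `n` need to be kept in step.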

Random data

r = np.random.randn(4, 5); r

array([[-0.37840346, -0.84591793,  0.50590263,  0.0544243 ,  0.59361247],
       [-0.2726931 , -1.74415635,  0.0199559 , -0.20695113, -1.19559455],
       [-0.59799566, -0.26810224, -0.18738038,  1.05843686,  0.72317579],
       [ 1.23389386,  1.91293041, -1.33322818,  0.78255026,  2.04737357]])

df = pd.DataFrame(r, columns=list('abcde')); df

          a         b         c         d         e
0 -0.378403 -0.845918  0.505903  0.054424  0.593612
1 -0.272693 -1.744156  0.019956 -0.206951 -1.195595
2 -0.597996 -0.268102 -0.187380  1.058437  0.723176
3  1.233894  1.912930 -1.333228  0.782550  2.047374
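The values above change on every run because no seed is set. If reproducible sample data is wanted, one possible approach (an assumption on my part, not something the original article does) is NumPy's `default_rng` with a fixed seed. `SEED` and `make_random_frame` are hypothetical names used only for this sketch.

```python
import numpy as np
import pandas as pd

SEED = 0  # hypothetical seed; any integer works


def make_random_frame(seed):
    """Return a 4 x 5 DataFrame of standard-normal samples."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((4, 5)), columns=list('abcde'))


df_a = make_random_frame(SEED)
df_b = make_random_frame(SEED)

# The same seed should give identical frames
print(df_a.equals(df_b))
```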

df.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x17699af2a58>

df = pd.DataFrame(np.random.randn(n,n))

plt.contourf(df, cmap='jet')

<matplotlib.contour.QuadContourSet at 0x1769a1a12b0>

Contour plot

plt.pcolor(df, cmap='jet')

<matplotlib.collections.PolyCollection at 0x1769b1e2208>

Colormap plot

Sine wave

n=100
x = np.linspace(0, 2*np.pi, n)

s = pd.Series(np.sin(x), index=x)
s.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x1769e695780>

Sine wave

snoise = s + 0.1 * np.random.randn(n)
sdf = pd.DataFrame({'sin wave':s, 'noise wave': snoise})
sdf.plot(color=('r', 'b'))

<matplotlib.axes._subplots.AxesSubplot at 0x1769e8586d8>

Sine wave with noise added

Normal distribution

from scipy import stats as ss

median = x[int(n/2)]  # middle value of x
g = pd.Series(ss.norm.pdf(x, loc=median), x)
g.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x1769ffba128>
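For reference, the curve above can also be written out without scipy. This sketch assumes a standard deviation of 1, which is the default `scale` of `ss.norm.pdf` when only `loc` is given; the names `mu`, `sigma` and `pdf` are illustrative.

```python
import numpy as np

n = 100
x = np.linspace(0, 2 * np.pi, n)
mu = x[n // 2]   # same centre as `median` in the text
sigma = 1.0      # assumed: the default scale of scipy's norm

# Normal density written out explicitly
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# The peak sits at x = mu with height 1 / (sigma * sqrt(2 * pi))
peak_index = int(np.argmax(pdf))
print(peak_index, pdf[peak_index])
```

Because the curve is only evaluated on [0, 2π], the tails are cut off, so this is a bell-shaped sample curve rather than a full density.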

gnoise = g + 0.01 * np.random.randn(n)
df = pd.DataFrame({'gauss wave':g, 'noise wave': gnoise})
df.plot(color=('r', 'b'))

<matplotlib.axes._subplots.AxesSubplot at 0x1769e970828>

Log function

median = x[int(n/2)]  # middle value of x (not used below)
x1 = x + 10e-3  # 10e-3 == 0.01; the shift keeps log() away from x = 0
l = pd.Series(np.log(x1), x1)
l.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x1769ffba5f8>
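The small offset added to `x` is presumably there because `x` starts at exactly 0, where the logarithm diverges. A minimal sketch of the difference (the names `raw` and `shifted` are illustrative):

```python
import numpy as np

n = 100
x = np.linspace(0, 2 * np.pi, n)

# log(0) is -inf; silence the divide-by-zero warning just for this demonstration
with np.errstate(divide='ignore'):
    raw = np.log(x)

shifted = np.log(x + 10e-3)  # 10e-3 == 0.01

# First element: -inf without the offset, finite with it
print(raw[0], shifted[0])
```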

lnoise = l + 0.1 * np.random.randn(n)
df = pd.DataFrame({'log wave':l, 'noise wave': lnoise})
df.plot(color=('r', 'b'))

<matplotlib.axes._subplots.AxesSubplot at 0x176a00ec358>

Random walk

n = 1000
se = pd.Series(np.random.randint(-1, 2, n)).cumsum()
se.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x284f3c62c18>

np.random.randint(-1, 2, n) generates n random integers, each one of (-1, 0, 1), and cumsum() accumulates them into a running total, which draws the random walk.
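The relationship between the steps and the walk can be checked with a short sketch. It uses a seeded `default_rng` instead of `np.random.randint` purely for reproducibility; the seed and the names `steps`, `walk` and `recovered` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # hypothetical seed for reproducibility
n = 1000

# integers(-1, 2) draws from {-1, 0, 1}; the upper bound is exclusive,
# just like np.random.randint(-1, 2, n) in the text
steps = rng.integers(-1, 2, size=n)
walk = pd.Series(steps).cumsum()

# Differencing the walk recovers the individual steps (from the second point on)
recovered = walk.diff().dropna().astype(int).to_numpy()

print(sorted(np.unique(steps).tolist()))
print(np.array_equal(recovered, steps[1:]))
```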

sma100 = se.rolling(100).mean()
ema100 = se.ewm(span=100).mean()

df = pd.DataFrame({'Chart': se,  'SMA100': sma100, 'EMA100': ema100})
df.plot(style = ['--','-','-'])

<matplotlib.axes._subplots.AxesSubplot at 0x284f3cadcc0>

The simple moving average (SMA) and the exponential moving average (EMA) are plotted together.
Compared with the SMA, the EMA is generally said to reflect recent movements more readily and to follow trends more easily.
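To make the difference concrete, the sketch below recomputes one SMA value and one EMA value by hand and compares them with pandas. It assumes pandas' default `adjust=True` for `ewm`, where the smoothing factor is alpha = 2 / (span + 1). The seed, the shorter series and the span of 20 are illustrative choices, not values from the article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # hypothetical seed
se = pd.Series(rng.integers(-1, 2, size=300)).cumsum()
span = 20  # shorter than the text's 100, to keep the example small

sma = se.rolling(span).mean()
ema = se.ewm(span=span).mean()

# SMA at position t: plain mean of the last `span` values
t = len(se) - 1
manual_sma = se.iloc[t - span + 1 : t + 1].mean()

# EMA at position t (adjust=True): weighted mean of all values so far,
# with weight (1 - alpha)**i for the value i steps back
alpha = 2 / (span + 1)
weights = (1 - alpha) ** np.arange(t + 1)  # weights for x_t, x_{t-1}, ..., x_0
values = se.to_numpy()[::-1]               # x_t, x_{t-1}, ..., x_0
manual_ema = (weights * values).sum() / weights.sum()

print(np.isclose(sma.iloc[t], manual_sma), np.isclose(ema.iloc[t], manual_ema))
```

The geometric weights are why the EMA leans towards recent points, while the SMA gives every point in the window the same weight and is undefined (NaN) until a full window is available.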

Unrelated to the article itself, but writing in Jupyter Notebook and exporting to md format lets you simply paste the result into Qiita, which is very convenient.

Reference

Original article: "Creating Sample Data with Python", https://qiita.com/u1and0/items/0625f7a1cd9b476270bb
