numpy: Converting a single-dtype ndarray to a structured array

What this is

How to convert an ordinary ndarray defined with a single dtype into a structured array.

Motivation

I had several time series stacked together (a 2-D array) and wanted to access each one by key, like a dictionary.

I wanted to do it with numpy alone.

One article says this can be done with a record array (Stack Overflow: Converting a 2D numpy array to a structured array). However, there is also a discussion (Stack Overflow: NumPy “record array” or “structured array” or “recarray”) saying that recarray is still around for numpy backward compatibility and that structured arrays are the newer approach, so I decided to do it with a structured array.

Method 1: Via a list of tuples

The numpy documentation says to pass the data as a list of tuples:

Numpy: Structured arrays: Assignment from Python Native Types (Tuples).
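
To see why tuples matter here, the following is a minimal sketch with two hand-written rows (the variable names from_tuples and from_lists are only for illustration). With a structured dtype, numpy treats each tuple as one record, while nested lists are read as an additional array dimension.

```python
import numpy

dtype1 = [('d1', 'int32'), ('d2', 'int32'), ('d3', 'int32')]

# Each tuple becomes one record with three fields.
from_tuples = numpy.array([(0, 1000, 2000), (1, 1001, 2001)], dtype=dtype1)
print(from_tuples.shape)   # (2,)
print(from_tuples['d2'])   # [1000 1001]

# Nested lists are read as an extra array dimension instead, and every
# scalar is copied into all three fields of its own record.
from_lists = numpy.array([[0, 1000, 2000], [1, 1001, 2001]], dtype=dtype1)
print(from_lists.shape)    # (2, 3)
```

This is the reason a plain 2-D ndarray has to be turned into tuples row by row before it can be handed to numpy.array() with a structured dtype.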

Code example

For example:

import numpy

# Suppose we have three data series: d1, d2, d3
d1 = numpy.arange(0, 1000, dtype='int32')
d2 = numpy.arange(1000, 2000, dtype='int32')
d3 = numpy.arange(2000, 3000, dtype='int32')

# Stack them into one 2-D array (one column per series)
d = numpy.array([d1, d2, d3]).T

# d looks like this:
# array([[   0, 1000, 2000],
#        [   1, 1001, 2001],
#        [   2, 1002, 2002],
#        ...,
#        [ 997, 1997, 2997],
#        [ 998, 1998, 2998],
#        [ 999, 1999, 2999]], dtype=int32)

# Define the dtype
dtype1 = [
    ('d1', 'int32'),
    ('d2', 'int32'),
    ('d3', 'int32'),
]

# Convert to a structured array
sa1 = numpy.array(list(map(tuple, d)), dtype=dtype1)

# sa1 looks like this:
# array([(  0, 1000, 2000), (  1, 1001, 2001), (  2, 1002, 2002),
#        (  3, 1003, 2003), (  4, 1004, 2004), (  5, 1005, 2005),
#        (  6, 1006, 2006), (  7, 1007, 2007), (  8, 1008, 2008),
#        ...
#        (993, 1993, 2993), (994, 1994, 2994), (995, 1995, 2995),
#        (996, 1996, 2996), (997, 1997, 2997), (998, 1998, 2998),
#        (999, 1999, 2999)],
#        dtype=[('d1', '<i4'), ('d2', '<i4'), ('d3', '<i4')])

# Each series can now be accessed by key
sa1['d1']
# array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
#         13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
#         ...
#        975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987,
#        988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
#       dtype=int32)

Performance

Converting every row of the numpy.ndarray to a tuple makes this somewhat slow.

On my machine it took about 3 ms.

%time numpy.array(list(map(tuple, d)), dtype=dtype1)
# CPU times: user 2.63 ms, sys: 0 ns, total: 2.63 ms
# Wall time: 2.64 ms

Method 2: Via a buffer

I wanted something faster.

Structured arrays are very convenient with numpy.frombuffer(), which lets you read a binary file as-is.

Which way of handling binary data is faster - Kitsune Gadget

Stack Overflow: Reading in numpy array from buffer with different data types without copying array

So I suspected it would be fast to pull the ndarray's in-memory data out with tobytes() and reinterpret it with frombuffer().
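
The reinterpretation only makes sense if one row of the 2-D array occupies exactly the same bytes as one record. The sketch below checks that on a small 4 x 3 array (the names record_size, row_size, raw and first_row are illustrative, not from the original article). It assumes every field has the same type as the source array, which is the situation this article is about.

```python
import numpy

dtype1 = numpy.dtype([('d1', 'int32'), ('d2', 'int32'), ('d3', 'int32')])
d = numpy.arange(12, dtype='int32').reshape((3, 4)).T   # 4 rows, 3 columns

# One record is 3 x 4 bytes, the same size as one row of d.
record_size = dtype1.itemsize            # 12
row_size = d.shape[1] * d.itemsize       # 12

# tobytes() writes the rows in C order by default, even though d is a
# transposed view, so the first 12 bytes are exactly the first row.
raw = d.tobytes()
first_row = numpy.frombuffer(raw[:record_size], dtype='int32')
print(first_row)                         # [0 4 8]
```

If the field types differed from the array's dtype, or the record size did not match the row size, frombuffer() would either fail or silently misread the data, so this approach is best kept to the uniform-dtype case.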

Code example

For example:

import numpy

# Create the data
d1 = numpy.arange(0, 1000, dtype='int32')
d2 = numpy.arange(1000, 2000, dtype='int32')
d3 = numpy.arange(2000, 3000, dtype='int32')

d = numpy.array([d1, d2, d3]).T

# Define the dtype
dtype1 = [
    ('d1', 'int32'),
    ('d2', 'int32'),
    ('d3', 'int32'),
]

### Everything up to here is the same as Method 1 ###

# Convert to a structured array
sa2 = numpy.frombuffer(d.tobytes(), dtype=dtype1)

# sa1 (from Method 1) and sa2 hold exactly the same values
all(sa2 == sa1)
# >> True
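
One thing to be aware of: an array created by frombuffer() from a bytes object is read-only. The following is a small sketch of two ways to get a writable structured array; the names sa2_copy and sa3 are illustrative, and option B is an alternative I am adding here rather than something from the original article.

```python
import numpy

dtype1 = [('d1', 'int32'), ('d2', 'int32'), ('d3', 'int32')]
d = numpy.arange(30, dtype='int32').reshape((3, 10)).T

# frombuffer() on a bytes object gives a read-only array.
sa2 = numpy.frombuffer(d.tobytes(), dtype=dtype1)
print(sa2.flags.writeable)      # False

# Option A: copy the result to get a writable array.
sa2_copy = sa2.copy()
sa2_copy['d1'][0] = -1

# Option B: make d C-contiguous, then reinterpret each row as one record.
# view() returns shape (n, 1) here, so flatten it to (n,).
sa3 = numpy.ascontiguousarray(d).view(dtype1).reshape(-1)
print((sa3 == sa2).all())       # True
```

Option B skips the intermediate bytes object, but like Method 2 it relies on every field having the same type as the source array.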

Performance

On my machine this took about 80 µs.

That is about 30 times faster than going through tuples.

I'm very happy with that.

%time numpy.frombuffer(d.tobytes(), dtype=dtype1)
# CPU times: user 75 µs, sys: 0 ns, total: 75 µs
# Wall time: 83.9 µs
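
%time is an IPython/Jupyter magic and is not available in a plain Python script. As a rough equivalent, the sketch below uses the standard timeit module; the helper names via_tuples and via_bytes and the repeat counts are my own choices, and the numbers will differ from machine to machine.

```python
import timeit

import numpy

dtype1 = [('d1', 'int32'), ('d2', 'int32'), ('d3', 'int32')]
d = numpy.arange(3000, dtype='int32').reshape((3, 1000)).T


def via_tuples():
    # Method 1: go through a list of tuples
    return numpy.array(list(map(tuple, d)), dtype=dtype1)


def via_bytes():
    # Method 2: reinterpret the raw bytes
    return numpy.frombuffer(d.tobytes(), dtype=dtype1)


# Best of 5 repeats, each one averaged over 20 calls
t_tuples = min(timeit.repeat(via_tuples, number=20, repeat=5)) / 20
t_bytes = min(timeit.repeat(via_bytes, number=20, repeat=5)) / 20
print(f"tuples: {t_tuples * 1e6:.1f} us, bytes: {t_bytes * 1e6:.1f} us")
```

Taking the minimum over several repeats reduces the influence of one-off delays that a single %time call can pick up.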

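A closer comparison

I measured the run time of Method 1 and Method 2.

The number of data points n is varied.

For each n, the conversion is run 100 times and the elapsed time is averaged.

Code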

import numpy
import time

dtype1 = [
    ('d1', 'int32'),
    ('d2', 'int32'),
    ('d3', 'int32'),
]

def run(num, func):
    # Build a (num, 3) int32 array, then average the time of 100 calls of func
    d = numpy.arange(num*3, dtype='int32').reshape((3, num)).T
    t0 = time.perf_counter()
    for _ in range(100):
        func(d)
    t1 = time.perf_counter()
    return (t1 - t0) / 100

func1 = lambda x: numpy.array(list(map(tuple, x)), dtype=dtype1)
func2 = lambda x: numpy.frombuffer(x.tobytes(), dtype=dtype1)

# Measure
nums = numpy.logspace(2, 5, 10, dtype=int)
t1 = [run(i, func1) for i in nums]
t2 = [run(i, func2) for i in nums]

# Plot
import matplotlib.pyplot

fig = matplotlib.pyplot.figure()
ax = fig.add_subplot(111, aspect=1)
ax.plot(nums, t1, 'o-', label='tuple')
ax.plot(nums, t2, 'o-', label='bytes')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('# of data points')
ax.set_ylabel('Calculation time [s]')
ax.grid(True, color='#555555')
ax.grid(True, which='minor', linestyle=':', color='#aaaaaa')
ax.legend()
fig.savefig('results.png', dpi=200)
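
For completeness, numpy also ships a helper for this conversion, numpy.lib.recfunctions.unstructured_to_structured() (available since numpy 1.16, as far as I know). The original article does not use or measure it, so the sketch below only checks that it produces the same result as Method 2; the names sa_bytes and sa_rfn are illustrative.

```python
import numpy
from numpy.lib import recfunctions

dtype1 = numpy.dtype([('d1', 'int32'), ('d2', 'int32'), ('d3', 'int32')])
d = numpy.arange(3000, dtype='int32').reshape((3, 1000)).T

# Method 2 from this article
sa_bytes = numpy.frombuffer(d.tobytes(), dtype=dtype1)

# Library helper: the last axis of d is mapped onto the fields of dtype1
sa_rfn = recfunctions.unstructured_to_structured(d, dtype=dtype1)

print(sa_rfn.shape)                    # (1000,)
print((sa_rfn == sa_bytes).all())      # True
```

It could be added to the benchmark above as a third function to see how it compares on a given machine.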

Reference

Original article: https://qiita.com/nishimuraatsushi/items/5591a44db7f34aebbf8e
