[Python notes] xarray

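What I want to do

Create 50 two-dimensional arrays, each 10000 x 3.

I want to store these 50 arrays in a three-dimensional variable called data_set (that is, build a 10000 x 3 x 50 dataset).

Labels aren't really needed this time, so plain numpy would do instead of xarray, but I plan to use xarray later, so I'm using it here as practice.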
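
For comparison, here is a minimal sketch of the numpy-only route mentioned above. The seeded generator `rng` and the name `data_np` are my own assumptions for illustration; the mean and covariance use the same values as the code later in this memo.

```python
import numpy as np

# Seeded generator: an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)

mean = [1.0e-4] * 3
cov = np.diag([1.0e-4, 1.0e-6, 1.0e-8])  # variances on the diagonal

# Draw 50 independent 10000 x 3 samples and stack them along a new last axis.
samples = [rng.multivariate_normal(mean, cov, 10000) for _ in range(50)]
data_np = np.stack(samples, axis=-1)

print(data_np.shape)  # (10000, 3, 50)
```

This gives the target shape directly, but the axes carry no names or labels, which is what xarray adds.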

Reference sites

qiita.com

steakrecords.com

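Code

First, in two dimensions

Let's try it in two dimensions first.

Create a variable called data that follows a multivariate normal distribution, as follows.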

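    import numpy as np
    import xarray as xr

    # Mean return and variances (squared volatilities) of the three series
    config_return = 1.0 * 10 ** -4
    config_vol1 = (1.0 * 10 ** -2) ** 2
    config_vol2 = (1.0 * 10 ** -3) ** 2
    config_vol3 = (1.0 * 10 ** -4) ** 2

    return_list = [config_return] * 3
    # Diagonal covariance matrix: the three series are uncorrelated
    cov = np.diag([config_vol1, config_vol2, config_vol3])

    # 10000 draws, so data has shape (10000, 3)
    data = np.random.multivariate_normal(return_list, cov, 10000)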
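
As a quick sanity check, the sketch below draws the same kind of sample and compares the per-column standard deviation with the intended volatilities. The fixed seed and the `vols` array are assumptions added for illustration.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the check is repeatable

vols = np.array([1.0e-2, 1.0e-3, 1.0e-4])  # intended standard deviations
mean = [1.0e-4] * 3
cov = np.diag(vols ** 2)  # the diagonal holds variances, not volatilities

data = np.random.multivariate_normal(mean, cov, 10000)

print(data.shape)        # (10000, 3)
print(data.mean(axis=0)) # sample means, one per column
print(data.std(axis=0))  # should be close to vols
```

The point to notice is that the covariance matrix takes variances, which is why the volatilities are squared before being passed to np.diag.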

Turn this data into an xarray DataArray (called data_set in the code below).

    data_label = ['vol-2', 'vol-3', 'vol-4']
    data_set = xr.DataArray(data, dims = ['return', 'data_label'],
                            coords = [('return', np.arange(len(data))), ('data_label', data_label)])
    print(data_set)

Result

<xarray.DataArray (return: 10000, data_label: 3)>
array([[ 7.54501238e-04, -1.77647931e-03,  1.02184378e-04],
       [-2.74377192e-03,  1.00417602e-03,  2.42646300e-04],
       [ 4.48305238e-03,  1.47129582e-03,  2.41999071e-04],
       ...,
       [ 3.91726943e-03,  1.44025946e-03,  1.13000387e-05],
       [-7.56105465e-03, -1.09542269e-03,  3.29951295e-04],
       [ 6.14599258e-03, -2.27408022e-04,  2.01113748e-04]])
Coordinates:
  * return      (return) int32 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999
  * data_label  (data_label) <U5 'vol-2' 'vol-3' 'vol-4'
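
To show what the labels buy you, here is a small sketch of selecting by label and by position. The 4 x 3 stand-in array and the name `small` are hypothetical. Note that because `return` is a Python keyword, that dimension cannot be passed as a keyword argument and needs the dict form.

```python
import numpy as np
import xarray as xr

# Small stand-in for the 10000 x 3 data (hypothetical values).
data = np.arange(12, dtype=float).reshape(4, 3)
data_label = ['vol-2', 'vol-3', 'vol-4']

small = xr.DataArray(data,
                     coords=[('return', np.arange(len(data))),
                             ('data_label', data_label)])

# Select one column by its label.
col = small.sel(data_label='vol-3')
print(col.values)  # [ 1.  4.  7. 10.]

# Select the first row by position. isel(return=0) would be a syntax error,
# so the indexer is passed as a dict instead.
row = small.isel({'return': 0})
print(row.values)  # [0. 1. 2.]
```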

Appending xarray arrays

Next, stack the two-dimensional arrays one after another along a third dimension.

They are stored in a three-dimensional data_set.

    data_set = xr.DataArray(np.empty((data.shape[0], data.shape[1], 1)),
                            dims = ['return', 'data_label', 'historical'],
                            coords = {'return': np.arange(data.shape[0]),
                                      'data_label': data_label,
                                      'historical': ['dummy']})

    data = np.random.multivariate_normal(return_list, cov, 10000)
    to_add = xr.DataArray(data[:,:,np.newaxis],
                          dims = ['return', 'data_label', 'historical'],
                          coords = {'historical': ['to_add']})
    data_set = xr.concat([data_set, to_add], dim = 'historical')
    print(data_set)

Result

<xarray.DataArray (return: 10000, data_label: 3, historical: 2)>
array([[[ 5.24898536e-269,  5.78928577e-003],
        [ 1.00000000e-004, -3.61129596e-004],
        [ 1.00000000e-004,  5.00824364e-005]],

       [[ 1.00000000e-004,  1.56549216e-002],
        [ 1.00000000e-004,  5.86555470e-004],
        [ 1.00000000e-004,  1.19149975e-004]],

       [[ 1.00000000e-004,  1.79009044e-002],
        [ 1.00000000e-004,  8.65462296e-004],
        [ 1.00000000e-004,  2.43073373e-004]],

       ...,

       [[-2.64264822e-001, -2.54264822e-003],
        [ 3.95930711e-001,  4.95930711e-004],
        [-4.72576401e-001,  5.27423599e-005]],

       [[-1.11405399e+000, -1.10405399e-002],
        [-1.07707356e+000, -9.77073563e-004],
        [-6.59465011e-001,  3.40534989e-005]],

       [[ 2.16231822e-001,  2.26231822e-003],
        [-3.18239202e-002,  6.81760798e-005],
        [ 1.74743095e-001,  1.17474310e-004]]])
Coordinates:
  * return      (return) int64 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999
  * data_label  (data_label) object 'vol-2' 'vol-3' 'vol-4'
  * historical  (historical) object 'dummy' 'to_add'

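By the way, I put dummy into the historical coordinate (label?) because leaving it out raised an error. The dummy slice is created with np.empty, so its values are uninitialized memory, which is why that column in the output above is meaningless.

Code that raised the error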

    data_set = xr.DataArray(np.empty((data.shape[0], data.shape[1], 1)),
                            dims = ['return', 'data_label', 'historical'],
                            coords = {'return': np.arange(data.shape[0]),
                                      'data_label': data_label})

    data = np.random.multivariate_normal(return_list, cov, 10000)
    to_add = xr.DataArray(data[:,:,np.newaxis],
                          dims = ['return', 'data_label', 'historical'],
                          coords = {'historical': ['to_add']})
    data_set = xr.concat([data_set, to_add], dim = 'historical')
    print(data_set)

Result

Traceback (most recent call last):
  File "SFE_verification.py", line 145, in <module>
    data_set = xr.concat([data_set, to_add], dim = 'historical')
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 447, in _dataarray_concat
    ds = _dataset_concat(
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 403, in _dataset_concat
    raise ValueError(
ValueError: Variables {'historical'} are coordinates in some datasets but not others.
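
The message says that historical is a coordinate on one input but not on the other. One way around this without a dummy entry is to give every array the same set of coordinates before concatenating. The sketch below is only an illustration under that assumption: `make_slice` is a hypothetical helper, and the 5 x 3 arrays stand in for the real data.

```python
import numpy as np
import xarray as xr

dims = ['return', 'data_label', 'historical']
data_label = ['vol-2', 'vol-3', 'vol-4']


def make_slice(values, name):
    """Hypothetical helper: wrap a 2D array as a 3D DataArray
    whose 'historical' dimension has one labelled entry."""
    return xr.DataArray(values[:, :, np.newaxis],
                        dims=dims,
                        coords={'return': np.arange(values.shape[0]),
                                'data_label': data_label,
                                'historical': [name]})


first = make_slice(np.zeros((5, 3)), 'first')
second = make_slice(np.ones((5, 3)), 'second')

# Both inputs carry the 'historical' coordinate, so they line up.
both = xr.concat([first, second], dim='historical')
print(both.shape)                       # (5, 3, 2)
print(list(both['historical'].values))  # ['first', 'second']
```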

Full code

    import numpy as np
    import xarray as xr

    if __name__ == '__main__':

        # Parameters
        count = 10000
        data_label = ['vol-2', 'vol-3', 'vol-4']
        config_return = 1.0 * 10 ** -4
        config_vol1 = (1.0 * 10 ** -2) ** 2
        config_vol2 = (1.0 * 10 ** -3) ** 2
        config_vol3 = (1.0 * 10 ** -4) ** 2

        return_list = [config_return] * 3
        cov = np.diag([config_vol1, config_vol2, config_vol3])

        # Placeholder slice so that 'historical' exists as a coordinate from the start
        data_set = xr.DataArray(np.empty((count, len(data_label), 1)),
                                dims = ['return', 'data_label', 'historical'],
                                coords = {'return': np.arange(count),
                                          'data_label': data_label,
                                          'historical': ['dummy']})

        # Append 50 samples, one per iteration
        for i in range(50):
            data = np.random.multivariate_normal(return_list, cov, count)
            to_add = xr.DataArray(data[:,:,np.newaxis],
                                  dims = ['return', 'data_label', 'historical'],
                                  coords = {'historical': [i]})
            data_set = xr.concat([data_set, to_add], dim = 'historical')

        # Drop the placeholder slice so that 10000 x 3 x 50 remains
        data_set = data_set.isel(historical = slice(1, None))
        print(data_set.shape)
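
As an alternative sketch (not the approach used above), the 50 arrays can be collected in a list and concatenated once, which avoids both the dummy slice and the repeated copying inside the loop. The names `slices` and `n_hist` are assumptions for illustration. The explicit transpose is there so the result has the 10000 x 3 x 50 layout regardless of where concat places the new dimension.

```python
import numpy as np
import xarray as xr

count = 10000
n_hist = 50  # number of 2D samples to stack (hypothetical name)
data_label = ['vol-2', 'vol-3', 'vol-4']
return_list = [1.0e-4] * 3
cov = np.diag([1.0e-4, 1.0e-6, 1.0e-8])

# Build each 2D sample as its own labelled DataArray.
slices = []
for i in range(n_hist):
    data = np.random.multivariate_normal(return_list, cov, count)
    slices.append(xr.DataArray(data,
                               dims=['return', 'data_label'],
                               coords={'return': np.arange(count),
                                       'data_label': data_label}))

# Concatenate once along a new dimension, label it 0..49,
# then put the dimensions in the intended order.
data_set = xr.concat(slices, dim='historical')
data_set = data_set.assign_coords(historical=np.arange(n_hist))
data_set = data_set.transpose('return', 'data_label', 'historical')

print(data_set.shape)  # (10000, 3, 50)
```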