[Python Notes] xarray
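What I want to do
Create 50 sets of two-dimensional data, each 10000 x 3, and store them in a single three-dimensional variable named `data_set` (i.e., build a 10000 x 3 x 50 dataset).
I don't really need labels this time, so plain numpy would probably be enough instead of xarray. I plan to use xarray later, though, so I'm using it here as practice.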
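For comparison, a numpy-only version might look like the sketch below. It assumes the same means and diagonal covariance as the code further down. It uses a seeded `np.random.default_rng`, which is not in the original, so the run is reproducible. The 50 draws are stacked along a new last axis.

```python
import numpy as np

rng = np.random.default_rng(0)                           # seeded generator (assumption)
mean = [1.0e-4] * 3                                      # same means as config_return below
cov = np.diag([1.0e-2 ** 2, 1.0e-3 ** 2, 1.0e-4 ** 2])   # same diagonal covariance

# 50 independent 10000 x 3 draws, stacked along a new third axis
data_set = np.stack([rng.multivariate_normal(mean, cov, 10000) for _ in range(50)],
                    axis=2)
print(data_set.shape)  # (10000, 3, 50)
```

Without labels, each of the 50 sets can only be addressed by position, e.g. `data_set[:, :, 7]`. Named dimensions and coordinates are what xarray adds on top of this.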
Reference sites
Code
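Starting with 2D
First, let's try it in two dimensions.
As shown below, create a variable `data` that follows a normal distribution (a multivariate normal with a diagonal covariance matrix).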
```python
import numpy as np
import xarray as xr

config_return = 1.0 * 10 ** -4
config_vol1 = (1.0 * 10 ** -2) ** 2
config_vol2 = (1.0 * 10 ** -3) ** 2
config_vol3 = (1.0 * 10 ** -4) ** 2
return_list = [config_return] * 3
cov = np.diag([config_vol1, config_vol2, config_vol3])
data = np.random.multivariate_normal(return_list, cov, 10000)
```
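Because `cov` is diagonal, the three columns are independent normals. Their standard deviations are 1e-2, 1e-3 and 1e-4, the square roots of the diagonal entries. The quick check below assumes a seeded `default_rng` in place of the legacy `np.random` call.

```python
import numpy as np

rng = np.random.default_rng(1)                  # seeded generator (assumption)
return_list = [1.0e-4] * 3
vols = np.array([1.0e-2, 1.0e-3, 1.0e-4])       # standard deviations
cov = np.diag(vols ** 2)                        # variances on the diagonal
data = rng.multivariate_normal(return_list, cov, 10000)

print(data.shape)                                # (10000, 3)
print(data.std(axis=0))                          # close to vols
print(np.corrcoef(data, rowvar=False).round(2))  # off-diagonal entries near 0
```

For a diagonal covariance, `rng.normal(return_list, vols, size=(10000, 3))` draws from the same distribution.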
Turn this `data` into an xarray variable called `data_set`.
```python
data_label = ['vol-2', 'vol-3', 'vol-4']
data_set = xr.DataArray(data,
                        dims=['return', 'data_label'],
                        coords=[('return', np.arange(len(data))),
                                ('data_label', data_label)])
print(data_set)
```
Result
```
<xarray.DataArray (return: 10000, data_label: 3)>
array([[ 7.54501238e-04, -1.77647931e-03,  1.02184378e-04],
       [-2.74377192e-03,  1.00417602e-03,  2.42646300e-04],
       [ 4.48305238e-03,  1.47129582e-03,  2.41999071e-04],
       ...,
       [ 3.91726943e-03,  1.44025946e-03,  1.13000387e-05],
       [-7.56105465e-03, -1.09542269e-03,  3.29951295e-04],
       [ 6.14599258e-03, -2.27408022e-04,  2.01113748e-04]])
Coordinates:
  * return      (return) int32 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999
  * data_label  (data_label) <U5 'vol-2' 'vol-3' 'vol-4'
```
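With named dimensions and coordinates, you can select data by label rather than by position. The sketch below rebuilds the same 2D `data_set`, using a seeded `default_rng` (an assumption), and shows a few common operations. One caveat: `return` is a Python keyword, so keyword-style indexing such as `isel(return=...)` is a syntax error. The indexers have to be passed as a dict instead.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(2)  # seeded generator (assumption)
data_label = ['vol-2', 'vol-3', 'vol-4']
data = rng.multivariate_normal([1.0e-4] * 3,
                               np.diag([1.0e-2 ** 2, 1.0e-3 ** 2, 1.0e-4 ** 2]),
                               10000)
data_set = xr.DataArray(data, dims=['return', 'data_label'],
                        coords=[('return', np.arange(len(data))),
                                ('data_label', data_label)])

# Select a column by its label instead of its integer position
vol3 = data_set.sel(data_label='vol-3')
# 'return' is a Python keyword, so isel(return=...) would not parse;
# pass the indexers as a dict instead
head = data_set.isel({'return': slice(0, 5)})
# Reduce over a dimension by name; data_label survives as a coordinate
col_std = data_set.std(dim='return')

print(vol3.shape, head.shape)  # (10000,) (5, 3)
print(col_std.values)          # roughly [1e-2, 1e-3, 1e-4]
```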
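Adding to the xarray
Next, stack the 2D data created above, one set after another, along a third dimension, storing everything in the three-dimensional variable `data_set`.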
```python
data_set = xr.DataArray(np.empty((data.shape[0], data.shape[1], 1)),
                        dims=['return', 'data_label', 'historical'],
                        coords={'return': np.arange(data.shape[0]),
                                'data_label': data_label,
                                'historical': ['dummy']})
data = np.random.multivariate_normal(return_list, cov, 10000)
to_add = xr.DataArray(data[:, :, np.newaxis],
                      dims=['return', 'data_label', 'historical'],
                      coords={'historical': ['to_add']})
data_set = xr.concat([data_set, to_add], dim='historical')
print(data_set)
```
Result
```
<xarray.DataArray (return: 10000, data_label: 3, historical: 2)>
array([[[ 5.24898536e-269,  5.78928577e-003],
        [ 1.00000000e-004, -3.61129596e-004],
        [ 1.00000000e-004,  5.00824364e-005]],

       [[ 1.00000000e-004,  1.56549216e-002],
        [ 1.00000000e-004,  5.86555470e-004],
        [ 1.00000000e-004,  1.19149975e-004]],

       [[ 1.00000000e-004,  1.79009044e-002],
        [ 1.00000000e-004,  8.65462296e-004],
        [ 1.00000000e-004,  2.43073373e-004]],

       ...,

       [[-2.64264822e-001, -2.54264822e-003],
        [ 3.95930711e-001,  4.95930711e-004],
        [-4.72576401e-001,  5.27423599e-005]],

       [[-1.11405399e+000, -1.10405399e-002],
        [-1.07707356e+000, -9.77073563e-004],
        [-6.59465011e-001,  3.40534989e-005]],

       [[ 2.16231822e-001,  2.26231822e-003],
        [-3.18239202e-002,  6.81760798e-005],
        [ 1.74743095e-001,  1.17474310e-004]]])
Coordinates:
  * return      (return) int64 0 1 2 3 4 5 6 ... 9994 9995 9996 9997 9998 9999
  * data_label  (data_label) object 'vol-2' 'vol-3' 'vol-4'
  * historical  (historical) object 'dummy' 'to_add'
```
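Incidentally, I gave the `historical` coordinate the value `dummy` because without it I got the error shown below. `to_add` has a `historical` coordinate, so `xr.concat` fails unless `data_set` has one too. Note that `np.empty` does not initialize memory, so the `dummy` slice holds arbitrary values (e.g. 5.24898536e-269 in the output above). That slice also remains part of `data_set`.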
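Because the `dummy` slice stays in `data_set`, label-based indexing on `historical` is a convenient way to pick out or discard slices. The sketch below uses a small stand-in array with the same dimension names and `historical` labels; the sizes are reduced and the values are zeros, purely for illustration. `drop_sel` is available in reasonably recent xarray versions.

```python
import numpy as np
import xarray as xr

# Small stand-in for the (10000, 3, 2) result above: same dims and
# 'historical' labels, but only 4 rows of zeros (illustrative only)
data_set = xr.DataArray(np.zeros((4, 3, 2)),
                        dims=['return', 'data_label', 'historical'],
                        coords={'return': np.arange(4),
                                'data_label': ['vol-2', 'vol-3', 'vol-4'],
                                'historical': ['dummy', 'to_add']})

# Pick one slice by label; the 'historical' dimension is dropped
only_added = data_set.sel(historical='to_add')
# Or discard the dummy slice by label, keeping the dimension (now length 1)
without_dummy = data_set.drop_sel(historical='dummy')

print(only_added.dims)                    # ('return', 'data_label')
print(without_dummy.sizes['historical'])  # 1
```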
Code that caused the error
```python
data_set = xr.DataArray(np.empty((data.shape[0], data.shape[1], 1)),
                        dims=['return', 'data_label', 'historical'],
                        coords={'return': np.arange(data.shape[0]),
                                'data_label': data_label})
data = np.random.multivariate_normal(return_list, cov, 10000)
to_add = xr.DataArray(data[:, :, np.newaxis],
                      dims=['return', 'data_label', 'historical'],
                      coords={'historical': ['to_add']})
data_set = xr.concat([data_set, to_add], dim='historical')
print(data_set)
```
Result
```
Traceback (most recent call last):
  File "SFE_verification.py", line 145, in <module>
    data_set = xr.concat([data_set, to_add], dim = 'historical')
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 447, in _dataarray_concat
    ds = _dataset_concat(
  File "C:\Users\hiroa\AppData\Roaming\Python\Python38\site-packages\xarray\core\concat.py", line 403, in _dataset_concat
    raise ValueError(
ValueError: Variables {'historical'} are coordinates in some datasets but not others.
```
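One way to avoid both this error and the `np.empty` dummy slice (whose contents are uninitialized) is to make sure every piece already carries a `historical` coordinate, including the first one. The sketch below does this with `DataArray.expand_dims`. The helper `make_slice`, the label `'first'` and the seeded generator are illustrative assumptions, not part of the original code.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(3)  # seeded generator (assumption)
count = 10000
data_label = ['vol-2', 'vol-3', 'vol-4']
return_list = [1.0e-4] * 3
cov = np.diag([1.0e-2 ** 2, 1.0e-3 ** 2, 1.0e-4 ** 2])


def make_slice(name):
    """Hypothetical helper: one count x 3 draw tagged with a 'historical' label."""
    data = rng.multivariate_normal(return_list, cov, count)
    da = xr.DataArray(data, dims=['return', 'data_label'],
                      coords={'return': np.arange(count), 'data_label': data_label})
    # A list value creates the new dimension *and* its coordinate;
    # expand_dims puts the new dimension first, so move it to the end
    return da.expand_dims({'historical': [name]}).transpose('return', 'data_label', 'historical')


data_set = make_slice('first')  # start from real data, no np.empty dummy
to_add = make_slice('to_add')
data_set = xr.concat([data_set, to_add], dim='historical')
print(data_set.sizes)
print(data_set['historical'].values)
```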
Full code
```python
import numpy as np
import xarray as xr

if __name__ == '__main__':
    # Parameter settings
    count = 10000
    data_label = ['vol-2', 'vol-3', 'vol-4']
    config_return = 1.0 * 10 ** -4
    config_vol1 = (1.0 * 10 ** -2) ** 2
    config_vol2 = (1.0 * 10 ** -3) ** 2
    config_vol3 = (1.0 * 10 ** -4) ** 2
    return_list = [config_return] * 3
    cov = np.diag([config_vol1, config_vol2, config_vol3])

    # Start with a 'dummy' slice so that 'historical' already exists as a coordinate
    data_set = xr.DataArray(np.empty((count, len(data_label), 1)),
                            dims=['return', 'data_label', 'historical'],
                            coords={'return': np.arange(count),
                                    'data_label': data_label,
                                    'historical': ['dummy']})
    for i in range(50):
        data = np.random.multivariate_normal(return_list, cov, count)
        to_add = xr.DataArray(data[:, :, np.newaxis],
                              dims=['return', 'data_label', 'historical'],
                              coords={'historical': [i]})
        data_set = xr.concat([data_set, to_add], dim='historical')

    # Drop the uninitialized dummy slice so the shape is (10000, 3, 50)
    data_set = data_set.isel(historical=slice(1, None))
```
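Calling `xr.concat` inside the loop copies the whole growing array on every iteration. An alternative is to collect the 50 two-dimensional arrays in a list and call `xr.concat` once. This is a different technique from the memo's loop, not the author's code. Passing a `pandas.Index` as `dim` supplies both the new dimension name and its coordinate values, so no dummy slice is needed either. The seeded generator is an assumption for reproducibility.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(42)  # seeded generator (assumption)
count, n_sets = 10000, 50
data_label = ['vol-2', 'vol-3', 'vol-4']
return_list = [1.0e-4] * 3
cov = np.diag([1.0e-2 ** 2, 1.0e-3 ** 2, 1.0e-4 ** 2])

# Build all 2D pieces first; none of them has a 'historical' dimension yet
pieces = [
    xr.DataArray(rng.multivariate_normal(return_list, cov, count),
                 dims=['return', 'data_label'],
                 coords={'return': np.arange(count), 'data_label': data_label})
    for _ in range(n_sets)
]

# One concat along a new dimension; the Index gives its name and values 0..49
data_set = xr.concat(pieces, dim=pd.Index(range(n_sets), name='historical'))
# Reorder to return x data_label x historical, as in the loop version
data_set = data_set.transpose('return', 'data_label', 'historical')
print(data_set.shape)  # (10000, 3, 50)
```

Each piece is copied only once here, instead of once per remaining iteration.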