forked from modons/LMR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLMR_psms.py
1793 lines (1475 loc) · 69.5 KB
/
LMR_psms.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
"""
Module: LMR_psms.py
Purpose: Module containing methods for various Proxy System Models (PSMs)
Adapted from LMR_proxy and LMR_calibrate using OOP by Andre Perkins
Originator: Andre Perkins, U. of Washington
Revisions:
- Use of new more efficient "get_distance" function to calculate the
distance between proxy sites and analysis grid points.
[R. Tardif, U. of Washington, February 2016]
- Added the "LinearPSM_TorP" class, allowing the use of
temperature-calibrated *OR* precipitation-calibrated linear PSMs.
For each proxy record to be assimilated, the selection of the PSM
(i.e. T vs P) is perfomed on the basis of the smallest MSE of
regression residuals.
[R. Tardif, U. of Washington, April 2016]
- Added the "h_interpPSM" psm class for use of isotope-enabled GCM
data as prior: Ye values are taken as the prior isotope field either
at the nearest grid pt. or as the weighted-average of values at grid
points surrounding the assimilated isotope proxy site.
[ R. Tardif, U. of Washington, June 2016 ]
- Added the "BilinearPSM" class for PSMs based on bivariate linear
regressions w/ temperature AND precipitation/PSDI as independent
variables.
[ R. Tardif, Univ. of Washington, June 2016 ]
- Added the capability of calibrating/using PSMs calibrated on the basis
of a proxy record seasonality metadata.
[ R. Tardif, Univ. of Washington, July 2016 ]
- Added the capability of objectively calibrating/using PSMs calibrated
on the basis objectively-derived seasonality.
[ R. Tardif, Univ. of Washington, December 2016 ]
- Added the "BayesRegUK37PSM" class, the forward model used in
the assimilation of alkenone uk37 proxy data. Code based on
spline coefficients provided by J. Tierney (U of Arizona).
[ R. Tardif, Univ. of Washington, January 2017 ]
- Calibration of statistical PSMs now all referenced to anomalies w.r.t.
20th century.
[ R. Tardif, Univ. of Washington, August 2017 ]
"""
import numpy as np
import logging
import os.path
import LMR_calibrate
from LMR_utils import (haversine, get_distance, smooth2D,
get_data_closest_gridpt, class_docs_fixer)
import pandas as pd
from scipy.stats import linregress
import statsmodels.formula.api as sm
from abc import ABCMeta, abstractmethod
from load_data import load_cpickle
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Needed in BayesRegUK37PSM class
#import matlab.engine # for old matlab implementation
from scipy.io import loadmat
import scipy.interpolate as interpolate
# Logging output utility, configuration controlled by driver
logger = logging.getLogger(__name__)
class BasePSM(metaclass=ABCMeta):
"""
Proxy system model.
Parameters
----------
config: LMR_config
Configuration module used for current LMR run.
proxy_obj: BaseProxyObject like
Proxy object that this PSM is being attached to
psm_kwargs: dict (unpacked)
Specfic arguments for the target PSM
"""
def __init__(self, config, proxy_obj, **psm_kwargs):
pass
@abstractmethod
def psm(self, prior_obj):
"""
Maps a given state to observations for the given proxy
Parameters
----------
prior_obj BasePriorObject
Prior to be mapped to observation space (Ye).
Returns
-------
Ye:
Equivalent observation from prior
"""
pass
@abstractmethod
def error(self):
"""
Error model for given PSM.
"""
pass
@staticmethod
@abstractmethod
def get_kwargs(config):
"""
Returns keyword arguments required for instantiation of given PSM.
Parameters
----------
config: LMR_config
Configuration module used for current LMR run.
Returns
-------
kwargs: dict
Keyword arguments for given PSM
"""
pass
@class_docs_fixer
class LinearPSM(BasePSM):
"""
PSM based on linear regression.
Attributes
----------
lat: float
Latitude of associated proxy site
lon: float
Longitude of associated proxy site
elev: float
Elevation/depth of proxy sitex
corr: float
Correlation of proxy record against calibration data
slope: float
Linear regression slope of proxy/calibration fit
intercept: float
Linear regression y-intercept of proxy/calibration fit
R: float
Mean-squared error of proxy/calibration fit
Parameters
----------
config: LMR_config
Configuration module used for current LMR run.
proxy_obj: BaseProxyObject like
Proxy object that this PSM is being attached to
psm_data: dict
Pre-calibrated PSM dictionary containing current associated
proxy site's calibration information
Raises
------
ValueError
If PSM is below critical correlation threshold.
"""
def __init__(self, config, proxy_obj, psm_data=None, calib_obj=None):
self.psm_key = 'linear'
proxy = proxy_obj.type
site = proxy_obj.id
r_crit = config.psm.linear.psm_r_crit
self.lat = proxy_obj.lat
self.lon = proxy_obj.lon
self.elev = proxy_obj.elev
self.datatag_calib = config.psm.linear.datatag_calib
self.avgPeriod = config.psm.linear.avgPeriod
# Very crude assignment of sensitivity
# TODO: more inclusive & flexible way of doing this
if self.datatag_calib in ['GPCC', 'DaiPDSI']:
self.sensitivity = 'moisture'
else:
self.sensitivity = 'temperature'
try:
# Try using pre-calibrated psm_data
if psm_data is None:
psm_data = self._load_psm_data(config)
psm_site_data = psm_data[(proxy, site)]
self.corr = psm_site_data['PSMcorrel']
self.slope = psm_site_data['PSMslope']
self.intercept = psm_site_data['PSMintercept']
self.R = psm_site_data['PSMmse']
# check if seasonality defined in the psm data
# if it is, return as an attribute
if 'Seasonality' in list(psm_site_data.keys()):
self.seasonality = psm_site_data['Seasonality']
except KeyError as e:
raise ValueError('Proxy in database but not found in pre-calibration file... '
'Skipping: {}'.format(proxy_obj.id))
except IOError as e:
# No precalibration found, have to do it for this proxy
print('No pre-calibration file found for'
' {} ({}) ... calibrating ...'.format(proxy_obj.id,
proxy_obj.type))
# check if calibration object already exists
if calib_obj is None:
#print 'calibration object does not exist ...creating it...'
# TODO: Fix call and Calib Module
datag_calib = config.psm.linear.datatag_calib
calib_obj= LMR_calibrate.calibration_assignment(datag_calib)
calib_obj.datadir_calib = config.psm.linear.datadir_calib
calib_obj.read_calibration()
self.calibrate(calib_obj, proxy_obj)
# Raise exception if critical correlation value not met
if abs(self.corr) < r_crit:
raise ValueError(('Proxy model correlation ({:.2f}) does not meet '
'critical threshold ({:.2f}).'
).format(self.corr, r_crit))
# TODO: Ideally prior state info and coordinates should all be in single obj
def psm(self, Xb, X_state_info, X_coords):
"""
Maps a given state to observations for the given proxy
Parameters
----------
Xb: ndarray
State vector to be mapped into observation space (stateDim x ensDim)
X_state_info: dict
Information pertaining to variables in the state vector
X_coords: ndarray
Coordinates for the state vector (stateDim x 2)
Returns
-------
Ye:
Equivalent observation from prior
"""
# ----------------------
# Calculate the Ye's ...
# ----------------------
# Associate state variable with PSM calibration dataset
# TODO: possible calibration sources hard coded for now... should define associations at config level
if self.datatag_calib in ['GISTEMP', 'MLOST', 'NOAAGlobalTemp', 'HadCRUT', 'BerkeleyEarth']:
# temperature
state_var = 'tas_sfc_Amon'
elif self.datatag_calib in ['GPCC','DaiPDSI']:
# moisture
if self.datatag_calib == 'GPCC':
state_var = 'pr_sfc_Amon'
elif self.datatag_calib == 'DaiPDSI':
state_var = 'scpdsi_sfc_Amon'
else:
raise KeyError('Unrecognized moisture calibration source.'
' State variable not identified for Ye calculation.')
else:
raise KeyError('Unrecognized calibration-state variable association.'
' State variable not identified for Ye calculation.')
if state_var not in X_state_info.keys():
raise KeyError('Needed variable not in state vector for Ye'
' calculation.')
# TODO: end index should already be +1, more pythonic
tas_startidx, tas_endidx = X_state_info[state_var]['pos']
ind_lon = X_state_info[state_var]['spacecoords'].index('lon')
ind_lat = X_state_info[state_var]['spacecoords'].index('lat')
X_lons = X_coords[tas_startidx:(tas_endidx+1), ind_lon]
X_lats = X_coords[tas_startidx:(tas_endidx+1), ind_lat]
tas_data = Xb[tas_startidx:(tas_endidx+1), :]
gridpoint_data = self.get_close_grid_point_data(tas_data.T,
X_lons,
X_lats)
Ye = self.basic_psm(gridpoint_data)
return Ye
def basic_psm(self, data):
"""
A PSM that doesn't need to do the state unpacking steps...
Parameters
----------
data: ndarray
Data to be used in the psm calculation of estimated observations
(Ye)
Returns
-------
Ye: ndarray
Estimated observations from the proxy system model
"""
return self.slope * data + self.intercept
def get_close_grid_point_data(self, data, lon, lat):
"""
Extracts data along the sampling dimension that is closest to the
grid point (lat/lon) of the current PSM object.
Parameters
----------
data: ndarray
Gridded data matching dimensions of (sample, lat, lon) or
(sample, lat*lon).
lon: ndarray
Longitudes pertaining to the input data. Can have shape as a single
vector (lat), a grid (lat, lon), or a flattened grid (lat*lon).
lat: ndarray
Latitudes pertaining to the input data. Can have shape as a single
vector (lon), a grid (lat, lon), or a flattened grid (lat*lon).
Returns
-------
tmp_dat: ndarray
Grid point data closes to the lat/lon of the current PSM object
"""
lonshp = lon.shape
latshp = lat.shape
# If not equal we got a single vector?
if lonshp != latshp:
lon, lat = np.meshgrid(lon, lat)
if len(lon.shape) > 1:
lon = lon.ravel()
lat = lat.ravel()
# Calculate distance
dist = haversine(self.lon, self.lat, lon, lat)
if len(dist) in data.shape:
loc_idx = dist.argmin()
tmp_dat = data[..., loc_idx]
else:
# TODO: This is not general lat/lon being swapped, OK for now...
min_dist_lat_idx, \
min_dist_lon_idx = np.unravel_index(dist.argmin(), data.shape[-2:])
tmp_dat = data[..., min_dist_lat_idx, min_dist_lon_idx]
return tmp_dat
# Define the error model for this proxy
@staticmethod
def error():
return 0.1
# TODO: Simplify a lot of the actions in the calibration
def calibrate(self, C, proxy, diag_output=False, diag_output_figs=False):
"""
Calibrate given proxy record against observation data and set relevant
PSM attributes.
Parameters
----------
C: calibration_master like
Calibration object containing data, time, lat, lon info
proxy: BaseProxyObject like
Proxy object to fit to the calibration data
diag_output, diag_output_figs: bool, optional
Diagnostic output flags for calibration method
"""
calib_spatial_avg = False
Npts = 9 # nb of neighboring pts used in smoothing
#print 'Calibrating: ', '{:25}'.format(proxy.id), '{:35}'.format(proxy.type)
# --------------------------------------------
# Use linear model (regression) as default PSM
# --------------------------------------------
nbmaxnan = 0
# Look for indices of calibration grid point closest in space (in 2D)
# to proxy site
dist = get_distance(proxy.lon, proxy.lat, C.lon, C.lat)
# indices of nearest grid pt.
jind, kind = np.unravel_index(dist.argmin(), dist.shape)
if calib_spatial_avg:
C2Dsmooth = np.zeros(
[C.time.shape[0], C.lat.shape[0], C.lon.shape[0]])
for m in range(C.time.shape[0]):
C2Dsmooth[m, :, :] = smooth2D(C.temp_anomaly[m, :, :], n=Npts)
calvals = C2Dsmooth[:, jind, kind]
else:
calvals = C.temp_anomaly[:, jind, kind]
# -------------------------------------------------------
# Calculate averages of calibration data over appropriate
# intervals (annual or according to proxy seasonality)
# -------------------------------------------------------
if self.avgPeriod == 'annual':
# Simply use annual averages
avgMonths = [1,2,3,4,5,6,7,8,9,10,11,12]
elif 'season' in self.avgPeriod:
# Consider the seasonality of the proxy record
avgMonths = proxy.seasonality
else:
print('ERROR: Unrecognized value for avgPeriod! Exiting!')
exit(1)
nbmonths = len(avgMonths)
cyears = np.asarray(list(set([C.time[k].year for k in range(len(C.time))]))) # 'set' is used to get unique values
nbcyears = len(cyears)
reg_x = np.zeros(shape=[nbcyears])
reg_x[:] = np.nan # initialize with nan's
for i in range(nbcyears):
# monthly data from current year
indsyr = [j for j,v in enumerate(C.time) if v.year == cyears[i] and v.month in avgMonths]
# check if data from previous year is to be included
indsyrm1 = []
if any(m < 0 for m in avgMonths):
year_before = [abs(m) for m in avgMonths if m < 0]
indsyrm1 = [j for j,v in enumerate(C.time) if v.year == cyears[i]-1. and v.month in year_before]
# check if data from following year is to be included
indsyrp1 = []
if any(m > 12 for m in avgMonths):
year_follow = [m-12 for m in avgMonths if m > 12]
indsyrp1 = [j for j,v in enumerate(C.time) if v.year == cyears[i]+1. and v.month in year_follow]
inds = indsyrm1 + indsyr + indsyrp1
if len(inds) == nbmonths: # all months are in the data
tmp = np.nanmean(calvals[inds],axis=0)
nancount = np.isnan(calvals[inds]).sum(axis=0)
if nancount > nbmaxnan: tmp = np.nan
else:
tmp = np.nan
reg_x[i] = tmp
# ------------------------
# Set-up linear regression
# ------------------------
# Use pandas DataFrame to store proxy & calibration data side-by-side
header = ['variable', 'y']
# Fill-in proxy data
df = pd.DataFrame({'time':proxy.time, 'y': proxy.values})
df.columns = header
# Add calibration data
frame = pd.DataFrame({'variable':cyears, 'Calibration':reg_x})
df = df.merge(frame, how='outer', on='variable')
col0 = df.columns[0]
df.set_index(col0, drop=True, inplace=True)
df.index.name = 'time'
df.sort_index(inplace=True)
# ensure all df entries are floats: if not, sm.ols gives garbage
df = df.astype(np.float)
# Perform the regression
try:
regress = sm.ols(formula="y ~ Calibration", data=df).fit()
# number of data points used in the regression
nobs = int(regress.nobs)
except:
nobs = 0
if nobs < 25: # skip rest if insufficient overlapping data
raise(ValueError('Insufficent observation/calibration overlap'
' to calibrate psm.'))
# START NEW (GH) 21 June 2015... RT edit June 2016
# detrend both the proxy and the calibration data
#
# RT: This code is old and not fully compatible with more recent modifications
# (use of pandas and sm.OLS for calculation of regression...)
# The following few lines of code ensures better management and compatibility
detrend_proxy = False
detrend_calib = False
standardize_proxy = False
# This block is to ensure compatibility with GH's code below
y_ok = df['y'][ df['y'].notnull()]
calib_ok = df['Calibration'][ df['Calibration'].notnull()]
time_common = np.intersect1d(y_ok.index.values, calib_ok.index.values)
reg_ya = df['y'][time_common].values
reg_xa = df['Calibration'][time_common].values
# if any of the flags above are activated, run the following code. Otherwise, just ignore it all.
if detrend_proxy or detrend_calib or standardize_proxy:
# save copies of the original data for residual estimates later
reg_xa_all = np.copy(reg_xa)
reg_ya_all = np.copy(reg_ya)
if detrend_proxy:
# proxy detrend: (1) linear regression, (2) fit, (3) detrend
xvar = list(range(len(reg_ya)))
proxy_slope, proxy_intercept, r_value, p_value, std_err = \
linregress(xvar, reg_ya)
proxy_fit = proxy_slope*np.squeeze(xvar) + proxy_intercept
reg_ya = reg_ya - proxy_fit # detrend for proxy
if detrend_calib:
# calibration detrend: (1) linear regression, (2) fit, (3) detrend
xvar = list(range(len(reg_xa)))
calib_slope, calib_intercept, r_value, p_value, std_err = \
linregress(xvar, reg_xa)
calib_fit = calib_slope*np.squeeze(xvar) + calib_intercept
reg_xa = reg_xa - calib_fit
if standardize_proxy:
print('Calib stats (x) [min, max, mean, std]:', np.nanmin(
reg_xa), np.nanmax(reg_xa), np.nanmean(reg_xa), np.nanstd(reg_xa))
print('Proxy stats (y:original) [min, max, mean, std]:', np.nanmin(
reg_ya), np.nanmax(reg_ya), np.nanmean(reg_ya), np.nanstd(reg_ya))
# standardize proxy values over period of overlap with calibration data
reg_ya = (reg_ya - np.nanmean(reg_ya))/np.nanstd(reg_ya)
print('Proxy stats (y:standardized) [min, max, mean, std]:', np.nanmin(
reg_ya), np.nanmax(reg_ya), np.nanmean(reg_ya), np.nanstd(reg_ya))
# GH: note that std_err pertains to the slope, not the residuals!!!
# Build new pandas DataFrame that contains the modified data:
# Use pandas DataFrame to store proxy & calibration data side-by-side
header = ['variable', 'y']
# Fill-in proxy data
dfmod = pd.DataFrame({'time':common_time, 'y': reg_ya})
dfmod.columns = header
# Add calibration data
frame = pd.DataFrame({'variable':common_time, 'Calibration':reg_xa})
dfmod = dfmod.merge(frame, how='outer', on='variable')
col0 = dfmod.columns[0]
dfmod.set_index(col0, drop=True, inplace=True)
dfmod.index.name = 'time'
dfmod.sort_index(inplace=True)
# Perform the regression using updated arrays
regress = sm.ols(formula="y ~ Calibration", data=dfmod).fit()
# END NEW (GH) 21 June 2015 ... RT edit June 2016
# Assign PSM calibration attributes
# Extract the needed regression parameters
self.intercept = regress.params[0]
self.slope = regress.params[1]
self.NbPts = nobs
self.corr = np.sqrt(regress.rsquared)
if self.slope < 0: self.corr = -self.corr
# Stats on fit residuals
MSE = np.mean((regress.resid) ** 2)
self.R = MSE
SSE = np.sum((regress.resid) ** 2)
self.SSE = SSE
# Model information
self.AIC = regress.aic
self.BIC = regress.bic
self.R2 = regress.rsquared
self.R2adj = regress.rsquared_adj
# Extra diagnostics
self.calib_time = time_common
self.calib_refer_values = reg_xa
self.calib_proxy_values = reg_ya
fit = self.slope * reg_xa + self.intercept
self.calib_proxy_fit = fit
if diag_output:
# Diagnostic output
print("***PSM stats:")
print(regress.summary())
if diag_output_figs:
# Figure (scatter plot w/ summary statistics)
line = self.slope * reg_xa + self.intercept
plt.plot(reg_xa, line, 'r-', reg_xa, reg_ya, 'o',
markersize=7, markerfacecolor='#5CB8E6',
markeredgecolor='black', markeredgewidth=1)
# GH: I don't know how this ran in the first place; must exploit
# some global namespace
plt.title('%s: %s' % (proxy.type, proxy.id))
plt.xlabel('Calibration data')
plt.ylabel('Proxy data')
xmin, xmax, ymin, ymax = plt.axis()
# Annotate with summary stats
ypos = ymax - 0.05 * (ymax - ymin)
xpos = xmin + 0.025 * (xmax - xmin)
plt.text(xpos, ypos, 'Nobs = %s' % str(nobs), fontsize=12,
fontweight='bold')
ypos = ypos - 0.05 * (ymax - ymin)
plt.text(xpos, ypos,
'Slope = %s' % "{:.4f}".format(self.slope),
fontsize=12, fontweight='bold')
ypos = ypos - 0.05 * (ymax - ymin)
plt.text(xpos, ypos,
'Intcpt = %s' % "{:.4f}".format(self.intercept),
fontsize=12, fontweight='bold')
ypos = ypos - 0.05 * (ymax - ymin)
plt.text(xpos, ypos, 'Corr = %s' % "{:.4f}".format(self.corr),
fontsize=12, fontweight='bold')
ypos = ypos - 0.05 * (ymax - ymin)
plt.text(xpos, ypos, 'Res.MSE = %s' % "{:.4f}".format(MSE),
fontsize=12, fontweight='bold')
plt.savefig('proxy_%s_%s_LinearModel_calib.png' % (
proxy.type.replace(" ", "_"), proxy.id),
bbox_inches='tight')
plt.close()
@staticmethod
def get_kwargs(config):
try:
psm_data = LinearPSM._load_psm_data(config)
return {'psm_data': psm_data}
except IOError as e:
print(e)
return {}
@staticmethod
def _load_psm_data(config):
"""Helper method for loading from dataframe"""
pre_calib_file = config.psm.linear.pre_calib_datafile
return load_cpickle(pre_calib_file)
class LinearPSM_TorP(LinearPSM):
"""
PSM based on linear regression w.r.t. temperature or precipitation
**Important note: This class assumes that all linear PSMs have been
pre-calibrated w.r.t. to temperature and precipitation,
using the "LinearPSM" class defined above.
No attempts at calibration are performed here!
The following code only assigns PSM linear regression
parameters to individual proxies from fits to temperature
OR precipitation/moisture according to a measure of
"goodness-of-fit" criteria (here, MSE of the resisduals).
Attributes
----------
sensitivity: string
Indicates the sensitivity (temperature vs moisture) of the proxy record
lat: float
Latitude of associated proxy site
lon: float
Longitude of associated proxy site
elev: float
Elevation/depth of proxy site
corr: float
Correlation of proxy record against calibration data
slope: float
Linear regression slope of proxy/calibration fit
intercept: float
Linear regression y-intercept of proxy/calibration fit
R: float
Mean-squared error of proxy/calibration fit
Parameters
----------
config: LMR_config
Configuration module used for current LMR run.
proxy_obj: BaseProxyObject like
Proxy object that this PSM is being attached to
psm_data: dict
Pre-calibrated PSM dictionary containing current associated
proxy site's calibration information
Raises
------
ValueError
If PSM is below critical correlation threshold.
"""
def __init__(self, config, proxy_obj, psm_data_T=None, psm_data_P=None):
self.psm_key = 'linear_TorP'
proxy = proxy_obj.type
site = proxy_obj.id
r_crit = config.psm.linear_TorP.psm_r_crit
self.lat = proxy_obj.lat
self.lon = proxy_obj.lon
self.elev = proxy_obj.elev
self.datatag_calib_T = config.psm.linear_TorP.datatag_calib_T
self.datatag_calib_P = config.psm.linear_TorP.datatag_calib_P
self.avgPeriod = config.psm.linear_TorP.avgPeriod
# Try using pre-calibrated psm_data for **temperature**
try:
if psm_data_T is None:
psm_data_T = self._load_psm_data(config,'temperature')
psm_site_data_T = psm_data_T[(proxy, site)]
except (KeyError, IOError) as e:
# No precalibration found for temperature
print(' PSM(temperature) not calibrated for:' +
str((proxy, site)))
# Try using pre-calibrated psm_data for **precipitation/moisture**
try:
if psm_data_P is None:
psm_data_P = self._load_psm_data(config,'moisture')
psm_site_data_P = psm_data_P[(proxy, site)]
except (KeyError, IOError) as e:
# No precalibration found for moisture
print(' PSM(moisture) not calibrated for:' +
str((proxy, site)))
# Check if PSM is calibrated w.r.t. temperature and/or w.r.t.
# precipitation/moisture. Assign proxy "sensitivity" based on
# PSM calibration.
# First check if required PSM calibration data is available
psm_ok = True
if not psm_data_T:
print ('In "LinearPSM_TorP" class: Cannot find PSM calibration data for temperature.')
psm_ok = False
if not psm_data_P:
print ('In "LinearPSM_TorP" class: Cannot find PSM calibration data for moisture.')
psm_ok = False
if not psm_ok:
print('Exiting! You must use the PSMbuild facility to generate the appropriate calibrated PSMs')
raise SystemExit(1)
# Now check about temperature vs. moisture ...
if (proxy, site) in psm_data_T.keys() and (proxy, site) in psm_data_P.keys():
# calibrated for temperature AND for precipitation, check relative goodness of fit
# and assign "sensitivity" & corresponding PSM parameters according to best fit.x
# ... based on PSM R^2
if abs(psm_site_data_T['PSMcorrel']) >= abs(psm_site_data_P['PSMcorrel']):
# better fit w.r.t. temperature
self.sensitivity = 'temperature'
self.corr = psm_site_data_T['PSMcorrel']
self.slope = psm_site_data_T['PSMslope']
self.intercept = psm_site_data_T['PSMintercept']
self.R = psm_site_data_T['PSMmse']
self.seasonality = psm_site_data_T['Seasonality']
else:
# better fit w.r.t. precipitation/moisture
self.sensitivity = 'moisture'
self.corr = psm_site_data_P['PSMcorrel']
self.slope = psm_site_data_P['PSMslope']
self.intercept = psm_site_data_P['PSMintercept']
self.R = psm_site_data_P['PSMmse']
self.seasonality = psm_site_data_P['Seasonality']
elif (proxy, site) in psm_data_T.keys() and (proxy, site) not in psm_data_P.keys():
# calibrated for temperature and NOT for precipitation, assign as "temperature" sensitive
self.sensitivity = 'temperature'
self.corr = psm_site_data_T['PSMcorrel']
self.slope = psm_site_data_T['PSMslope']
self.intercept = psm_site_data_T['PSMintercept']
self.R = psm_site_data_T['PSMmse']
self.seasonality = psm_site_data_T['Seasonality']
elif (proxy, site) not in psm_data_T.keys() and (proxy, site) in psm_data_P.keys():
# calibrated for precipitation/moisture and NOT for temperature, assign as "moisture" sensitive
self.sensitivity = 'moisture'
self.corr = psm_site_data_P['PSMcorrel']
self.slope = psm_site_data_P['PSMslope']
self.intercept = psm_site_data_P['PSMintercept']
self.R = psm_site_data_P['PSMmse']
self.seasonality = psm_site_data_P['Seasonality']
else:
raise ValueError('Proxy in database but not found in pre-calibration files... '
'Skipping: {}'.format(proxy_obj.id))
# Raise exception if critical correlation value not met
if abs(self.corr) < r_crit:
raise ValueError(('Proxy model correlation ({:.2f}) does not meet '
'critical threshold ({:.2f}).'
).format(self.corr, r_crit))
# TODO: Ideally prior state info and coordinates should all be in single obj
def psm(self, Xb, X_state_info, X_coords):
"""
Maps a given state to observations for the given proxy
Parameters
----------
Xb: ndarray
State vector to be mapped into observation space (stateDim x ensDim)
X_state_info: dict
Information pertaining to variables in the state vector
X_coords: ndarray
Coordinates for the state vector (stateDim x 2)
Returns
-------
Ye:
Equivalent observation from prior
"""
# ----------------------
# Calculate the Ye's ...
# ----------------------
if self.sensitivity:
# "sensitivity" defined for this proxy
if self.sensitivity == 'temperature':
state_var = 'tas_sfc_Amon'
elif self.sensitivity == 'moisture':
# To consider different moisture variables
if self.datatag_calib_P == 'GPCC':
state_var = 'pr_sfc_Amon'
elif self.datatag_calib_P == 'DaiPDSI':
state_var = 'scpdsi_sfc_Amon'
else:
raise KeyError('Unrecognized calibration-state variable association.'
' State variable not identified for Ye calculation.')
else:
raise KeyError('Unkown PSM sensitivity. PSM not identified for Ye'
' calculation.')
else:
# "sensitivity" not defined for this proxy: simply revert to the default (temperature)
state_var = 'tas_sfc_Amon'
if state_var not in X_state_info.keys():
raise KeyError('Needed variable not in state vector for Ye'
' calculation.')
# TODO: end index should already be +1, more pythonic
calibvar_startidx, calibvar_endidx = X_state_info[state_var]['pos']
ind_lon = X_state_info[state_var]['spacecoords'].index('lon')
ind_lat = X_state_info[state_var]['spacecoords'].index('lat')
X_lons = X_coords[calibvar_startidx:(calibvar_endidx+1), ind_lon]
X_lats = X_coords[calibvar_startidx:(calibvar_endidx+1), ind_lat]
# Extract data from closest grid point to location of proxy site
# [self.lat,self.lon]
calibvar_data = Xb[calibvar_startidx:(calibvar_endidx+1), :]
# get_close_grid_point_data is inherited from the LinearPSM class
# on which this one is based.
gridpoint_data = self.get_close_grid_point_data(calibvar_data.T,
X_lons,
X_lats)
Ye = self.slope * gridpoint_data + self.intercept
return Ye
# Define the error model for this proxy
@staticmethod
def error():
return 0.1
# TODO: Simplify a lot of the actions in the calibration
def calibrate(self, C, proxy, diag_output=False, diag_output_figs=False):
"""
Calibrate given proxy record against observation data and set relevant
PSM attributes.
Parameters
----------
C: calibration_master like
Calibration object containing data, time, lat, lon info
proxy: BaseProxyObject like
Proxy object to fit to the calibration data
diag_output, diag_output_figs: bool, optional
Diagnostic output flags for calibration method
"""
print('Calibration not performed in this psm class!')
pass
@staticmethod
def get_kwargs(config):
try:
psm_data_T = LinearPSM_TorP._load_psm_data(config,calib_var='temperature')
psm_data_P = LinearPSM_TorP._load_psm_data(config,calib_var='moisture')
return {'psm_data_T': psm_data_T, 'psm_data_P': psm_data_P}
except IOError as e:
print(e)
return {}
@staticmethod
def _load_psm_data(config,calib_var):
"""Helper method for loading from dataframe"""
if calib_var == 'temperature':
pre_calib_file = config.psm.linear_TorP.pre_calib_datafile_T
elif calib_var == 'moisture':
pre_calib_file = config.psm.linear_TorP.pre_calib_datafile_P
else:
raise ValueError('Unrecognized calibration variable'
' to calibrate psm.')
return load_cpickle(pre_calib_file)
class BilinearPSM(BasePSM):
"""
PSM based on bivariate linear regression w.r.t. temperature AND precipitation/moisture
Attributes
----------
lat: float
Latitude of associated proxy site
lon: float
Longitude of associated proxy site
elev: float
Elevation/depth of proxy site
corr: float
Correlation of proxy record against calibration data
slope: float
Linear regression slope of proxy/calibration fit
intercept: float
Linear regression y-intercept of proxy/calibration fit
R: float
Mean-squared error of proxy/calibration fit
Parameters
----------
config: LMR_config
Configuration module used for current LMR run.
proxy_obj: BaseProxyObject like
Proxy object that this PSM is being attached to
psm_data: dict
Pre-calibrated PSM dictionary containing current associated
proxy site's calibration information
Raises
------
ValueError
If PSM is below critical correlation threshold.
"""
def __init__(self, config, proxy_obj, psm_data=None, calib_obj_T=None, calib_obj_P=None):
self.psm_key = 'bilinear'
proxy = proxy_obj.type
site = proxy_obj.id
r_crit = config.psm.bilinear.psm_r_crit
self.lat = proxy_obj.lat
self.lon = proxy_obj.lon
self.elev = proxy_obj.elev
self.avgPeriod = config.psm.bilinear.avgPeriod
# Assign sensitivity as temperature_moisture
self.sensitivity = 'temperature_moisture'
# Try using pre-calibrated psm_data
try:
if psm_data is None:
psm_data = self._load_psm_data(config)
psm_site_data = psm_data[(proxy, site)]
self.corr = psm_site_data['PSMcorrel']
self.slope_temperature = psm_site_data['PSMslope_temperature']
self.slope_moisture = psm_site_data['PSMslope_moisture']
self.intercept = psm_site_data['PSMintercept']
self.R = psm_site_data['PSMmse']
self.calib_temperature = psm_site_data['calib_temperature']
self.calib_moisture = psm_site_data['calib_moisture']
# check if seasonality defined in the psm data
# if it is, return as an attribute of psm object
if 'Seasonality' in psm_site_data.keys():
self.seasonality = psm_site_data['Seasonality']
except KeyError as e:
raise ValueError('Proxy in database but not found in pre-calibration file... '
'Skipping: {}'.format(proxy_obj.id))
except IOError as e:
# No precalibration found, have to do it for this proxy
print('No pre-calibration found for'
' {} ({}) ... calibrating ...'.format(proxy_obj.id,
proxy_obj.type))