Low-level utilities.

Utilities

def split_array(data, idx): return data[:idx], data[idx:]

split_array[source]

split_array(data, idx)

r3[source]

r3(x)

Return a scalar value rounded to 3 decimal places or, for an iterable, its values rounded to 3 decimal places in a new array/tuple (the input is not modified).

test_in = [3.3, 1.1, 2.02, 3.003, 4.0004]
test_result = r3(test_in)
assert test_in is not test_result
assert [3.3, 1.1, 2.02, 3.003, 4.0] == test_result
test_in = tuple(test_in)
test_result = r3(test_in)
assert test_in is not test_result
assert (3.3, 1.1, 2.02, 3.003, 4.0) == test_result
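
The assertions above pin down the behaviour of r3 but not its implementation. A minimal sketch that satisfies them, assuming plain Python numbers, lists and tuples, might look like the following. The name `r3_sketch` is hypothetical and the real r3 may differ, for example in how it treats NumPy arrays.

```python
from numbers import Real


def r3_sketch(x):
    """Round a scalar, or each item of a list/tuple, to 3 decimal places."""
    if isinstance(x, Real):
        return round(x, 3)
    # Build a new container so the caller's data is never modified.
    rounded = [round(v, 3) for v in x]
    # Tuples come back as tuples; any other iterable comes back as a list.
    return tuple(rounded) if isinstance(x, tuple) else rounded


test_in = [3.3, 1.1, 2.02, 3.003, 4.0004]
assert r3_sketch(test_in) == [3.3, 1.1, 2.02, 3.003, 4.0]
assert r3_sketch(tuple(test_in)) == (3.3, 1.1, 2.02, 3.003, 4.0)
```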

For efficiency, we need to be able to calculate standard deviations from running totals, without passing all of the data to np.std every time.

By keeping track of:

  • count of items c
  • sum of items s
  • sum of items squared s2

we can calculate the variance with: (s2/c) - (s/c)**2

Note: we have to clamp the variance at zero. When working with small numbers, the computed value sometimes ends up slightly negative due to numerical instability. It might be interesting to try a moving average implementation like: https://github.com/pete88b/data-science/blob/master/pytorch-things/calculating-variance.ipynb
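
As a rough illustration of the running-sum approach described above, the sketch below accumulates c, s and s2 in a single pass and clamps the variance at zero before taking the square root. The function name `running_std` is hypothetical. It uses only the standard library, so the result is checked against `statistics.pstdev` (the population standard deviation, which is also what np.std computes by default).

```python
import math
import statistics


def running_std(values):
    """Population standard deviation from running totals (c, s, s2)."""
    c, s, s2 = 0, 0.0, 0.0
    for v in values:
        c += 1        # count of items
        s += v        # sum of items
        s2 += v * v   # sum of items squared
    if c == 0:
        raise ValueError("need at least one value")
    # Clamp at zero: rounding error can push the difference slightly negative.
    var = max(s2 / c - (s / c) ** 2, 0.0)
    return math.sqrt(var)


values = [0.5 * i - 2.5 for i in range(11)]  # -2.5, -2.0, ..., 2.5
assert math.isclose(running_std(values), statistics.pstdev(values))
```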

class Aggs[source]

Aggs(y)

keeps track of c, s and s2 and provides access to score

Aggs.__init__[source]

Aggs.__init__(y)

create a new Aggs assuming you're going to iterate over y

Aggs.upd[source]

Aggs.upd(yi)

update c, s and s2 values with the next y value

Aggs.score[source]

Aggs.score()

return the sum, over both sides of y, of each side's standard deviation multiplied by its number of items

def np_score(y): return np.std(y)*len(y)
y = np.linspace(-2.5, 2.5, 11)
aggs = Aggs(y)
assert aggs._c == 11
for i in range(len(y)-1): 
    aggs.upd(y[i])
    y_le, y_gt = split_array(y, i+1)
    assert len(y_le) == aggs.c
    assert r3(aggs.score()) == r3(np_score(y_le) + np_score(y_gt))
assert 14.361 == r3(aggs.score())
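
The test above suggests how such a class can work: each call to upd moves one value from the right-hand side of the split to the left-hand side, and score combines the two sides. The sketch below is one way to do that with running totals. It is an assumption-laden illustration, not the library's code: the class name `AggsSketch` and the attribute names (`rc`, `rs`, `rs2` for the right-hand side) are hypothetical, and plain Python is used in place of NumPy.

```python
import math


class AggsSketch:
    """Track count, sum and sum of squares for both sides of a split of y."""

    def __init__(self, y):
        # Left side starts empty; right side starts with all of y.
        self.c, self.s, self.s2 = 0, 0.0, 0.0
        self.rc = len(y)
        self.rs = float(sum(y))
        self.rs2 = float(sum(v * v for v in y))

    def upd(self, yi):
        """Move the next y value from the right side to the left side."""
        self.c += 1
        self.s += yi
        self.s2 += yi * yi
        self.rc -= 1
        self.rs -= yi
        self.rs2 -= yi * yi

    @staticmethod
    def _std_times_count(c, s, s2):
        if c == 0:
            return 0.0
        # Clamp at zero to guard against small negative values from rounding.
        var = max(s2 / c - (s / c) ** 2, 0.0)
        return math.sqrt(var) * c

    def score(self):
        """Sum of std * count for the left and right sides."""
        return (self._std_times_count(self.c, self.s, self.s2)
                + self._std_times_count(self.rc, self.rs, self.rs2))


y = [0.5 * i - 2.5 for i in range(11)]  # same values as np.linspace(-2.5, 2.5, 11)
aggs = AggsSketch(y)
for yi in y[:-1]:
    aggs.upd(yi)
assert round(aggs.score(), 3) == 14.361
```

Keeping totals for both sides means each upd is constant-time, so scoring every candidate split point of y costs a single pass rather than one np.std call per split.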

Loss Functions

mse[source]

mse(x, y)

rmse[source]

rmse(x, y)

# y = x + np.random.random(x.shape)
x = np.array([-1., -0.778, -0.556, -0.332, -0.115, 0.119,  0.331,  0.556,  0.777,  1.])
y = np.array([-0.721,  0.036,  0.366,  0.490,  0.565, 1.080, 0.414, 1.125, 1.483, 1.569])
assert 0.480 == np.round(mse(x, y), 3)
assert 0.693 == np.round(rmse(x, y), 3) # expect 0.6931791254791217
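
The signatures above do not show the function bodies. Assuming the usual definitions (the mean of the squared differences, and its square root), a sketch that reproduces the asserted values could look like this. The `_sketch` names are hypothetical, and plain Python is used in place of NumPy.

```python
import math


def mse_sketch(x, y):
    """Mean squared error between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def rmse_sketch(x, y):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse_sketch(x, y))


x = [-1., -0.778, -0.556, -0.332, -0.115, 0.119, 0.331, 0.556, 0.777, 1.]
y = [-0.721, 0.036, 0.366, 0.490, 0.565, 1.080, 0.414, 1.125, 1.483, 1.569]
assert round(mse_sketch(x, y), 3) == 0.48
assert round(rmse_sketch(x, y), 3) == 0.693
```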