AssertionError: Encoder and decoder variables have to be the same apart from target variable
See original GitHub issue.
- PyTorch-Forecasting version: master
- PyTorch version: 1.8.1
- Python version: 3.9.4
- Operating System: Manjaro Linux
Hi,
I’m trying to adapt the stallion example to use a RecurrentNetwork. As long as target and time_varying_unknown_reals match, as in
target=["volume"]
time_varying_unknown_reals = ["volume"]
# OR
target=["volume", "log_volume"]
time_varying_unknown_reals = ["volume", "log_volume"]
everything is fine. When the inputs differ from the targets, though, as in
target=["volume"]
time_varying_unknown_reals = ["volume", "log_volume"]
I get the following error:
Traceback (most recent call last):
  File "/home/carlo/Git/pytorch-forecasting/examples/stallion_rnn.py", line 123, in <module>
    model = RecurrentNetwork.from_dataset(
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/rnn/__init__.py", line 157, in from_dataset
    return super().from_dataset(
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 1355, in from_dataset
    return super().from_dataset(dataset, **new_kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 1637, in from_dataset
    return super().from_dataset(dataset, **kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 907, in from_dataset
    net = cls(**kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/rnn/__init__.py", line 99, in __init__
    assert set(self.encoder_variables) - set(to_list(target)) - set(lagged_target_names) == set(
AssertionError: Encoder and decoder variables have to be the same apart from target variable
So there is no way to perform single-target regression without dropping all other covariates. Is this expected or a bug?
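For context, the assertion compares the encoder's variable set with the decoder's after removing the targets; below is a simplified sketch of that logic (names assumed, not the actual library source, which also accounts for lagged target names). Because log_volume is a time_varying_unknown_real, it is fed to the encoder but is unavailable to the decoder, so the two sets differ.
# Simplified sketch of the failing check (names assumed, not the actual source)
target = ["volume"]
encoder_variables = ["volume", "log_volume"]  # unknown reals are fed to the encoder
decoder_variables = []                        # only covariates known in the future reach the decoder

# {"log_volume"} != set() -> AssertionError
assert set(encoder_variables) - set(target) == set(decoder_variables) - set(target), (
    "Encoder and decoder variables have to be the same apart from target variable"
)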
Full example code:
from pathlib import Path
import pickle
import warnings
import numpy as np
import pandas as pd
from pandas.core.common import SettingWithCopyWarning
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger
import torch
from pytorch_forecasting import GroupNormalizer, RecurrentNetwork, TimeSeriesDataSet
from pytorch_forecasting.data.examples import get_stallion_data
from pytorch_forecasting.metrics import MAE, RMSE, SMAPE, PoissonLoss, QuantileLoss
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters
from pytorch_forecasting.utils import profile
warnings.simplefilter("error", category=SettingWithCopyWarning)
data = get_stallion_data()
data["month"] = data.date.dt.month.astype("str").astype("category")
data["log_volume"] = np.log(data.volume + 1e-8)
data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
data["time_idx"] -= data["time_idx"].min()
data["avg_volume_by_sku"] = data.groupby(["time_idx", "sku"], observed=True).volume.transform("mean")
data["avg_volume_by_agency"] = data.groupby(["time_idx", "agency"], observed=True).volume.transform("mean")
# data = data[lambda x: (x.sku == data.iloc[0]["sku"]) & (x.agency == data.iloc[0]["agency"])]
special_days = [
    "easter_day",
    "good_friday",
    "new_year",
    "christmas",
    "labor_day",
    "independence_day",
    "revolution_day_memorial",
    "regional_games",
    "fifa_u_17_world_cup",
    "football_gold_cup",
    "beer_capital",
    "music_fest",
]
data[special_days] = data[special_days].apply(lambda x: x.map({0: "", 1: x.name})).astype("category")
training_cutoff = data["time_idx"].max() - 6
max_encoder_length = 36
max_prediction_length = 6
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    # target="volume",
    target=[
        "volume",
        # "log_volume",
        # "industry_volume",
        # "soda_volume",
        # "avg_max_temp",
        # "avg_volume_by_agency",
        # "avg_volume_by_sku",
    ],
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,  # allow encoder lengths from max_encoder_length // 2 to max_encoder_length
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    # static_categoricals=["agency", "sku"],
    # static_reals=["avg_population_2017", "avg_yearly_household_income_2017"],
    # time_varying_known_categoricals=["special_days", "month"],
    # variable_groups={"special_days": special_days},  # group of categorical variables can be treated as one variable
    # time_varying_known_reals=["time_idx", "price_regular", "discount_in_percent"],
    # time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "volume",
        "log_volume",
        # "industry_volume",
        # "soda_volume",
        # "avg_max_temp",
        # "avg_volume_by_agency",
        # "avg_volume_by_sku",
    ],
    # target_normalizer=GroupNormalizer(
    #     groups=["agency", "sku"], transformation="softplus", center=False
    # ),  # use softplus with beta=1.0 and normalize by group
    # add_relative_time_idx=True,
    # add_target_scales=True,
    # add_encoder_length=True,
)
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
batch_size = 64
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
# # save datasets
# training.save("training.pkl")
# validation.save("validation.pkl")
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
# logger = TensorBoardLogger(log_graph=True)
trainer = pl.Trainer(
    max_epochs=100,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=30,
    # val_check_interval=20,
    # limit_val_batches=1,
    # fast_dev_run=True,
    # logger=logger,
    # profiler=True,
    callbacks=[lr_logger, early_stop_callback],
)
model = RecurrentNetwork.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    # attention_head_size=1,
    dropout=0.1,
    # hidden_continuous_size=8,
    # output_size=7,
    # loss=QuantileLoss(),
    # log_interval=10,
    # log_val_interval=1,
    # reduce_on_plateau_patience=3,
)
print(f"Number of parameters in network: {model.size()/1e3:.1f}k")
trainer.fit(
    model,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)
# make a prediction on entire validation set
preds, index = model.predict(val_dataloader, return_index=True, fast_dev_run=True)
The targets are automatically lagged by 1 while other covariates are not. I suggest lagging them manually in your dataframe (groupby()[name].shift()). Most recurrent networks require covariates other than the target to be known in the future (you can lag the covariates manually to make them “known”). Otherwise you need to train a different encoder and decoder (e.g. as in the TFT). So this is expected behaviour.
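A minimal sketch of that workaround, assuming the data and variables from the example above (the column name log_volume_lag1 is made up for illustration):
# Lag log_volume by one step within each series so its values are "known" ahead of time,
# then declare the lagged column as a known covariate instead of an unknown one.
data["log_volume_lag1"] = data.groupby(["agency", "sku"], observed=True)["log_volume"].shift(1)
data = data.dropna(subset=["log_volume_lag1"])  # shift() leaves a NaN at the start of each series
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_known_reals=["log_volume_lag1"],
    time_varying_unknown_reals=["volume"],
)
Now the encoder sees volume and log_volume_lag1 while the decoder sees log_volume_lag1, so the variable sets match apart from the target and the assertion passes. Note that a lag of 1 only makes the covariate known one step ahead; for multi-step prediction without leakage at inference time you may want to lag by max_prediction_length instead. Alternatively, use a model with a separate encoder and decoder, such as the TemporalFusionTransformer, which accepts unknown covariates that differ from the target.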