AssertionError: Encoder and decoder variables have to be the same apart from target variable

  • PyTorch-Forecasting version: master
  • PyTorch version: 1.8.1
  • Python version: 3.9.4
  • Operating System: Manjaro Linux

Hi, I’m trying to adapt the stallion example to use a RecurrentNetwork. As long as the targets and the time_varying_unknown_reals match, as in

target=["volume"]
time_varying_unknown_reals = ["volume"]
# OR
target=["volume", "log_volume"]
time_varying_unknown_reals = ["volume", "log_volume"]

everything is fine. When the inputs differ from the targets though, as in

target=["volume"]
time_varying_unknown_reals = ["volume", "log_volume"]

I get the following error:

Traceback (most recent call last):
  File "/home/carlo/Git/pytorch-forecasting/examples/stallion_rnn.py", line 123, in <module>
    model = RecurrentNetwork.from_dataset(
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/rnn/__init__.py", line 157, in from_dataset
    return super().from_dataset(
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 1355, in from_dataset
    return super().from_dataset(dataset, **new_kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 1637, in from_dataset
    return super().from_dataset(dataset, **kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/base_model.py", line 907, in from_dataset
    net = cls(**kwargs)
  File "/home/carlo/miniconda3/lib/python3.9/site-packages/pytorch_forecasting/models/rnn/__init__.py", line 99, in __init__
    assert set(self.encoder_variables) - set(to_list(target)) - set(lagged_target_names) == set(
AssertionError: Encoder and decoder variables have to be the same apart from target variable

So there seems to be no way to perform single-target regression without dropping all other covariates. Is this expected or a bug?

Full example code:
from pathlib import Path
import pickle
import warnings

import numpy as np
import pandas as pd
from pandas.core.common import SettingWithCopyWarning
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger
import torch

from pytorch_forecasting import GroupNormalizer, RecurrentNetwork, TimeSeriesDataSet
from pytorch_forecasting.data.examples import get_stallion_data
from pytorch_forecasting.metrics import MAE, RMSE, SMAPE, PoissonLoss, QuantileLoss
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters
from pytorch_forecasting.utils import profile

warnings.simplefilter("error", category=SettingWithCopyWarning)


data = get_stallion_data()

data["month"] = data.date.dt.month.astype("str").astype("category")
data["log_volume"] = np.log(data.volume + 1e-8)

data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
data["time_idx"] -= data["time_idx"].min()
data["avg_volume_by_sku"] = data.groupby(["time_idx", "sku"], observed=True).volume.transform("mean")
data["avg_volume_by_agency"] = data.groupby(["time_idx", "agency"], observed=True).volume.transform("mean")
# data = data[lambda x: (x.sku == data.iloc[0]["sku"]) & (x.agency == data.iloc[0]["agency"])]
special_days = [
    "easter_day",
    "good_friday",
    "new_year",
    "christmas",
    "labor_day",
    "independence_day",
    "revolution_day_memorial",
    "regional_games",
    "fifa_u_17_world_cup",
    "football_gold_cup",
    "beer_capital",
    "music_fest",
]
data[special_days] = data[special_days].apply(lambda x: x.map({0: "", 1: x.name})).astype("category")

training_cutoff = data["time_idx"].max() - 6
max_encoder_length = 36
max_prediction_length = 6

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    # target="volume",
    target=[
        "volume",
        # "log_volume",
        # "industry_volume",
        # "soda_volume",
        # "avg_max_temp",
        # "avg_volume_by_agency",
        # "avg_volume_by_sku",
    ],
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,  # allow encoder lengths from max_encoder_length // 2 up to max_encoder_length
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    # static_categoricals=["agency", "sku"],
    # static_reals=["avg_population_2017", "avg_yearly_household_income_2017"],
    # time_varying_known_categoricals=["special_days", "month"],
    # variable_groups={"special_days": special_days},  # group of categorical variables can be treated as one variable
    # time_varying_known_reals=["time_idx", "price_regular", "discount_in_percent"],
    # time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "volume",
        "log_volume",
        # "industry_volume",
        # "soda_volume",
        # "avg_max_temp",
        # "avg_volume_by_agency",
        # "avg_volume_by_sku",
    ],
    # target_normalizer=GroupNormalizer(
    #     groups=["agency", "sku"], transformation="softplus", center=False
    # ),  # use softplus with beta=1.0 and normalize by group
    # add_relative_time_idx=True,
    # add_target_scales=True,
    # add_encoder_length=True,
)


validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
batch_size = 64
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)


# # save datasets
# training.save("training.pkl")
# validation.save("validation.pkl")

early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
# logger = TensorBoardLogger(log_graph=True)

trainer = pl.Trainer(
    max_epochs=100,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=30,
    # val_check_interval=20,
    # limit_val_batches=1,
    # fast_dev_run=True,
    # logger=logger,
    # profiler=True,
    callbacks=[lr_logger, early_stop_callback],
)


model = RecurrentNetwork.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    # attention_head_size=1,
    dropout=0.1,
    # hidden_continuous_size=8,
    # output_size=7,
    # loss=QuantileLoss(),
    # log_interval=10,
    # log_val_interval=1,
    # reduce_on_plateau_patience=3,
)
print(f"Number of parameters in network: {model.size()/1e3:.1f}k")

trainer.fit(
    model,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

# make a prediction on entire validation set
preds, index = model.predict(val_dataloader, return_index=True, fast_dev_run=True)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
jdb78 commented on Jun 12, 2021

The targets are automatically lagged by 1 while other covariates are not. I suggest lagging manually in your dataframe (groupby()[name].shift()).
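
For reference, a minimal sketch of that manual lagging on the data frame from the example above (log_volume_lag1 is a placeholder column name, not something from the comment):

# sort so that shift() moves values along the time axis within each series
data = data.sort_values(["agency", "sku", "time_idx"])
# lag log_volume by one step per (agency, sku) series, mirroring the automatic lag applied to targets
data["log_volume_lag1"] = data.groupby(["agency", "sku"], observed=True)["log_volume"].shift(1)
# the first time step of every series has no lagged value; drop or impute it
data = data.dropna(subset=["log_volume_lag1"])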

1 reaction
jdb78 commented on Jun 11, 2021

Most recurrent networks require known covariates apart from the target (you can lag the covariates manually to make them “known” in the future). Otherwise you need to train a different encoder and decoder (e.g. in the TFT). So this is expected behaviour.
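
Building on that, a hedged sketch of how the dataset definition above could declare such a lagged column as a known covariate. To be genuinely known over the whole forecast horizon, the lag should cover at least max_prediction_length steps, so this sketch shifts by the horizon rather than by a single step (log_volume_lag is again a placeholder name):

# lag the covariate by the full forecast horizon so it is observable at every decoder step
data["log_volume_lag"] = data.groupby(["agency", "sku"], observed=True)["log_volume"].shift(max_prediction_length)

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff].dropna(subset=["log_volume_lag"]),
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    # the lagged covariate is available in the future, so encoder and decoder see the same variables
    time_varying_known_reals=["time_idx", "log_volume_lag"],
    # only the target itself remains unknown
    time_varying_unknown_reals=["volume"],
)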
