Inference tensors cannot be saved for backward.
I don't know if this is a bug. I'm trying to follow the official tutorial using the PyTorch engine.
Here is my code, followed by the exception.
String modelUrl = "https://resources.djl.ai/test-models/traced_distilbert_wikipedia_uncased.zip";
Criteria<NDList, NDList> criteria = Criteria.builder()
        .optApplication(Application.NLP.WORD_EMBEDDING)
        .setTypes(NDList.class, NDList.class)
        .optModelUrls(modelUrl)
        .optProgress(new ProgressBar())
        .build();
ZooModel<NDList, NDList> embedding = criteria.loadModel();
Predictor<NDList, NDList> embedder = embedding.newPredictor();

SequentialBlock classifier = new SequentialBlock()
        .add(
            ndList -> {
                NDArray data = ndList.singletonOrThrow();
                long batchSize = data.getShape().get(0);
                long maxLen = data.getShape().get(1);
                NDList inputs = new NDList();
                inputs.add(data.toType(DataType.INT64, false));
                inputs.add(data.getManager().full(data.getShape(), 1, DataType.INT64));
                inputs.add(data.getManager().arange(maxLen).toType(DataType.INT64, false).broadcast(data.getShape()));
                try {
                    return embedder.predict(inputs);
                } catch (TranslateException e) {
                    throw new RuntimeException(e);
                }
            }
        )
        .add(Linear.builder().setUnits(768).build())
        .add(Activation::relu)
        .add(Dropout.builder().optRate(0.2f).build())
        .add(Linear.builder().setUnits(5).build())
        .addSingleton(nd -> nd.get(":,0"));

Model model = Model.newInstance("review_classification");
model.setBlock(classifier);

DefaultVocabulary vocabulary = DefaultVocabulary.builder()
        .addFromTextFile(embedding.getArtifact("vocab.txt"))
        .optUnknownToken("[UNK]")
        .build();

int maxTokenLen = 64;
int batchSize = 8;
int limit = Integer.MAX_VALUE;
BertFullTokenizer tokenizer = new BertFullTokenizer(vocabulary, true);
CsvDataset awsDataset = getDataset(batchSize, tokenizer, maxTokenLen, limit);
RandomAccessDataset[] datasets = awsDataset.randomSplit(7, 3);
RandomAccessDataset trainDataset = datasets[0];
RandomAccessDataset evalDataset = datasets[1];

SaveModelTrainingListener listener = new SaveModelTrainingListener("build/model");
listener.setSaveModelCallback(
        trainer -> {
            TrainingResult result = trainer.getTrainingResult();
            Model trainerModel = trainer.getModel();
            float acc = result.getValidateEvaluation("Accuracy");
            trainerModel.setProperty("Accuracy", String.format("%.5f", acc));
            trainerModel.setProperty("Loss", String.format("%.5f", result.getValidateLoss()));
        }
);

DefaultTrainingConfig trainingConfig = new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
        .addEvaluator(new Accuracy())
        .addTrainingListeners(TrainingListener.Defaults.logging("build/model"))
        .addTrainingListeners(listener);

int epoch = 2;
Trainer trainer = model.newTrainer(trainingConfig);
trainer.setMetrics(new Metrics());
Shape shape = new Shape(batchSize, maxTokenLen);
trainer.initialize(shape);
EasyTrain.fit(trainer, epoch, trainDataset, evalDataset);
System.out.println(trainer.getTrainingResult());
model.save(Paths.get("build/model"), "aws-review-rank");
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of inter-op threads is 6
[main] INFO ai.djl.pytorch.engine.PtEngine - Number of intra-op threads is 6
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Training on: cpu().
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Load PyTorch Engine Version 1.12.1 in 0.079 ms.
Exception in thread "main" ai.djl.engine.EngineException: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
at ai.djl.pytorch.jni.PyTorchLibrary.torchNNLinear(Native Method)
at ai.djl.pytorch.jni.JniUtils.linear(JniUtils.java:1189)
at ai.djl.pytorch.engine.PtNDArrayEx.linear(PtNDArrayEx.java:390)
at ai.djl.nn.core.Linear.linear(Linear.java:183)
at ai.djl.nn.core.Linear.forwardInternal(Linear.java:88)
at ai.djl.nn.AbstractBaseBlock.forwardInternal(AbstractBaseBlock.java:126)
at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:91)
at ai.djl.nn.SequentialBlock.forwardInternal(SequentialBlock.java:209)
at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:91)
at ai.djl.training.Trainer.forward(Trainer.java:175)
at ai.djl.training.EasyTrain.trainSplit(EasyTrain.java:122)
at ai.djl.training.EasyTrain.trainBatch(EasyTrain.java:110)
at ai.djl.training.EasyTrain.fit(EasyTrain.java:58)
at cn.amberdata.misc.djl.rankcls.Main.main(Main.java:114)
And here are my dependencies.
<!-- https://mvnrepository.com/artifact/ai.djl/api -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>api</artifactId>
    <version>0.19.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.36</version>
</dependency>
<!-- https://mvnrepository.com/artifact/ai.djl.pytorch/pytorch-engine -->
<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-engine</artifactId>
    <version>0.19.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/ai.djl/basicdataset -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>basicdataset</artifactId>
    <version>0.19.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/ai.djl/model-zoo -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>model-zoo</artifactId>
    <version>0.19.0</version>
</dependency>
Thanks for pointing out this bug to us - this is an issue in the NoopTranslator when using the PyTorch engine. We will take a look and determine the best fix.
For now, you can adjust your code as follows to get around the exception:
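A minimal sketch of one possible workaround, based on the hint in the exception message (clone the inference tensor to get a normal tensor that autograd can use): duplicate the embedder output before returning it from the lambda. This is an assumption rather than necessarily the exact fix the maintainers had in mind; embedder and inputs are the variables from the code above.

// Inside the SequentialBlock lambda, replacing "return embedder.predict(inputs);"
try {
    NDList output = embedder.predict(inputs);
    // The predictor runs the traced model in inference mode, so its outputs are
    // inference tensors. duplicate() copies each array into a normal tensor
    // that can be saved for backward.
    NDList cloned = new NDList();
    for (NDArray array : output) {
        cloned.add(array.duplicate());
    }
    return cloned;
} catch (TranslateException e) {
    throw new RuntimeException(e);
}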
I’m glad to know that the answers helped!
By the way, the fix is in the newest snapshot version and is not in the 0.20.0 released version. It can be accessed by using the snapshot repository; currently, <UPCOMING VERSION> = 0.21.0. See https://github.com/deepjavalibrary/djl/blob/master/docs/get.md#using-built-from-source-version-in-another-project
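For reference, the linked page describes pulling snapshot builds from the Sonatype snapshots repository. A sketch of the pom.xml changes, assuming the repository id is arbitrary and the DJL artifacts are switched to the <UPCOMING VERSION>-SNAPSHOT version:

<!-- Add the snapshots repository so Maven can resolve -SNAPSHOT builds -->
<repositories>
    <repository>
        <id>djl.snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
</repositories>

<!-- Then bump each DJL dependency, e.g. -->
<dependency>
    <groupId>ai.djl</groupId>
    <artifactId>api</artifactId>
    <version>0.21.0-SNAPSHOT</version>
</dependency>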