ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset


Describe the bug

When adding a Pillow image to an existing Dataset loaded from the hub, add_item fails because the Pillow image is not automatically converted into the Image feature.

Steps to reproduce the bug

from datasets import load_dataset
from PIL import Image

dataset = load_dataset("hf-internal-testing/example-documents")

# load any random Pillow image
image = Image.open("/content/cord_example.png").convert("RGB")

new_image = {'image': image}
dataset['test'] = dataset['test'].add_item(new_image)

Expected results

The image should be automatically cast to the Image feature when using add_item. For now, this can be worked around by encoding the image first with encode_example:

import datasets

feature = datasets.Image(decode=False)
new_image = {'image': feature.encode_example(image)}
dataset['test'] = dataset['test'].add_item(new_image)

Actual results

ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB size=576x864 at 0x7F7CCC4589D0> with type Image: did not recognize Python value type when inferring an Arrow data type

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
mariosasko commented, Aug 19, 2022

Hi @darraghdog! No PR yet, but I plan to fix this before the next release.

3 reactions
NielsRogge commented, Aug 12, 2022

@mariosasko I’m getting a similar issue when creating a Dataset from a Pandas dataframe, like so:

from datasets import Dataset, Features, Image, Value
import pandas as pd
import requests
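import PIL.Image  # import the submodule explicitly; a bare `import PIL` does not guarantee PIL.Image is loaded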

# we need to define the features ourselves
features = Features({
    'a': Value(dtype='int32'),
    'b': Image(),
})

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = PIL.Image.open(requests.get(url, stream=True).raw)

df = pd.DataFrame({"a": [1, 2], 
                   "b": [image, image]})

dataset = Dataset.from_pandas(df, features=features) 

results in

ArrowInvalid: ('Could not convert <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x480 at 0x7F7991A15C10> with type JpegImageFile: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')

Will the PR linked above also fix that?


