yt-dlp not properly internally handling unicode in titles.

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I remove or skip any mandatory* field

Checklist

  • I’m reporting a bug unrelated to a specific site
  • I’ve verified that I’m running yt-dlp version 2022.10.04 (update instructions) or later (specify commit)
  • I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I’ve read the guidelines for opening an issue

Provide a description that is worded well enough to be understood

yt-dlp not properly internally handling unicode in titles.

I am trying to download audio from a video. The title contains a Unicode character, and yt-dlp errors out on it. If the error came from opening a file with an invalid character and hitting a filesystem limitation, I would not consider it a bug, but it appears to happen inside yt-dlp, so I am reporting it as one.

Running with --restrict-filenames bypasses this issue.

If this is expected behavior and gets marked as wontfix, that’s fine (as a developer, I would attempt to fix it myself, though my Unicode experience is weak). This report may still help someone else who runs into the same issue.

Video is: https://www.youtube.com/watch?v=45Tqi-KYzDE

ERROR: 'latin-1' codec can't encode character '\uff1a' in position 24: ordinal not in range(256)
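
The error itself can be reproduced outside yt-dlp. The following is a minimal sketch, assuming a made-up filename (not the real video title) that contains U+FF1A (FULLWIDTH COLON); the helper name try_encode is hypothetical. It only shows that the character has no representation in latin-1 while UTF-8 handles it.

```python
# Hypothetical filename containing U+FF1A (FULLWIDTH COLON).
# This is NOT the real video title, just an illustration.
filename = "Example\uff1a title.m4a"


def try_encode(name, encoding):
    """Return None if `name` encodes cleanly, else the UnicodeEncodeError raised."""
    try:
        name.encode(encoding)
        return None
    except UnicodeEncodeError as exc:
        return exc


# latin-1 only covers code points 0..255, so U+FF1A cannot be encoded.
latin1_error = try_encode(filename, "latin-1")
# UTF-8 can encode every Unicode code point.
utf8_error = try_encode(filename, "utf-8")

print("latin-1:", latin1_error)
print("utf-8:  ", utf8_error)
```

With a locale of ISO-8859-1, as in the verbose output below, Python has to perform this same latin-1 encoding step when it passes a path to the operating system.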

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

$ yt-dlp https://www.youtube.com/watch?v=45Tqi-KYzDE -f 139 -vU                 
[debug] Command-line config: ['https://www.youtube.com/watch?v=45Tqi-KYzDE', '-f', '139', '-vU']
[debug] Encodings: locale ISO-8859-1, fs iso8859-1, pref ISO-8859-1, out ISO-8859-1, error ISO-8859-1, screen ISO-8859-1
[debug] yt-dlp version 2022.10.04 [4e0511f] (zip)
[debug] Python 3.7.2 (CPython 64bit) - Linux-3.10.17-x86_64-AMD_A6-5400K_APU_with_Radeon-tm-_HD_Graphics-with-slackware-14.1 (glibc 2.2.5)
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 3.2.4, ffprobe 3.2.4, rtmpdump 2.4
[debug] Optional libraries: sqlite3-2.6.0
[debug] Proxy map: {}
[debug] Loaded 1690 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.10.04, Current version: 2022.10.04
yt-dlp is up to date (2022.10.04)
[debug] [youtube] Extracting URL: https://www.youtube.com/watch?v=45Tqi-KYzDE
[youtube] 45Tqi-KYzDE: Downloading webpage
[youtube] 45Tqi-KYzDE: Downloading android player API JSON
[youtube] 45Tqi-KYzDE: Downloading MPD manifest
[youtube] 45Tqi-KYzDE: Downloading MPD manifest
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, filesize, fs_approx, tbr, vbr, abr, asr, vext, aext, hasaud, id
[info] 45Tqi-KYzDE: Downloading 1 format(s): 139
ERROR: 'latin-1' codec can't encode character '\uff1a' in position 24: ordinal not in range(256)
Traceback (most recent call last):
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1477, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1574, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1632, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2733, in process_video_result
    self.process_info(new_info)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 3192, in process_info
    dl_filename = existing_video_file(full_filename, temp_filename)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 3087, in existing_video_file
    default_overwrite=False)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 2923, in existing_file
    existing_files = list(filter(os.path.exists, orderedSet(filepaths)))
  File "/usr/lib64/python3.7/genericpath.py", line 19, in exists
    os.stat(path)
UnicodeEncodeError: 'latin-1' codec can't encode character '\uff1a' in position 24: ordinal not in range(256)

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 19 (2 by maintainers)

Top GitHub Comments

Grub4K commented, Oct 23, 2022 (1 reaction)

The character in question is \uff1a (U+FF1A, FULLWIDTH COLON). If I understand correctly, a : in a filename gets replaced by \uff1a unless --restrict-filenames is provided.
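
The replacement described above can be sketched as follows. This is a simplified, hypothetical stand-in, not yt-dlp's actual sanitizing code: the function name replace_colon is made up, and the underscore used in restricted mode is an assumption (the real replacement may differ). The one firm fact it relies on is that U+FF1A is the ASCII colon (U+003A) shifted by 0xFEE0.

```python
def replace_colon(name, restricted=False):
    """Hypothetical sketch of the colon handling described in the comment above.

    restricted=False: swap ':' for its full-width counterpart U+FF1A.
    restricted=True:  swap ':' for an ASCII underscore (assumed replacement).
    """
    if restricted:
        return name.replace(":", "_")
    # U+FF1A is ':' (U+003A) shifted by 0xFEE0 into the full-width block.
    return name.replace(":", chr(ord(":") + 0xFEE0))


def encodes_as(name, encoding):
    """Return True if `name` can be encoded with `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


title = "Part 1: Intro"  # made-up title for illustration
default_name = replace_colon(title)
restricted_name = replace_colon(title, restricted=True)

print(ascii(default_name), encodes_as(default_name, "latin-1"))
print(ascii(restricted_name), encodes_as(restricted_name, "latin-1"))
```

Under these assumptions, only the unrestricted name contains a character outside latin-1, which would be consistent with --restrict-filenames sidestepping the error.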

Is LC_CTYPE set and, if so, what value does it hold? Explicitly setting it to en_US.UTF-8 could work in that case as a workaround.

I would try touch '1:1' from the shell and open('2\uff1a2') (maybe os.stat('2\uff1a2') as well) from Python, and check whether they work as expected before and after setting the variable.
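
A small diagnostic along the lines of the suggestion above might look like this. It is a sketch only: the helper names describe_encodings and can_stat are hypothetical, and the script reports what Python sees rather than changing anything. Running it once as-is and once with LC_CTYPE=en_US.UTF-8 set in the environment would show whether the variable makes a difference.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale-related settings Python uses for filenames."""
    return {
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def can_stat(name):
    """Return True if os.stat() can at least encode `name`.

    A missing file (OSError) still counts as success here, because the
    question is only whether the path survives the encoding step.
    """
    try:
        os.stat(name)
    except UnicodeEncodeError:
        return False
    except OSError:
        return True
    return True


print(describe_encodings())
# Same test name as in the comment above: digits around U+FF1A.
print("os.stat can encode the name:", can_stat("2\uff1a2"))
```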

RichyT commented, Oct 25, 2022 (0 reactions)

I think I know what you’re trying to get at. Unfortunately, I can’t devote much time to this in the immediate future but when I can, I’ll see if I can get it working.

Edit: It looks like using those escapes requires double quotes. Presumably single-quoted strings are literals, as in Perl.

Edit2: Need to check something that may have a bearing on my testing. More to come.

Edit3: Having read more docs, I can see that this might be a core Python issue. When I get some time, I’ll look into that.
