mime

Classes:

  • FileSignature

    Entry of FileSignatures, holding the info for one type of file and its signatures.

  • FileSignatures

    Structure wrapping a json file holding all file signatures.

  • FileType

    Enum for file types and mime types.

  • IndexingType

    Enum of common indexing file extensions.

  • ParsedFile

    Structure for file info.

FileSignature

Bases: NamedTuple

Entry of FileSignatures, holding the info for one type of file and its signatures.

Methods:

  • check_signature

    Verify the signature of the file.

Attributes:

  • ext (str) –

    Extension from the signature.

  • file_type (str) –

    FileType as a str.

  • mime (str) –

    MIME type of the signature.

  • offset (int) –

    Byte offset of the signatures from the start of the file.

  • signatures (list[bytes]) –

    Byte data signatures, unique for this file type.

ext instance-attribute

ext: str

Extension from the signature.

file_type instance-attribute

file_type: str

FileType as a str.

mime instance-attribute

mime: str

MIME type of the signature.

offset instance-attribute

offset: int

Byte offset of the signatures from the start of the file.

signatures instance-attribute

signatures: list[bytes]

Byte data signatures, unique for this file type.

check_signature

check_signature(file_bytes: bytes | bytearray, /, *, ignore: int = 0) -> int

Verify the signature of the file.

Parameters:

  • file_bytes

    (bytes | bytearray) –

    Header bytes of the file to be checked.

  • ignore

    (int, default: 0 ) –

    If a found signature is shorter than this length, it will be ignored.

Returns:

  • int

    Length of the longest matching signature, or 0 if no signature matches.

Source code
def check_signature(self, file_bytes: bytes | bytearray, /, *, ignore: int = 0) -> int:
    """
    Verify the signature of the file.

    :param file_bytes:  Header bytes of the file to be checked.
    :param ignore:      If a found signature is shorter than this length, it will be ignored.

    :return:            Length of the found signature.
    """

    max_signature_length = 0

    for signature in self.signatures:
        signature_len = len(signature)

        if signature_len < ignore or signature_len <= max_signature_length:
            continue

        if signature == file_bytes[self.offset:signature_len + self.offset]:
            max_signature_length = signature_len

    return max_signature_length
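
The matching rule above can be tried in isolation. The following is a minimal sketch, not the library's class: `SignatureSketch` is a hypothetical stand-in that keeps only the `offset` and `signatures` fields and copies the comparison logic from the source shown above. The RIFF/WAVE header is used only as an illustration of a signature that does not start at byte 0.

```python
from typing import NamedTuple


class SignatureSketch(NamedTuple):
    """Hypothetical stand-in for FileSignature, reduced to what the check needs."""
    offset: int
    signatures: list[bytes]

    def check_signature(self, file_bytes: bytes, ignore: int = 0) -> int:
        longest = 0
        for signature in self.signatures:
            size = len(signature)
            # Skip signatures below the caller's threshold or unable to beat the current best.
            if size < ignore or size <= longest:
                continue
            # Compare against the slice that starts at the configured offset.
            if signature == file_bytes[self.offset:self.offset + size]:
                longest = size
        return longest


# RIFF containers carry a format tag at byte offset 8 ("WAVE" for WAV audio).
wave = SignatureSketch(offset=8, signatures=[b"WAVE"])
header = b"RIFF\x24\x00\x00\x00WAVEfmt "

print(wave.check_signature(header))            # 4
print(wave.check_signature(header, ignore=5))  # 0
```

A return value of 0 doubles as "no match", which is why callers can test the result for truthiness.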

FileSignatures

FileSignatures(
    *,
    custom_header_data: str | Path | list[FileSignature] | None = None,
    force: bool = False
)

Bases: list[FileSignature]

Structure wrapping a json file holding all file signatures.

Fetch all the file signatures, optionally with added custom signatures.

Methods:

  • load_headers_data

    Load file signatures from json file. This is cached unless custom_header_data is set.

  • parse

    Parse a given file.

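Attributes:

  • file_headers_path

    Custom path for the json containing file headers.

  • max_signature_len

    Length of the longest signature across all loaded file types.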

Source code
def __init__(self, *, custom_header_data: str | Path | list[FileSignature] | None = None, force: bool = False):
    """Fetch all the file signatures, optionally with added custom signatures."""

    self.extend(self.load_headers_data(custom_header_data=custom_header_data, force=force))

    self.max_signature_len = max(
        chain.from_iterable([len(signature) for signature in mime.signatures] for mime in self)
    )

file_headers_path class-attribute instance-attribute

file_headers_path = Path(
    join(dirname(abspath(__file__)), "__file_headers.json")
)

Custom path for the json containing file headers.

max_signature_len instance-attribute

max_signature_len = max(
    chain.from_iterable([len(signature) for signature in mime.signatures] for mime in self)
)
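
As a rough illustration of this expression, the sketch below flattens the signature lengths of several file types into one stream and takes the maximum. The `signature_lists` values are hypothetical sample data, not entries taken from the bundled json file.

```python
from itertools import chain

# Hypothetical signature lists for two file types.
signature_lists = [
    [bytes.fromhex("89504e470d0a1a0a")],                    # one 8-byte signature
    [bytes.fromhex("ffd8ff"), bytes.fromhex("ffd8ffe0")],   # 3-byte and 4-byte signatures
]

# Flatten the per-type length lists, then take the largest length.
max_signature_len = max(
    chain.from_iterable([len(sig) for sig in sigs] for sigs in signature_lists)
)
print(max_signature_len)  # 8
```

This value bounds how many header bytes need to be read from a file before checking signatures. Note that `max` raises `ValueError` when given no values at all.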

load_headers_data

load_headers_data(
    *,
    custom_header_data: str | Path | list[FileSignature] | None = None,
    force: bool = False
) -> list[FileSignature]

Load file signatures from json file. This is cached unless custom_header_data is set.

Parameters:

  • custom_header_data

    (str | Path | list[FileSignature] | None, default: None ) –

    Path to a custom header data file, or a list of already parsed FileSignature objects.

  • force

    (bool, default: False ) –

    Ignore cache and reload header data from disk.

Returns:

  • list[FileSignature]

    List of parsed FileSignature from json file.

Source code
def load_headers_data(
    cls, *, custom_header_data: str | Path | list[FileSignature] | None = None, force: bool = False
) -> list[FileSignature]:
    """
    Load file signatures from json file. This is cached unless ``custom_header_data`` is set.

    :param custom_header_data:  Custom header data path file or custom list of already parsed FileSignature.
    :param force:               Ignore cache and reload header data from disk.

    :return:                    List of parsed FileSignature from json file.
    """

    if cls._file_headers_data is None or force or custom_header_data:
        header_data: list[dict[str, Any]] = []

        filenames = {cls.file_headers_path}

        if custom_header_data and not isinstance(custom_header_data, list):
            filenames.add(Path(custom_header_data))

        for filename in filenames:
            header_data.extend(json.loads(filename.read_text()))

        _file_headers_data = list(
            dict({
                FileSignature(
                    info['file_type'], info['ext'], info['mime'], info['offset'],
                    # This is so when checking a file head we first compare the most specific and long signatures
                    sorted([bytes.fromhex(signature) for signature in info['signatures']], reverse=True)
                ): 0 for info in header_data
            }).keys()
        )

        cls._file_headers_data = _file_headers_data

        if isinstance(custom_header_data, list):
            return custom_header_data + _file_headers_data

    return cls._file_headers_data
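
The loader above reads the keys `file_type`, `ext`, `mime`, `offset` and `signatures` from each json entry and decodes the signatures from hex. A minimal sketch of that decoding step follows; the entry itself is hypothetical sample data and is not copied from the bundled `__file_headers.json`.

```python
import json

# Hypothetical entry; only the key names are taken from load_headers_data.
raw = """
[
  {
    "file_type": "image",
    "ext": "png",
    "mime": "image/png",
    "offset": 0,
    "signatures": ["89504e47", "89504e470d0a1a0a"]
  }
]
"""

for info in json.loads(raw):
    # Decode each hex string to bytes, then sort in reverse order as the loader does.
    signatures = sorted(
        (bytes.fromhex(sig) for sig in info["signatures"]), reverse=True
    )
    print(info["ext"], [len(sig) for sig in signatures])  # png [8, 4]
```

The sort is lexicographic on bytes, so a signature is placed ahead of any shorter signature that is a prefix of it.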

parse

parse(filename: Path) -> FileSignature | None

Parse a given file.

Parameters:

  • filename

    (Path) –

    Path to file.

Returns:

  • FileSignature | None

    The input file's mime signature, or None if no signature matches.

Source code
@inject_self
def parse(self, filename: Path) -> FileSignature | None:
    """
    Parse a given file.

    :param filename:        Path to file.

    :return:                The input file's mime signature.
    """

    with open(filename, 'rb') as file:
        file_bytes = file.read(self.max_signature_len)

    max_signature_len = 0
    found_signatures = list[FileSignature]()

    for mimetype in self:
        found_signature = mimetype.check_signature(file_bytes, ignore=max_signature_len)

        if not found_signature:
            continue

        if found_signature > max_signature_len:
            max_signature_len = found_signature
            found_signatures = [mimetype]
        elif found_signature == max_signature_len:
            found_signatures.append(mimetype)

    if not found_signatures:
        return None

    signature_match_ext = [
        mime for mime in found_signatures if f'.{mime.ext}' == filename.suffix
    ]

    if signature_match_ext:
        return signature_match_ext[0]

    return found_signatures[0]
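
The selection rule in the source above (keep the longest matches, then prefer one whose extension equals the file suffix) can be sketched on its own. `Match` and `pick` are hypothetical names used only for this illustration, and the candidate list is made-up data.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """Hypothetical pair of an extension and the length of its matched signature."""
    ext: str
    matched_len: int


def pick(matches: list[Match], filename: Path) -> Optional[Match]:
    if not matches:
        return None
    # Keep only the candidates that matched the longest signature.
    best_len = max(m.matched_len for m in matches)
    best = [m for m in matches if m.matched_len == best_len]
    # Among equally long matches, prefer the one agreeing with the path suffix.
    by_ext = [m for m in best if f".{m.ext}" == filename.suffix]
    return by_ext[0] if by_ext else best[0]


# ZIP-based formats share the same 4-byte magic number, so the suffix decides.
candidates = [Match("zip", 4), Match("docx", 4), Match("jar", 4)]
print(pick(candidates, Path("report.docx")).ext)  # docx
print(pick(candidates, Path("report.bin")).ext)   # zip
```

When no candidate agrees with the suffix, the first of the longest matches wins, so the order of the loaded entries matters.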

FileType

Bases: FileTypeBase

Enum for file types and mime types.

Methods:

  • __call__

    Get an INDEX FileType of another FileType (Video, Audio, Other).

  • is_index

    Verify whether the FileType is an INDEX that holds its own FileType (e.g. mime: index/video).

  • parse

    Parse information from a file. If the FileType is not AUTO, this function raises an error if the file's FileType differs from the one this method was called on.

Attributes:

  • ARCHIVE

    File type for archive files.

  • AUDIO

    File type for audio files.

  • AUTO

    Special file type for FileType.parse.

  • CHAPTERS

    File type for chapters files.

  • DOCUMENT

    File type for documents.

  • FONT

    File type for font files.

  • IMAGE

    File type for image files.

  • INDEX
  • INDEX_AUDIO
  • INDEX_VIDEO
  • OTHER

    File type for generic files, like applications.

  • VIDEO

    File type for video files.

ARCHIVE class-attribute instance-attribute

ARCHIVE = 'archive'

File type for archive files.

AUDIO class-attribute instance-attribute

AUDIO = 'audio'

File type for audio files.

AUTO class-attribute instance-attribute

AUTO = 'auto'

Special file type for FileType.parse.

CHAPTERS class-attribute instance-attribute

CHAPTERS = 'chapters'

File type for chapters files.

DOCUMENT class-attribute instance-attribute

DOCUMENT = 'document'

File type for documents.

FONT class-attribute instance-attribute

FONT = 'font'

File type for font files.

IMAGE class-attribute instance-attribute

IMAGE = 'image'

File type for image files.

INDEX class-attribute instance-attribute

INDEX = 'index'

INDEX_AUDIO class-attribute instance-attribute

INDEX_AUDIO = f'{INDEX}_{AUDIO}'

INDEX_VIDEO class-attribute instance-attribute

INDEX_VIDEO = f'{INDEX}_{VIDEO}'

OTHER class-attribute instance-attribute

OTHER = 'other'

File type for generic files, like applications.

VIDEO class-attribute instance-attribute

VIDEO = 'video'

File type for video files.

__call__

__call__(file_type: str | FileType) -> FileTypeIndexWithType

Get an INDEX FileType of another FileType (Video, Audio, Other).

Source code
def __call__(self: FileTypeIndex, file_type: str | FileType) -> FileTypeIndexWithType:  # type: ignore
    """Get an INDEX FileType of another FileType (Video, Audio, Other)."""

    if self is not FileType.INDEX:
        raise NotImplementedError

    file_type = FileType(file_type)

    if file_type in {FileType.AUDIO, FileType.VIDEO}:
        if file_type is FileType.AUDIO:
            return FileType.INDEX_AUDIO  # type: ignore

        if file_type is FileType.VIDEO:
            return FileType.INDEX_VIDEO  # type: ignore

    raise CustomValueError(
        'You can only have Video, Audio or Other index file types!', str(FileType.INDEX)
    )
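
Calling an enum member, as `FileType.INDEX(...)` does, works because Python looks up `__call__` on the member's class, while calling the enum class itself still performs the usual lookup by value. The sketch below shows the pattern with a hypothetical reduced enum named `Kind`; it is not the library's FileType and raises a plain `ValueError` in place of `CustomValueError`.

```python
from enum import Enum


class Kind(str, Enum):
    """Hypothetical reduced enum, used only to illustrate the callable-member pattern."""
    AUDIO = "audio"
    VIDEO = "video"
    INDEX = "index"
    INDEX_AUDIO = "index_audio"
    INDEX_VIDEO = "index_video"

    def __call__(self, kind: "str | Kind") -> "Kind":
        # Only the bare INDEX member can be specialised.
        if self is not Kind.INDEX:
            raise NotImplementedError
        # Calling the class (not a member) resolves a value to its member.
        kind = Kind(kind)
        if kind is Kind.AUDIO:
            return Kind.INDEX_AUDIO
        if kind is Kind.VIDEO:
            return Kind.INDEX_VIDEO
        raise ValueError(f"no index type for {kind.value!r}")


print(Kind.INDEX("video") is Kind.INDEX_VIDEO)      # True
print(Kind.INDEX(Kind.AUDIO) is Kind.INDEX_AUDIO)   # True
```

In this sketch, as in the source above, only audio and video map to an index member; any other value raises.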

is_index

is_index() -> TypeGuard[FileTypeIndexWithType]

Verify whether the FileType is an INDEX that holds its own FileType (e.g. mime: index/video).

Source code
def is_index(self) -> TypeGuard[FileTypeIndexWithType]:  # type: ignore
    """Verify whether the FileType is an INDEX that holds its own FileType (e.g. mime: index/video)."""

    return self in {FileType.INDEX, FileType.INDEX_AUDIO, FileType.INDEX_VIDEO}  # type: ignore

parse

parse(
    path: FilePathType,
    *,
    func: FuncExceptT | None = None,
    force_ffprobe: bool | None = None
) -> ParsedFile

Parse information from a file. If the FileType is not AUTO, this function raises an error if the file's FileType differs from the one this method was called on.

Parameters:

  • path

    (FilePathType) –

    Path of the file to be parsed.

  • func

    (FuncExceptT | None, default: None ) –

    Function returned for custom error handling. This should only be set by VS package developers.

  • force_ffprobe

    (bool | None, default: None ) –

    Only rely on ffprobe to parse the file info.

Returns:

  • ParsedFile

    ParsedFile object, holding the file's info.

Source code
@inject_self.with_args(AUTO)
def parse(
    self, path: FilePathType, *, func: FuncExceptT | None = None, force_ffprobe: bool | None = None
) -> ParsedFile:
    """
    Parse infos from a file. If the FileType is different than AUTO, this function will throw if the file
    is a different FileType than what this method was called on.

    :param path:        Path of the file to be parsed.
    :param func:        Function returned for custom error handling.
                        This should only be set by VS package developers.
    :param force_ffprobe: Only rely on ffprobe to parse the file info.

    :return:            ParsedFile object, holding the file's info.
    """

    from .ffprobe import FFProbe, FFProbeStream

    filename = Path(str(path)).absolute()

    file_type: FileType | None = None
    mime: str | None = None
    ext: str | None = None

    header = None if force_ffprobe else FileSignatures.parse(filename)

    if header is not None:
        file_type = FileType(header.file_type)
        mime = header.mime
        ext = f'.{header.ext}'
    else:
        stream: FFProbeStream | None = None
        ffprobe = FFProbe(func=func)

        try:
            stream = ffprobe.get_stream(filename, FileType.VIDEO)

            if stream is None:
                stream = ffprobe.get_stream(filename, FileType.AUDIO)

            if not stream:
                raise CustomRuntimeError(
                    f'No usable video/audio stream found in {filename}', func
                )

            file_type = FileType(stream.codec_type)
            mime = f'{file_type.value}/{stream.codec_name}'
        except Exception as e:
            if force_ffprobe:
                raise e
            elif force_ffprobe is None:
                return self.parse(path, force_ffprobe=False)

        if stream is None:
            mime, encoding = guess_mime_type(filename)

            file_type = FileType(mime)

    if ext is None:
        ext = filename.suffix

    encoding = encodings_map.get(filename.suffix, None)

    if not file_type or not mime:
        return ParsedFile(filename, ext, encoding, FileType.OTHER, 'file/unknown')

    if self is not FileType.AUTO and self is not file_type:
        raise CustomValueError(
            'FileType mismatch! self is {good}, file is {bad}!', FileType.parse, good=self, bad=file_type
        )

    return ParsedFile(filename, ext, encoding, file_type, mime)
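
The imports for `guess_mime_type` and `encodings_map` are not shown on this page. Assuming they come from the standard library's `mimetypes` module (with `guess_mime_type` being an alias of `mimetypes.guess_type`), the last-resort guess and the encoding lookup would behave roughly as in this sketch. The file name is a made-up example.

```python
from mimetypes import encodings_map, guess_type
from pathlib import Path

# Hypothetical path; the file does not need to exist for a name-based guess.
filename = Path("archive.tar.gz")

# guess_type returns a (mime type, encoding) pair based on the name alone.
mime, encoding = guess_type(str(filename))
print(mime, encoding)

# A lookup keyed on Path.suffix only sees the last suffix of the path.
print(filename.suffix, encodings_map.get(filename.suffix))
```

Under that assumption, the encoding reported for a file depends only on its final suffix, independent of what the signature check or ffprobe found.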

IndexingType

Bases: CustomStrEnum

Enum of common indexing file extensions.

Attributes:

  • DGI

    DGIndexNV index file, mostly used for interlaced/telecined content.

  • LWI

    LSMAS index file.

DGI class-attribute instance-attribute

DGI = '.dgi'

DGIndexNV index file, mostly used for interlaced/telecined content.

LWI class-attribute instance-attribute

LWI = '.lwi'

LSMAS index file.

ParsedFile

Bases: NamedTuple

Structure for file info.

Attributes:

  • encoding (str | None) –

    Present for text files.

  • ext (str) –

    Extension of the file, taken from the matched signature when one is found, otherwise from the path.

  • file_type (FileType) –

    Type of the file. It will hold other useful information.

  • mime (str) –

    Standard MIME type of the filetype.

  • path (Path) –

    Resolved path of the file.

encoding instance-attribute

encoding: str | None

Present for text files.

ext instance-attribute

ext: str

Extension of the file, taken from the matched signature when one is found, otherwise from the path.

file_type instance-attribute

file_type: FileType

Type of the file. It will hold other useful information.

mime instance-attribute

mime: str

Standard MIME type of the filetype.

path instance-attribute

path: Path

Resolved path of the file.