[![Node.js CI](https://github.com/Borewit/strtok3/actions/workflows/ci.yml/badge.svg)](https://github.com/Borewit/strtok3/actions/workflows/ci.yml) [![CodeQL](https://github.com/Borewit/strtok3/actions/workflows/codeql.yml/badge.svg?branch=master)](https://github.com/Borewit/strtok3/actions/workflows/codeql.yml) [![NPM version](https://badge.fury.io/js/strtok3.svg)](https://npmjs.org/package/strtok3) [![npm downloads](http://img.shields.io/npm/dm/strtok3.svg)](https://npmcharts.com/compare/strtok3,token-types?start=1200&interval=30) [![DeepScan grade](https://deepscan.io/api/teams/5165/projects/8526/branches/103329/badge/grade.svg)](https://deepscan.io/dashboard#view=project&tid=5165&pid=8526&bid=103329) [![Known Vulnerabilities](https://snyk.io/test/github/Borewit/strtok3/badge.svg?targetFile=package.json)](https://snyk.io/test/github/Borewit/strtok3?targetFile=package.json) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/59dd6795e61949fb97066ca52e6097ef)](https://www.codacy.com/app/Borewit/strtok3?utm_source=github.com&utm_medium=referral&utm_content=Borewit/strtok3&utm_campaign=Badge_Grade) # strtok3 A promise based streaming [*tokenizer*](#tokenizer-object) for [Node.js](http://nodejs.org) and browsers. The `strtok3` module provides several methods for creating a [*tokenizer*](#tokenizer-object) from various input sources. Designed for: * Seamless support in streaming environments. * Efficiently decode binary data, strings, and numbers. * Reading [predefined](https://github.com/Borewit/token-types) or custom tokens. * Offering [*tokenizers*](#tokenizer-object) for reading from [files](#method-strtok3fromfile), [streams](#fromstream-function) or [Uint8Arrays](#frombuffer-function). ### Features `strtok3` can read from: * Files, using a file path as input. * Node.js [streams](https://nodejs.org/api/stream.html). * [Buffer](https://nodejs.org/api/buffer.html) or [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array). * HTTP chunked transfer provided by [@tokenizer/http](https://github.com/Borewit/tokenizer-http). * [Amazon S3](https://aws.amazon.com/s3) chunks with [@tokenizer/s3](https://github.com/Borewit/tokenizer-s3). ## Installation ```sh npm install strtok3 ``` ### Compatibility Starting with version 7, the module has migrated from [CommonJS](https://en.wikipedia.org/wiki/CommonJS) to [pure ECMAScript Module (ESM)](https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c). The distributed JavaScript codebase is compliant with the [ECMAScript 2020 (11th Edition)](https://en.wikipedia.org/wiki/ECMAScript_version_history#11th_Edition_%E2%80%93_ECMAScript_2020) standard. Requires a modern browser, Node.js (V8) ≥ 18 engine or Bun (JavaScriptCore) ≥ 1.2. For TypeScript CommonJs backward compatibility, you can use [load-esm](https://github.com/Borewit/load-esm). > [!NOTE] > This module requires a [Node.js ≥ 16](https://nodejs.org/en/about/previous-releases) engine. > It can also be used in a browser environment when bundled with a module bundler. ## Support the Project If you find this project useful and would like to support its development, consider sponsoring or contributing: - [Become a sponsor to Borewit](https://github.com/sponsors/Borewit) - Buy me a coffee: Buy me A coffee ## API Documentation ### strtok3 methods Use one of the methods to instantiate an [*abstract tokenizer*](#tokenizer-object): - [fromFile](#fromfile-function)* - [fromStream](#fromstream-function)* - [fromWebStream](#fromwebstream-function) - [fromBuffer](#frombuffer-function) > **_NOTE:_** * `fromFile` and `fromStream` only available when importing this module with Node.js All methods return a [`Tokenizer`](#tokenizer-object), either directly or via a promise. #### `fromFile` function Creates a [*tokenizer*](#tokenizer-object) from a local file. ```ts function fromFile(sourceFilePath: string): Promise ``` | Parameter | Type | Description | |----------------|----------|----------------------------| | sourceFilePath | `string` | Path to file to read from | > [!NOTE] > - Only available for Node.js engines > - `fromFile` automatically embeds [file-information](#file-information) Returns, via a promise, a [*tokenizer*](#tokenizer-object) which can be used to parse a file. ```js import * as strtok3 from 'strtok3'; import * as Token from 'token-types'; (async () => { const tokenizer = await strtok3.fromFile("somefile.bin"); try { const myNumber = await tokenizer.readToken(Token.UINT8); console.log(`My number: ${myNumber}`); } finally { tokenizer.close(); // Close the file } })(); ``` #### `fromStream` function Creates a [*tokenizer*](#tokenizer-object) from a Node.js [readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable). ```ts function fromStream(stream: Readable, options?: ITokenizerOptions): Promise ``` | Parameter | Optional | Type | Description | |-----------|-----------|-------------------------|--------------------------| | stream | no | [Readable](https://nodejs.org/api/stream.html#stream_class_stream_readable) | Stream to read from | | fileInfo | yes | [IFileInfo](#IFileInfo) | Provide file information | Returns a Promise providing a [*tokenizer*](#tokenizer-object). > [!NOTE] > - Only available for Node.js engines #### `fromWebStream` function Creates [*tokenizer*](#tokenizer-object) from a [WHATWG ReadableStream](https://nodejs.org/api/webstreams.html#web-streams-api). ```ts function fromWebStream(webStream: AnyWebByteStream, options?: ITokenizerOptions): ReadStreamTokenizer ``` | Parameter | Optional | Type | Description | |----------------|-----------|--------------------------------------------------------------------------|------------------------------------| | readableStream | no | [ReadableStream](https://nodejs.org/api/webstreams.html#web-streams-api) | WHATWG ReadableStream to read from | | fileInfo | yes | [IFileInfo](#IFileInfo) | Provide file information | Returns a Promise providing a [*tokenizer*](#tokenizer-object) ```js import strtok3 from 'strtok3'; import * as Token from 'token-types'; strtok3.fromWebStream(readableStream).then(tokenizer => { return tokenizer.readToken(Token.UINT8).then(myUint8Number => { console.log(`My number: ${myUint8Number}`); }); }); ``` #### `fromBuffer()` function Create a tokenizer from memory ([Uint8Array](https://nodejs.org/api/buffer.html)). ```ts function fromBuffer(uint8Array: Uint8Array, options?: ITokenizerOptions): BufferTokenizer ``` | Parameter | Optional | Type | Description | |------------|----------|--------------------------------------------------|----------------------------------------| | uint8Array | no | [Uint8Array](https://nodejs.org/api/buffer.html) | Uint8Array or Buffer to read from | | fileInfo | yes | [IFileInfo](#IFileInfo) | Provide file information | Returns a Promise providing a [*tokenizer*](#tokenizer-object). ```js import * as strtok3 from 'strtok3'; const tokenizer = strtok3.fromBuffer(buffer); tokenizer.readToken(Token.UINT8).then(myUint8Number => { console.log(`My number: ${myUint8Number}`); }); ``` ### `Tokenizer` object The *tokenizer* is an abstraction of a [stream](https://nodejs.org/api/stream.html), file or [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array), allowing _reading_ or _peeking_ from the stream. It can also be translated in chunked reads, as done in [@tokenizer/http](https://github.com/Borewit/tokenizer-http); #### Key Features: - Supports seeking within the stream using `tokenizer.ignore()`. - Offers `peek` methods to preview data without advancing the read pointer. - Maintains the read position via tokenizer.position. #### Tokenizer functions _Read_ methods advance the stream pointer, while _peek_ methods do not. There are two kind of functions: 1. *read* methods: used to read a *token* of [Buffer](https://nodejs.org/api/buffer.html) from the [*tokenizer*](#tokenizer-object). The position of the *tokenizer-stream* will advance with the size of the token. 2. *peek* methods: same as the read, but it will *not* advance the pointer. It allows to read (peek) ahead. #### `readBuffer` function Read data from the _tokenizer_ into provided "buffer" (`Uint8Array`). `readBuffer(buffer, options?)` ```ts readBuffer(buffer: Uint8Array, options?: IReadChunkOptions): Promise; ``` | Parameter | Type | Description | |------------|----------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | buffer | [Buffer](https://nodejs.org/api/buffer.html) | Uint8Array | Target buffer to write the data read to | | options | [IReadChunkOptions](#ireadchunkoptions) | An integer specifying the number of bytes to read | Return promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set. #### `peekBuffer` function Peek (read ahead), from [*tokenizer*](#tokenizer-object), into the buffer without advancing the stream pointer. ```ts peekBuffer(uint8Array: Uint8Array, options?: IReadChunkOptions): Promise; ``` | Parameter | Type | Description | |------------|-----------------------------------------|-----------------------------------------------------| | buffer | Buffer | Uint8Array | Target buffer to write the data read (peeked) to. | | options | [IReadChunkOptions](#ireadchunkoptions) | An integer specifying the number of bytes to read. | | Return value `Promise` Promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set. #### `readToken` function Read a *token* from the tokenizer-stream. ```ts readToken(token: IGetToken, position: number = this.position): Promise ``` | Parameter | Type | Description | |------------|-------------------------|---------------------------------------------------------------------------------------------------------------------- | | token | [IGetToken](#IGetToken) | Token to read from the tokenizer-stream. | | position? | number | Offset where to begin reading within the file. If position is null, data will be read from the current file position. | Return value `Promise`. Promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set. #### `peek` function Peek a *token* from the [*tokenizer*](#tokenizer-object). ```ts peekToken(token: IGetToken, position: number = this.position): Promise ``` | Parameter | Type | Description | |------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------| | token | [IGetToken](#IGetToken) | Token to read from the tokenizer-stream. | | position? | number | Offset where to begin reading within the file. If position is null, data will be read from the current file position. | Return a promise with the token value peeked from the [*tokenizer*](#tokenizer-object). #### `readNumber` function Peek a numeric [*token*](#token) from the [*tokenizer*](#tokenizer-object). ```ts readNumber(token: IToken): Promise ``` | Parameter | Type | Description | |------------|---------------------------------|----------------------------------------------------| | token | [IGetToken](#IGetToken) | Numeric token to read from the tokenizer-stream. | Returns a promise with the decoded numeric value from the *tokenizer-stream*. #### `ignore` function Advance the offset pointer with the token number of bytes provided. ```ts ignore(length: number): Promise ``` | Parameter | Type | Description | |------------|--------|----------------------------------------------------------------------| | ignore | number | Numeric of bytes to ignore. Will advance the `tokenizer.position` | Returns a promise with the decoded numeric value from the *tokenizer-stream*. #### `close` function Clean up resources, such as closing a file pointer if applicable. #### `Tokenizer` attributes - `fileInfo` Optional attribute describing the file information, see [IFileInfo](#IFileInfo) - `position` Pointer to the current position in the [*tokenizer*](#tokenizer-object) stream. If a *position* is provided to a _read_ or _peek_ method, is should be, at least, equal or greater than this value. ### `IReadChunkOptions` interface Each attribute is optional: | Attribute | Type | Description | |-----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | length | number | Requested number of bytes to read. | | position | number | Position where to peek from the file. If position is null, data will be read from the [current file position](#attribute-tokenizerposition). Position may not be less then [tokenizer.position](#attribute-tokenizerposition) | | mayBeLess | boolean | If and only if set, will not throw an EOF error if less then the requested *mayBeLess* could be read. | Example usage: ```js tokenizer.peekBuffer(buffer, {mayBeLess: true}); ``` ### `IFileInfo` interface Provides optional metadata about the file being tokenized. | Attribute | Type | Description | |-----------|---------|---------------------------------------------------------------------------------------------------| | size | number | File size in bytes | | mimeType | number | [MIME-type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) of file. | | path | number | File path | | url | boolean | File URL | ### `Token` object The *token* is basically a description what to read form the [*tokenizer-stream*](#tokenizer-object). A basic set of *token types* can be found here: [*token-types*](https://github.com/Borewit/token-types). A token is something which implements the following interface: ```ts export interface IGetToken { /** * Length in bytes of encoded value */ len: number; /** * Decode value from buffer at offset * @param buf Buffer to read the decoded value from * @param off Decode offset */ get(buf: Uint8Array, off: number): T; } ``` The *tokenizer* reads `token.len` bytes from the *tokenizer-stream* into a Buffer. The `token.get` will be called with the Buffer. `token.get` is responsible for conversion from the buffer to the desired output type. ### Working with Web-API readable stream To convert a [Web-API readable stream](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader) into a [Node.js readable stream]((https://nodejs.org/api/stream.html#stream_readable_streams)), you can use [readable-web-to-node-stream](https://github.com/Borewit/readable-web-to-node-stream) to convert one in another. ```js import { fromWebStream } strtok3 from 'strtok3'; import { ReadableWebToNodeStream } from 'readable-web-to-node-stream'; (async () => { const response = await fetch(url); const readableWebStream = response.body; // Web-API readable stream const webStream = new ReadableWebToNodeStream(readableWebStream); // convert to Node.js readable stream const tokenizer = fromWebStream(webStream); // And we now have tokenizer in a web environment })(); ``` ## Dependencies Dependencies: - [@tokenizer/token](https://github.com/Borewit/tokenizer-token): Provides token definitions and utilities used by `strtok3` for interpreting binary data. ## Licence This project is licensed under the [MIT License](LICENSE.txt). Feel free to use, modify, and distribute as needed.