package util
Type Members
- class SerializableHadoopConf extends Serializable
Hadoop Configuration wrapper safe to serialize into a Spark closure or broadcast.
Value Members
- object Asn1Indexer
Builds and reads a sidecar record-offset index for BER/DER files, enabling Spark to split large files across multiple tasks.
Index file format (<original>.asn1idx):
- 8-byte header: 7-byte magic "ASN1IDX" + 1-byte version (0x01)
- 8 bytes per record: big-endian Long byte offset of the record's first tag byte
Only definite-length BER (and all DER) can be indexed. Indefinite-length constructions stop the scan early and produce a partial index.
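The layout above can be sketched in a few lines of Scala. This is a minimal illustration and not the library's Asn1Indexer: the object name Asn1IdxFormatSketch and its methods are hypothetical, and it works on an in-memory byte array rather than an HDFS stream. Because every entry is a fixed 8 bytes, entry i sits at byte 8 + 8 * i, so a lookup can probe entries by position without reading the whole index.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration only; not the library's Asn1Indexer.
object Asn1IdxFormatSketch {
  private val Magic: Array[Byte] = "ASN1IDX".getBytes(StandardCharsets.US_ASCII) // 7 bytes
  private val Version: Byte = 0x01
  val HeaderSize = 8 // magic + version
  val EntrySize = 8  // one big-endian Long per record

  /** Serialise record offsets: 8-byte header, then one big-endian Long per record. */
  def encode(offsets: Seq[Long]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream(HeaderSize + offsets.size * EntrySize)
    val out = new DataOutputStream(bytes)
    out.write(Magic)
    out.writeByte(Version)
    offsets.foreach(o => out.writeLong(o)) // DataOutputStream writes big-endian
    out.flush()
    bytes.toByteArray
  }

  /** Parse an index, rejecting a bad magic, an unknown version, or a truncated entry. */
  def decode(index: Array[Byte]): Vector[Long] = {
    require(index.length >= HeaderSize, "index shorter than its header")
    require((index.length - HeaderSize) % EntrySize == 0, "truncated entry")
    require(index.take(Magic.length).sameElements(Magic), "bad magic")
    require(index(Magic.length) == Version, "unsupported version")
    val buf = ByteBuffer.wrap(index, HeaderSize, index.length - HeaderSize)
    Vector.fill(buf.remaining() / EntrySize)(buf.getLong())
  }

  /** Position of the first entry whose offset is >= target (lower-bound binary search). */
  def firstAtOrAfter(index: Array[Byte], target: Long): Int = {
    val buf = ByteBuffer.wrap(index) // big-endian by default
    var lo = 0
    var hi = (index.length - HeaderSize) / EntrySize
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      // Each probe reads exactly one 8-byte entry at a computed position.
      if (buf.getLong(HeaderSize + mid * EntrySize) < target) lo = mid + 1 else hi = mid
    }
    lo
  }
}
```

In this sketch each loop iteration of firstAtOrAfter reads one entry by absolute position; against a remote file, that probe would be one positioned read.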
Reading efficiency
readIndexSlice uses HDFS positioned reads (pread) to binary-search the index file for the split boundaries, then reads only the matching slice sequentially. For a 100 M-record index (~800 MB at 8 bytes per record), this costs ~27 pread round-trips per task (log2 of 10^8 is about 26.6) plus a small sequential read; the full index is never loaded into memory.
- object Asn1Inspector
Lightweight diagnostic utility — decode the first few records from a local ASN.1 file without a SparkSession.
Typical use: paste into a notebook or sbt console to verify your options before submitting a full Spark job.
    import io.github.sparkasn1.spark.asn1.util.Asn1Inspector

    Asn1Inspector.peek(
      schemaPaths = Seq("/tmp/cdr.asn1"),
      typeName = "PGWRecord",
      encoding = "ber",
      filePath = "/tmp/sample.ber"
    )
Can also be run from the command line:
    sbt "runMain io.github.sparkasn1.spark.asn1.util.Asn1Inspector \
      --schema cdr.asn1 --type PGWRecord --encoding ber --file sample.ber"
- object BerRealUtil
BER/DER encoding of ASN.1 REAL (X.690 §8.5) without BouncyCastle support.
Only the binary encoding (base-2) is produced. Special values +∞/-∞ use their standardised single-byte representations. Zero maps to empty content.
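As a rough illustration of the rules above, the following sketch produces the content octets (no tag or length) of a base-2 REAL from a Double. It is not BerRealUtil's code: the object name BerRealSketch and the method encodeContent are hypothetical, NaN is rejected, and -0.0 is treated the same as 0.0. The mantissa is shifted until it is odd, which is the canonical form DER asks for; the first octet then carries the binary-encoding flag (0x80), the sign bit (0x40), and the exponent length minus one in the low two bits.

```scala
// Hypothetical sketch of X.690 §8.5 binary REAL content octets; not BerRealUtil itself.
object BerRealSketch {
  def encodeContent(value: Double): Array[Byte] = {
    require(!value.isNaN, "NaN is out of scope for this sketch")
    if (value == 0.0) Array.emptyByteArray               // zero (and -0.0 here): empty content
    else if (value.isPosInfinity) Array(0x40.toByte)     // PLUS-INFINITY
    else if (value.isNegInfinity) Array(0x41.toByte)     // MINUS-INFINITY
    else {
      val bits = java.lang.Double.doubleToLongBits(value)
      val negative = bits < 0
      val rawExp = ((bits >>> 52) & 0x7ffL).toInt
      val frac = bits & 0x000fffffffffffffL
      // Subnormals have no implicit leading 1 and use the minimum exponent.
      val m0 = if (rawExp == 0) frac else frac | (1L << 52)
      val e0 = (if (rawExp == 0) 1 else rawExp) - 1075
      // Normalise: strip trailing zero bits so the mantissa is odd.
      val shift = java.lang.Long.numberOfTrailingZeros(m0)
      val mantissa = m0 >>> shift
      val exponent = e0 + shift                           // always within -1074..1023
      val expBytes = BigInt(exponent).toByteArray         // two's complement, 1 or 2 octets
      val manBytes = BigInt(mantissa).toByteArray.dropWhile(_ == 0) // unsigned magnitude
      // Bit 8 = binary encoding, bit 7 = sign, base 2 and scale 0, bits 2-1 = exponent length - 1.
      val first = 0x80 | (if (negative) 0x40 else 0) | (expBytes.length - 1)
      (first.toByte +: expBytes) ++ manBytes
    }
  }
}
```

For example, 10.0 is 5 × 2^1, so the sketch yields the octets 80 01 05: header, exponent 1, mantissa 5.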
- object BitUtils
Bit-level utilities shared by PER encoder/decoder (Phase 2).
- object SchemaCache
Executor-local cache of parsed SchemaRegistry instances.
Keyed on the set of (path, lastModified) pairs so that schema file changes are detected between jobs without restarting executors.
Schema files must be accessible from every executor node: on HDFS, on S3, or distributed via --files / SparkContext.addFile.
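The keying idea can be sketched as follows. This is an assumption-laden illustration, not the actual SchemaCache: the class name SchemaCacheSketch is hypothetical, the parsed value is a generic T rather than SchemaRegistry, and the modification time is supplied by a caller-provided function so the sketch does not depend on any particular filesystem API.

```scala
import scala.collection.mutable

// Hypothetical sketch: cache parsed schemas under the set of (path, lastModified) pairs.
final class SchemaCacheSketch[T](parse: Seq[String] => T, lastModified: String => Long) {
  private val entries = mutable.Map.empty[Set[(String, Long)], T]

  /** Returns the cached value, re-parsing only when a path or its timestamp differs. */
  def get(paths: Seq[String]): T = synchronized {
    // A Set makes the key independent of the order in which paths are listed.
    val key = paths.map(p => (p, lastModified(p))).toSet
    entries.getOrElseUpdate(key, parse(paths))
  }
}
```

A touched schema file changes its timestamp, so the next lookup misses and triggers a fresh parse. This sketch never evicts stale entries; a long-lived executor would probably want a bound on the map size.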