# helix-importer

**Repository Path**: mirrors_adobe/helix-importer

## Basic Information

- **Project Name**: helix-importer
- **Description**: Foundation tools for importing website content into that can be consumed in an Helix project.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2026-03-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Helix Importer

Foundation tools for importing website content into that can be consumed in an Helix project.

Basic concept of the importer: for an input url, transform the DOM and convert it into a Markdown / docx file.

## Importer

An importer must extends [PageImporter](src/importer/PageImporter.js) and implement the `fetch` and `process` method. The general idea is that `fetch` receives the url to import and is responsible to return the HTML. `process` receives the corresponding Document in order to filter / rearrange / reshuffle the DOM before it gets processed by the Markdown transformer. `process` computes and defines the list of [PageImporterResource](src/importer/PageImporterResource.ts) (could be more than one), each resource being transformed as a Markdown document.

Goal of the importer is to get rid of the generic DOM elements like the header / footer, the nav... and all elements that are common to all pages in order to get the unique piece(s) of content per page.

### HTML2x helpers

[HTML2x](src/importer/HTML2x.js) methods (`HTML2md` and `HTML2docx`) are convienence methods to run an import. As input, they take:
- `URL`: URL of the page to import
- `document`: the DOM element to import - a Document object or a string (see `createDocumentFromString` for the string case)
- `transformerCfg`: object with the transformation "rules". Object can be either:
  - `{ transformDOM: ({ url, document, html, params }) => { ... return element-to-convert  }, generateDocumentPath: ({ url, document, html, params }) => { ... return path-to-target; }}` for a single mapping between one input document / one output file
  - `{ transform: ({ url, document, html, params }) => { ... return [{ element: first-element-to-convert, path: first-path-to-target }, ...]  }` for a mapping one input document / multiple output files (useful to generate multiple docx from a single web page)
- `config`: object with several config properties
  - `createDocumentFromString`: this config is required if you use the methods in a non-browser context and want to pass `document` param as string. This method receives the HTML to parse as a string and must return a Document object.
  - `setBackgroundImagesFromCSS`: set to false to disable the `background-image` inlining in the DOM.

### Importer UI

The Helix Importer has a dedicated browser UI: see https://github.com/adobe/helix-importer-ui

## Installation

```shell
npm i https://github.com/adobe/helix-importer 
```

TODO: publish npm module

## Usage

```js
import { ... } from '@adobe/helix-importer';
```