Guide - Setting up the Product and Catalog Export

Introduction

A workflow that exports data from ICM and makes it available to SPARQUE AI is a commonly needed configuration. This guide shows one way to automate this workflow. The automation consists of the following steps:

  • Create a product export job

  • Create a catalog export job (for each catalog)

  • Create a file destination in Microsoft Azure

  • Create a transport configuration

  • Create a process chain for automatic export

  • Create all job configurations via DBInit

  • Configure SPARQUE to read from export destination

More detailed instructions can be found in the following sections.

References

Complete Example Cartridge

A cartridge containing the jobs and process chain described in this document can be found here:

Steps for Automation

Create Product Export Job

To automate the export of all products, create a job that runs the product export:

# Name of job configuration
RunProductExport.Name=RunProductExport
RunProductExport.Description=RunProductExport
#RunProductExport.Date=2010.11.01 at 00:00:00
#RunProductExport.Interval=1440
RunProductExport.PipelineName=ProcessImpexJob
RunProductExport.PipelineStartNode=Start
RunProductExport.EnableJob=true
RunProductExport.ApplicationSite=inSPIRED-Site
RunProductExport.ApplicationURLIdentifier=inTRONICS
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
RunProductExport.AttributeName1=DomainName
RunProductExport.AttributeValue1=inSPIRED-inTRONICS
RunProductExport.AttributeName2=ExportDirectory
RunProductExport.AttributeValue2=sparque
RunProductExport.AttributeName3=JobName
RunProductExport.AttributeValue3=ProcessCatalogImpex
RunProductExport.AttributeName4=ProcessPipelineName
RunProductExport.AttributeValue4=ProcessProductExport
RunProductExport.AttributeName5=ProcessPipelineStartNode
RunProductExport.AttributeValue5=Export
RunProductExport.AttributeName6=SelectedFile
RunProductExport.AttributeValue6=exportFromProcessChain.xml
RunProductExport.AttributeName7=DeterminePageablePipeline
RunProductExport.AttributeValue7=ProcessProductSearch-SimpleSearch

Create Catalog Export Job

Catalogs need to be exported separately, one export per catalog. This can also be done via a job configuration, similar to the following:

# Name of job configuration
RunCatalogCamerasExport.Name=RunCatalogCamerasExport
RunCatalogCamerasExport.Description=RunCatalogCamerasExport
#RunCatalogCamerasExport.Date=2010.11.01 at 00:00:00
#RunCatalogCamerasExport.Interval=1440
RunCatalogCamerasExport.PipelineName=ProcessImpexJob
RunCatalogCamerasExport.PipelineStartNode=Start
RunCatalogCamerasExport.EnableJob=true
RunCatalogCamerasExport.ApplicationSite=inSPIRED-Site
RunCatalogCamerasExport.ApplicationURLIdentifier=inTRONICS
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
RunCatalogCamerasExport.AttributeName1=DomainName
RunCatalogCamerasExport.AttributeValue1=inSPIRED-inTRONICS
RunCatalogCamerasExport.AttributeName2=ExportDirectory
RunCatalogCamerasExport.AttributeValue2=sparque
RunCatalogCamerasExport.AttributeName3=CatalogID
RunCatalogCamerasExport.AttributeValue3=Cameras-Camcorders
RunCatalogCamerasExport.AttributeName4=ProcessPipelineName
RunCatalogCamerasExport.AttributeValue4=ProcessCatalogExport
RunCatalogCamerasExport.AttributeName5=ProcessPipelineStartNode
RunCatalogCamerasExport.AttributeValue5=Export
RunCatalogCamerasExport.AttributeName6=SelectedFile
RunCatalogCamerasExport.AttributeValue6=exportCameras.xml
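
Additional catalogs follow the same pattern; only the configuration name, the CatalogID, and the SelectedFile change. The following is a sketch for the Specials catalog referenced in the process chain below (the catalog ID and file name are assumptions and must match the actual project data):

# Name of job configuration
RunCatalogSpecialsExport.Name=RunCatalogSpecialsExport
RunCatalogSpecialsExport.Description=RunCatalogSpecialsExport
RunCatalogSpecialsExport.PipelineName=ProcessImpexJob
RunCatalogSpecialsExport.PipelineStartNode=Start
RunCatalogSpecialsExport.EnableJob=true
RunCatalogSpecialsExport.ApplicationSite=inSPIRED-Site
RunCatalogSpecialsExport.ApplicationURLIdentifier=inTRONICS
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
RunCatalogSpecialsExport.AttributeName1=DomainName
RunCatalogSpecialsExport.AttributeValue1=inSPIRED-inTRONICS
RunCatalogSpecialsExport.AttributeName2=ExportDirectory
RunCatalogSpecialsExport.AttributeValue2=sparque
RunCatalogSpecialsExport.AttributeName3=CatalogID
RunCatalogSpecialsExport.AttributeValue3=Specials
RunCatalogSpecialsExport.AttributeName4=ProcessPipelineName
RunCatalogSpecialsExport.AttributeValue4=ProcessCatalogExport
RunCatalogSpecialsExport.AttributeName5=ProcessPipelineStartNode
RunCatalogSpecialsExport.AttributeValue5=Export
RunCatalogSpecialsExport.AttributeName6=SelectedFile
RunCatalogSpecialsExport.AttributeValue6=exportSpecials.xml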

Create File Destination in Microsoft Azure

To create a file destination in Microsoft Azure, perform the following steps (an Azure CLI sketch covering the same steps follows the list):

  1. Go to Microsoft Azure.

  2. Create a storage account or use an existing one.

  3. Create a new container or fileshare. In this example, we will call it sparque.

  4. Copy an access key of the storage account; it will be required in the next step.
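
If the Azure Portal is not at hand, the same steps can be performed with the Azure CLI. The following is a minimal sketch, assuming a resource group already exists; all resource names are placeholders:

# Create the storage account (step 2)
az storage account create \
    --name <storageaccountname> \
    --resource-group <resourcegroup> \
    --location <location> \
    --sku Standard_LRS

# Create the container (for blob storage) or a file share (step 3)
az storage container create --name sparque --account-name <storageaccountname>
az storage share create --name sparque --account-name <storageaccountname>

# Read an access key for the transport configuration (step 4)
az storage account keys list \
    --account-name <storageaccountname> \
    --resource-group <resourcegroup> \
    --query "[0].value" --output tsv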

Create Transport Configuration

For the full transport, create a transport configuration as shown below:

domain=inSPIRED-inTRONICS
process.id=SparqueTransport
process.displayname=SparqueTransport
process.type=AZURE
location.local=<path to shared file system>/sites/inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/impex/export/sparque
account.key=<previously created access key>
account.name=<storage account name>
file.share=<previously created container/fileshare name, e.g. blob://sparque. Important: use prefix blob:// for container or file:// for fileshare>
process.direction=PUSH
process.delete=0

The transport can then be automated using a job:

ExecuteSparqueTransport.Name=ExecuteSparqueTransport 
ExecuteSparqueTransport.Description=ExecuteSparqueTransport
#ExecuteSparqueTransport.Date=2010.11.01 at 00:00:00
#ExecuteSparqueTransport.Interval=1440
ExecuteSparqueTransport.PipelineName=FileTransportJob
ExecuteSparqueTransport.PipelineStartNode=Start
ExecuteSparqueTransport.EnableJob=true
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
ExecuteSparqueTransport.AttributeName1=TransportProcessID
ExecuteSparqueTransport.AttributeValue1=SparqueTransport

Create and Execute Process Chain for Automatic Export

The process chain contains all of the previous export jobs followed by the transport job. Timeouts should be adjusted to the project's needs. Also, depending on the number of products and categories, it may be faster to run the exports concurrently (a concurrent variant is sketched after the example below).

For details on all of the process chain options, see Concept - Process Chains (valid to 11.x).

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<chain xmlns="https://www.intershop.com/xml/ns/semantic/processchain/v1" name="Chain 1" timeout="90">
    <sequence name="Chain 1.1 - Sequence" timeout="90">
        <job job="RunProductExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.1 - Job" timeout="60"/>
		<job job="RunCatalogCamerasExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.2 - Job" timeout="60"/>
		<!--- more catalog exports i.e.<job job="RunCatalogSpecialsExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.3 - Job" timeout="60"/>--->
        <job job="ExecuteSparqueTransport" domain="inSPIRED-inTRONICS" name="Chain 1.1.4 - Job" timeout="30"/>
    </sequence>
</chain>
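
If the exports should run in parallel, the chain could be restructured along the following lines. This is only a sketch; it assumes the concurrent element described in Concept - Process Chains (valid to 11.x) and keeps the transport strictly after all exports:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<chain xmlns="https://www.intershop.com/xml/ns/semantic/processchain/v1" name="Chain 1" timeout="90">
    <sequence name="Chain 1.1 - Sequence" timeout="90">
        <!-- run product and catalog exports in parallel -->
        <concurrent name="Chain 1.1.1 - Concurrent" timeout="60">
            <job job="RunProductExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.1.1 - Job" timeout="60"/>
            <job job="RunCatalogCamerasExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.1.2 - Job" timeout="60"/>
        </concurrent>
        <!-- transport only after all exports have finished -->
        <job job="ExecuteSparqueTransport" domain="inSPIRED-inTRONICS" name="Chain 1.1.2 - Job" timeout="30"/>
    </sequence>
</chain>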

A process chain can be triggered manually in the back office; for a fully automated export, create a job configuration for this as well:

# Name of job configuration
ExecuteSparqueProcessChain.Name=ExecuteSparqueProcessChain 
ExecuteSparqueProcessChain.Description=ExecuteSparqueProcessChain
#ExecuteSparqueProcessChain.Date=2010.11.01 at 00:00:00
#ExecuteSparqueProcessChain.Interval=1440
ExecuteSparqueProcessChain.PipelineName=ExecuteProcessChain
ExecuteSparqueProcessChain.PipelineStartNode=Start
ExecuteSparqueProcessChain.EnableJob=true
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
ExecuteSparqueProcessChain.AttributeName1=XmlFileName
ExecuteSparqueProcessChain.AttributeValue1=inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/impex/config/ExportAndTransportProducts.xml
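
To run the chain on a schedule instead of starting it manually, the commented Date and Interval lines can be activated. A sketch with example values (assuming the interval is given in minutes, so 1440 corresponds to a daily run):

# start date/time and interval of the scheduled execution
ExecuteSparqueProcessChain.Date=2024.11.01 at 01:00:00
ExecuteSparqueProcessChain.Interval=1440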

Create All Job Configurations via DBInit

Job configurations and transport configurations can be created through DBInit using the PrepareTransportConfiguration and PrepareJobConfigurations preparers. The referenced resource files contain the property blocks shown in the previous sections:

Class1500 = com.intershop.component.transport.dbinit.PrepareTransportConfiguration \
            com.intershop.demo.responsive.dbinit.data.job.TransportConfiguration
Class1505 = com.intershop.beehive.core.dbinit.preparer.job.PrepareJobConfigurations \
            inSPIRED-inTRONICS \
            com.intershop.demo.responsive.dbinit.data.job.JobConfigurations

Configure SPARQUE to Read from Export Destination

Create Shared Access Signature for Fileshare

To access the exported files in the file share, a Shared Access Signature (SAS) must be created.

  1. Navigate to Security + networking | Shared access signature and apply the following settings:

    • Allowed services: File

    • Allowed resource types: Service, Container, Object

    • Allowed permissions: Read, List

    • Allowed IP addresses: Add if necessary

    • Define the expiry date/time: select an appropriate date in the future and make sure to refresh the signature after it expires.

    • Signing Key: Use the same access key as above.

  2. Click on Generate SAS and connection string.

  3. Copy the SAS token string.
    See also Microsoft | Grant limited access to Azure Storage resources using shared access signatures (SAS).

Alternatively, if the Azure Portal is not available for this task, you can create a SAS token using the Azure CLI as follows:

az storage share generate-sas --name <share-name> --account-name <storage-account-name> --permissions rl --https-only --expiry 2028-01-01T00:00Z --account-key <storage-account-key>

Example:

user@computer:~$ az storage share exists --name sharename --account-name azurestorageaccountname --account-key your_own_key==
{
"exists": false
}
user@computer:~$ az storage share create --name sharename --account-name azurestorageaccountname --account-key your_own_key==
{
"created": true
}
user@computer:~$ az storage share generate-sas --name sharename --account-name azurestorageaccountname --permissions rl --expiry 2029-01-01T00:00Z --account-key your_own_key==
"se=2029-01-01T00%3A00Z&sp=rl&sv=2021-06-08&sr=s&sig=secretkeyULZi%2BT9McrADbEBvCtRRTgK0MIumRzac%3D"

Create Shared Access Signature for Container

To create a Shared Access Signature for the container, do the following:

  1. Navigate to Containers.

  2. Open the container.

  3. Click the three dots next to your file, and then click Generate SAS.

  4. Apply the required settings (analogous to the settings for the file share above).

  5. Select the time frame in which the SAS token will be valid.

  6. Click Generate SAS token and URL and copy the value.

Using Azure CLI

Alternatively, you can use Azure CLI to create the token:

# the storage account key (or the AZURE_STORAGE_KEY environment variable) is required to sign the token
az storage blob generate-sas \
    --account-name $STORAGE_ACCOUNT_NAME \
    --container-name $CONTAINER_NAME \
    --name $BLOB_NAME \
    --account-key $STORAGE_ACCOUNT_KEY \
    --permissions r \
    --expiry <expiry-date-time> \
    --https-only \
    --output tsv
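
The printed token can then be appended to the blob URL, for example with a small shell snippet (a sketch; it assumes the variables used above are set and uses an example expiry date):

# generate the token and build the complete download URL
SAS_TOKEN=$(az storage blob generate-sas \
    --account-name "$STORAGE_ACCOUNT_NAME" \
    --container-name "$CONTAINER_NAME" \
    --name "$BLOB_NAME" \
    --account-key "$STORAGE_ACCOUNT_KEY" \
    --permissions r \
    --expiry 2028-01-01T00:00Z \
    --https-only \
    --output tsv)
echo "https://${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/${CONTAINER_NAME}/${BLOB_NAME}?${SAS_TOKEN}"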

Use File Service/Container SAS Token in SPARQUE

After running the export jobs and the transport job, the exported files are located in the created file share or container. SPARQUE.AI can use this location as the basis for a dataset. To use this function, configure a dataset source with Fetch a file from URL and enter the path to the exported file together with the SAS token. This allows SPARQUE.AI to fetch data from this data source.

Example file share URL: https://<storageaccount>.file.core.windows.net/<filesharename>/<exportfile>?<SAS token>

Example blob storage URL: https://<storageaccount>.blob.core.windows.net/<containername>/<exportfile>?<SAS token>
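
Before configuring the dataset source, the URL can be verified, for example with curl (a quick optional check; replace the placeholders with the actual values):

# download the exported file via the SAS URL to verify access
curl -fsS "https://<storageaccount>.blob.core.windows.net/<containername>/<exportfile>?<SAS token>" -o export-check.xml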
