Parsing XML in RightNow PHP

RightNow's PHP build is different from most PHP builds you've encountered. It excludes useful extensions that most developers assume are available, including:

  • SOAP extensions
  • Multibyte String extensions
  • XML Parsing extensions (SimpleXML)

Note: Expat is available in most builds, but it is a very "special" XML library that is event-based rather than tree/DOM-based. I personally would rather feed my fingertips to the wolverines than use it.
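To make "event-based" concrete, here is a minimal sketch that pulls episode titles out of a small XML string with PHP's standard xml_* (Expat) functions. The collectTitles function and the sample string are made up for illustration and are not part of RightNow; the point is how much state you track by hand when the parser only tells you "a tag opened", "some text arrived", "a tag closed".

```php
<?php
/**
 * Collect the text of every <title> element using the event-based xml_* API.
 * collectTitles() is a hypothetical helper written for this illustration.
 */
function collectTitles($xml)
{
    $titles  = array();  // finished titles
    $inTitle = false;    // are we currently inside a <title> element?
    $buffer  = '';       // text gathered for the current <title>

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inTitle, &$buffer) {
            if ($name === 'title') {
                $inTitle = true;
                $buffer  = '';
            }
        },
        function ($p, $name) use (&$inTitle, &$buffer, &$titles) {
            if ($name === 'title') {
                $titles[] = $buffer;
                $inTitle  = false;
            }
        }
    );

    // Text may arrive in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inTitle, &$buffer) {
            if ($inTitle) {
                $buffer .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(
            'XML error: ' . xml_error_string(xml_get_error_code($parser))
        );
    }
    // The parser is released when it goes out of scope; on older PHP
    // versions you may prefer an explicit xml_parser_free($parser) here.
    return $titles;
}

$sample = '<episodes>'
        . '<episode><number>1</number><title>Serenity</title></episode>'
        . '<episode><number>2</number><title>The Train Job</title></episode>'
        . '</episodes>';

foreach (collectTitles($sample) as $title) {
    echo $title . "\n";
}
```

Three callbacks and two flags just to read one element's text: this is the bookkeeping a tree-based parser does for you.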

The focus of this article is on overcoming the lack of the ubiquitous SimpleXML PHP extension. Hidden in RightNow's Customer Portal framework is a utility class that can be used for XML parsing. The SimpleHtmlDom class is technically for parsing HTML documents, but it can be repurposed for XML since the two share the same tag-based structure. The class provides a very nice jQuery-selector-inspired method for querying the document by node name and/or by node attribute value. I won't go into detail on the usage of this class, as there are a handful of good resources already available.

The class can be found at the following location in Customer Portal's WebDAV structure:

cp/core/framework/Libraries/ThirdParty/SimpleHtmlDom.php

And can be included in a CP script by using this shortcut:

require_once CPCORE . 'Libraries/ThirdParty/SimpleHtmlDom.php';

I can use this class anywhere in RightNow PHP including inside of Customer Portal resources, or inside of a PHP script in the Custom directory. The following example will show how to use the SimpleHtmlDom class in a CP Controller. Our controller will read an XML file, parse its contents and print the results to the screen.

I've created a basic XML file named firefly.xml that represents the TV Space Western by the same name. The XML file contains some basic information about the show, and a list of episodes. We will write a CP Controller that will parse this file and print out information about the show's creator and a full list of episodes.

Here is my sample file which I'll store as cp/customer/development/libraries/firefly.xml.


<?xml version="1.0" encoding="UTF-8"?>
<firefly>
    <createdBy>Joss Whedon</createdBy>
    <genres>
        <genre>Space Western</genre>
        <genre>Drama</genre>
        <genre>Comedy-drama</genre>
    </genres>
    <seasons>
        <season>
            <number>1</number>
            <episodes>
                <episode>
                    <number>1</number>
                    <title>Serenity</title>
                </episode>
                <episode>
                    <number>2</number>
                    <title>The Train Job</title>
                </episode>
                <episode>
                    <number>3</number>
                    <title>Bushwhacked</title>
                </episode>
                <episode>
                    <number>4</number>
                    <title>Shindig</title>
                </episode>
                <episode>
                    <number>5</number>
                    <title>Safe</title>
                </episode>
                <episode>
                    <number>6</number>
                    <title>Our Mrs. Reynolds</title>
                </episode>
                <episode>
                    <number>7</number>
                    <title>Jaynestown</title>
                </episode>
                <episode>
                    <number>8</number>
                    <title>Out of Gas</title>
                </episode>
                <episode>
                    <number>9</number>
                    <title>Ariel</title>
                </episode>
                <episode>
                    <number>10</number>
                    <title>War Stories</title>
                </episode>
                <episode>
                    <number>11</number>
                    <title>Trash</title>
                </episode>
                <episode>
                    <number>12</number>
                    <title>The Message</title>
                </episode>
                <episode>
                    <number>13</number>
                    <title>Heart of Gold</title>
                </episode>
                <episode>
                    <number>14</number>
                    <title>Objects in Space</title>
                </episode>
            </episodes>
        </season>
    </seasons>
</firefly>

Next I'll create a CP Controller named XMLProcessor with a single method named process. I will access this Controller at the following URL:

https://my_site.custhelp.com/cc/XMLProcessor/process


<?php

namespace Custom\Controllers;

use \RightNow\Libraries\ThirdParty\SimpleHtmlDom;
require_once CPCORE . 'Libraries/ThirdParty/SimpleHtmlDom.php';

class XMLProcessor extends \RightNow\Controllers\Base
{
    public function process()
    {
        //Grab XML and parse it. An HTML DOM is essentially XML
        //We can pretend our XML is HTML and use SimpleHtmlDom to parse it
        $fireflyXML = file_get_contents(APPPATH . 'libraries/firefly.xml');
        $fireflyDOM = SimpleHtmlDom\str_get_html($fireflyXML);
        
        echo "<h1>Firefly</h1>";
        
        //Grab and print Series Creator
        $createdBy = $fireflyDOM->find("createdBy", 0)->innertext;
        echo sprintf("<h2>Created By: %s</h2>", $createdBy);
        
        //Print number and title from each Episode
        $episodes = $fireflyDOM->find("episodes episode");
        foreach($episodes as $episode)
        {
            $episodeNumber = $episode->find("number", 0)->innertext;
            $episodeTitle = $episode->find("title", 0)->innertext;
            
            echo sprintf("<p><strong>Number:</strong> %s <strong>Title:</strong> %s</p>", $episodeNumber, $episodeTitle);
        }
    }
}

I load an XML document by using the file_get_contents function to read a file into a string, then pass the string into the SimpleHtmlDom library's str_get_html function.


$fireflyXML = file_get_contents(APPPATH . 'libraries/firefly.xml');
$fireflyDOM = SimpleHtmlDom\str_get_html($fireflyXML);

Once the XML is loaded, I can use the find method and a jQuery-like selector syntax to query the XML document. The find method takes a selector as the first parameter and an optional second parameter that is the index of the element to return from the result set. If you know that there is only one result, you can shortcut your coding by passing 0 as the second parameter, which returns the first (and only) result.


$createdBy = $fireflyDOM->find("createdBy", 0)->innertext;

If the second parameter is not included, an array of matching elements is returned. This array can then be traversed using a for or foreach construct.


$episodes = $fireflyDOM->find("episodes episode");
foreach($episodes as $episode)
{
  ....
}
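One caveat with the index form of find: if nothing matches, chaining ->innertext directly onto the result will fail. Here is a minimal sketch of a guard, assuming that find with an index returns a non-object value (such as null) when there is no match. firstText is a hypothetical helper of my own, not part of SimpleHtmlDom, and it only relies on the find method and innertext property used in the controller above.

```php
<?php
/**
 * Return the inner text of the first element matching $selector,
 * or $default when nothing matches.
 *
 * firstText() is a hypothetical helper, not part of SimpleHtmlDom.
 * $node is any object exposing find($selector, $index), such as the
 * DOM object returned by str_get_html() or an element returned by find().
 */
function firstText($node, $selector, $default = '')
{
    $match = $node->find($selector, 0);

    // Assumption: a failed lookup yields a non-object value (e.g. null).
    if (!is_object($match)) {
        return $default;
    }

    return trim($match->innertext);
}
```

In the controller, the creator lookup would then read: $createdBy = firstText($fireflyDOM, 'createdBy', 'Unknown'); and the same call works on each $episode inside the loop.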

This SimpleHtmlDom class provides a lot of options for parsing XML "natively" inside of RightNow. Since this is a PHP implementation there will be some limits to the size of XML document that can be processed, but for most use cases it should be more than sufficient.
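Since document size is the main constraint, it can help to check the file before handing it to the parser. The sketch below is one way to do that in plain PHP. readXmlFile and the 500 KB default are hypothetical choices for illustration; I have not measured where the real limit sits for SimpleHtmlDom on a RightNow site, so pick a threshold that suits your data.

```php
<?php
/**
 * Read an XML file into a string, refusing missing, unreadable,
 * or oversized files before any parsing is attempted.
 *
 * readXmlFile() and the 512000-byte default are hypothetical.
 * Inside a namespaced CP controller, write \RuntimeException instead.
 */
function readXmlFile($path, $maxBytes = 512000)
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read XML file: $path");
    }

    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        throw new RuntimeException(
            "XML file size is unknown or above $maxBytes bytes: $path"
        );
    }

    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Failed to read XML file: $path");
    }

    return $contents;
}
```

The returned string can be passed to str_get_html exactly as in the controller, with the exceptions caught wherever you want to show an error message.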

Comments

Are you sure Expat isn't available? It was available at least in the last year or so. It may require white-listing but I don't think so.

Found it. Take a look at cp/core/framework/Models/Polling.php

/**
 * Returns all survey data necessary to create a poll
 *
 * @param int $surveyID The id of the polling survey
 * @return array Survey data
 */
public function getSurveyData($surveyID) {
    if (!Framework::isValidID($surveyID)) {
        return $this->getResponseObject(null, null, "Invalid Survey ID: '$surveyID'");
    }

    $cacheKey = "Polling$surveyID";
    if (!$response = Framework::checkCache($cacheKey)) {
        $data = array();
        list($data['flow_id'],
            $data['survey_disabled'],
            $data['multi_submit'],
            $data['survey_type'],
            $data['expiration_date'],
            $data['survey_intf_id'],
            $data['doc_id'],
            $designXml) = Sql::getResultsBySurvey($surveyID);

        // parse the design xml for options
        $parser = xml_parser_create();
        xml_parse_into_struct($parser, $designXml, $values, $index);
        xml_parser_free($parser);
        $optionsIndex = $index['OPTIONS'][0];
        $attributeValueArray = $values[$optionsIndex]['attributes'];

        $data['title'] = $attributeValueArray['POLLINGTITLE'];
        $data['show_results_link'] = $attributeValueArray['POLLINGSHOWRESULTSLINK'] === 'true';
        $data['show_total_votes'] = $attributeValueArray['POLLINGSHOWTOTALVOTES'] === 'true';
        $data['show_chart'] = $attributeValueArray['POLLINGSHOWCHART'] === 'true';

        $data['submit_button_label'] = \RightNow\Utils\Config::getMessage(SUBMIT_CMD);
        $data['view_results_label'] = \RightNow\Utils\Config::getMessage(VIEW_RESULTS_CMD);
        $data['ok_button_label'] = \RightNow\Utils\Config::getMessage(OK_CMD);
        $data['turn_text'] = \RightNow\Utils\Config::getMessage(THANK_YOU_PARTICIPATING_POLL_MSG);
        $data['total_votes_label'] = \RightNow\Utils\Config::getMessage(TOTAL_VOTES_LBL) . " ";

        $response = $this->getResponseObject($data, 'is_array');
        Framework::setCache($cacheKey, $response);
    }

    return $response;
}

I tried to edit, but it took me to a page saying the page didn't exist.

This code above is from:

RightNow Customer Portal 3.2.1

Software Version: Oracle RightNow CX Cloud Service February 2014 (Build 131)

I just rechecked and Expat functions are now fully included; I'll amend my article. I still find most people (myself included) hate an event-based parser and would rather do regex/string comparisons than use it.

If you have some VERY basic XML, it might be an option. In general, I'd rather use a tree/DOM based parser. Since SimpleXML is not available, this is the next best option.

Hi Andy and thanks for this.
I need to use SimpleHtmlDom.php inside a custom script.
Anyway I'm unable to include the library into the script so far. Any hint?
Thanks

It should be a completely stand alone piece of PHP with no Customer Portal dependencies. Just use require_once from your script to the appropriate path. CPCORE won't be a defined constant in the custom script scope, so you'll have to figure out what that resolves to for your site and explicitly put that in your script.

Here is an article I previously wrote that shows how to view the defined CP constants and also a table of what most of the common constants look like.
http://cxdeveloper.com/article/useful-constants-customer-portal-cp3

Thanks for that link Andy, found it very useful and I got everything working here.
I was able to include the library this way:

define('CPCORE','/cgi-bin/<interface_name>.cfg/scripts/cp/core/framework/3.2.6/');
require_once( CPCORE . 'Libraries/ThirdParty/SimpleHtmlDom.php');

Now my only concern is that the framework version is hardcoded in that path and the script will stop working on framework upgrade. Any chance to get those version numbers dynamically inside a custom script?
Thanks again.

If you dig deep into some hidden CP code, you can find a file called init.php. This is where the CP startup routine defines these constants.

$documentRoot = get_cfg_var('doc_root');
define('SOURCEPATH', 'cp/core/framework/' . (IS_HOSTED ? CP_FRAMEWORK_VERSION . '.' . CP_FRAMEWORK_NANO_VERSION . '/' : ''));
define('CPCORE', $documentRoot . '/' . SOURCEPATH);

You could add similar defines to your code, but I'm not sure how/where the CP_FRAMEWORK_VERSION constant is defined.

Your other choice would be to copy the SimpleHtmlDom.php file into your custom directory and reference the copy; this way you won't be surprised by any Oracle changes.
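A third possibility, if your script is allowed to list the framework directory, is to look for the version folder at runtime instead of hardcoding it. The sketch below is untested on a real site and rests on two assumptions: that the folders under cp/core/framework/ are named by version number (like the 3.2.6 in the path above), and that the highest-numbered folder is the one you want, which may not hold if the site is pinned to an older framework. latestFrameworkVersion is a hypothetical helper, not a CP function.

```php
<?php
/**
 * Return the highest version-numbered subdirectory name under
 * $frameworkRoot, or null if none is found.
 *
 * latestFrameworkVersion() is a hypothetical helper. It assumes the
 * subdirectories are named like 3.2.6 and that the newest is wanted.
 */
function latestFrameworkVersion($frameworkRoot)
{
    $dirs = glob(rtrim($frameworkRoot, '/') . '/*', GLOB_ONLYDIR);
    if (!$dirs) {
        return null; // glob() failed or the directory has no subfolders
    }

    $versions = array();
    foreach ($dirs as $dir) {
        $name = basename($dir);
        // Accept only dot-separated numbers, e.g. 3.2.6
        if (preg_match('/^\d+(\.\d+)*$/', $name)) {
            $versions[] = $name;
        }
    }
    if (!$versions) {
        return null;
    }

    // version_compare orders 3.10.0 after 3.2.6, unlike a string sort.
    usort($versions, 'version_compare');
    return end($versions);
}
```

You would call it with the framework folder for your interface and append the result when building the require_once path, falling back to a copied file in your custom directory if it returns null.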
