Information management, software architecture, and data privacy

Breaking boundaries: How Freightos achieved high speed graph search in the cloud, CloudTech, 2016.

Running heavy-duty graph algorithms against a very large dataset requires some unusual design principles. Freightos may not be the only company doing it, but no cloud platform today is optimized for this; in fact, the usual design assumptions in cloud platforms are quite the opposite of what we needed. Here is how we did it.

Search and You Shall Find, Medium, 2015.

Today’s e-tail search engines return inaccurate results, so merchants stuff all product information into long titles. To optimize revenue, online retailers need a search engine that understands the product selection.

“Documents in the cloud: Dynamic, Privacy-customized views,” Cloudbook, 2012.

As documents move to the cloud, it becomes harder to protect the private information in them, but it also becomes easier to control distribution of specific private information to exactly the people who are authorized to see it.

“Flexible, Dynamic Redaction,” MasterDataManagement.com, 2012.

Complying with privacy regulations used to mean “redaction,” blacking out words with a pen, slowly and expensively. But natural language processing techniques can protect exactly the information regulated by law while giving convenient access to authorized users.

“People Who Live in Glass Houses Should Put Up Some Shades,” InfoSecurity, 2011.

Too much openness and too little openness both pose risks. Document viewing with automated privacy control is one part of the balance. Allowing authorized users to retrieve the redacted information is another.

“IBM Optim Data Redaction: Reconciling Openness with Privacy,” IBM White Paper, 2010.

The white paper for the product that I launched at IBM.

“Mining for Meaning: Discovering Business Realities in Mainframe Metadata,” Mainframe Executive, Sep./Oct. 2008.

To expose mainframe functionality now locked up in siloed systems, it is essential to understand its business value. Automated classification technologies help make this happen.

“The Hub and the Edge: Balancing the Responsibilities,” Architecture and Governance Magazine, May 2007.

Architecture is as much about the organization as about technology. This article explains how to divide responsibilities in Service Oriented Integration to let each team do what it does best.

“The Portal as People-Centric SOA,” MainSoft Corporation White Paper, May 2007.

As a consultant for a leading provider of Java-.NET interoperability software, I wrote a white paper evangelizing the company’s IBM WebSphere Portal product, showing how it functions as a user-facing on-ramp to SOA.

“Enterprise Semantics: Aligning Service Oriented Architecture with the Business,” with Joram Borenstein, Web Services Journal, May 2005.

A business-focused overview of the value that semantics brings to Service Oriented Architectures.

“Know What Your Schemas Mean: Semantic Information Management for XML Assets,” XML Conference, Dec. 2003.

Schemas control the structure of information, but they don’t specify what a field means. Is that “salary” field monthly or annual? Semantic data management helps you keep track and avoid expensive mistakes.


“Active Information Models for Data Transformation,” eAI Journal (later renamed to Business Integration Journal and Align Journal), May 2003.

EAI gives O(n) complexity for connecting n applications on the network, but there remains an O(n²) complexity for integrating the message formats that the applications use as input and output. With an ontology-based approach, however, this too can be reduced to O(n).
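
As a rough illustration of the counting argument (a sketch written for this summary, not code from the article), assume every application has its own message format. Point-to-point integration then needs a transformation for each ordered pair of formats, while a shared model needs only one transformation into it and one out of it per format. The class and method names below are hypothetical.

```java
public class MappingCount {
    // Directed format-to-format transformations when every application
    // exchanges messages directly with every other one: n * (n - 1).
    static long pointToPoint(int n) {
        return (long) n * (n - 1);
    }

    // Transformations when each format is mapped only to and from
    // one shared model (the ontology): 2 * n.
    static long viaSharedModel(int n) {
        return 2L * n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 50}) {
            System.out.printf("n=%d point-to-point=%d shared-model=%d%n",
                    n, pointToPoint(n), viaSharedModel(n));
        }
    }
}
```

With 10 applications this is 90 transformations against 20, and the gap widens quadratically as n grows.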

“Semantic Discovery for Web Services,” with Joram Borenstein, Web Services Journal, April 2003.

Web Services lookup with UDDI requires client and server to agree on the exact syntax of the interaction. Using the principles of ontology, providers can publish and clients can discover Services based on the desired functionality rather than the syntactic details.

“Generating XSLT with a Semantic Hub,” XML Conference, Dec. 2002.

XSLT was a promising XML technology that never fulfilled its promise because it was so hard to write and maintain. But when generated automatically from semantic information about what data is used for, XSLT becomes an automated information interchange language.

 

Software development 

“Apache Spark and Java 8: The Big Data Team for 2015,” Datanami, 2014.

Apache Spark with Java 8 is proving to be the perfect match for Big Data. In this article, I show an example of collaborative filtering using Spark on Cassandra data, and explain how much easier this is to do with the lambdas of Java 8. The accompanying code is on GitHub.

“JRuby on Rails,” JavaWorld, February 2007.

Exciting news about JRuby in late 2006: Sun hired the two lead developers and released a beta for Java 6 with a built-in scripting API. Sun also announced plans to support dynamic languages at the JVM level, and the JRuby team announced support for Rails.

The article explains Ruby on Rails to Java developers, comparing it to Java web frameworks. It presents an example based on JavaSpaces, which leverages Java from within the Rails application. Even those Java developers who do not adopt Rails will benefit from the design principles built into the framework, as well as the rapidly emerging concept of non-Java languages integrated with Java and the JVM.

“Ruby in the Java World,” JavaWorld, July 2006.

Dynamic languages are rapidly gaining in popularity. Ruby in particular has attracted attention, with a big boost from the Ruby on Rails Web framework. In this article, I introduce Java programmers to Ruby, focusing on the similarities, differences, and connectivity between the two languages, and describing the value of JRuby on the Java platform. The article got some buzz on the net, including from Frank Sommers at Artima.

“Clojure: Challenge your Java Assumptions,” JavaWorld, May 2009.

The article is aimed at senior Java developers, encouraging them to learn more about this exciting language. A dialect of Lisp, Clojure runs on the JVM with excellent integration with Java, and provides new, improved solutions for the biggest challenge to programming languages today: concurrency.

“Deploying Jini: HTTP Servers for the Dynamic Download of Code,” Jiniology column, JavaWorld, Dec. 2001.

I’ve found that once new Jini developers learn about the exciting distributed architecture, they often get bogged down by the challenge of simply configuring their system for development. They encounter a yet greater challenge in moving from the development configuration to deployment. Even experienced developers can get confused by the variety of components involved.

In the article, I review a number of solutions and compare them on criteria such as ease of development, ease of migration from development to deployment, low memory and CPU burden, portability, compatibility with RMI Activation, security, and enterprise-class web-app features.

“Building a Successful Wireless Web Site,” Wireless Business and Technology, Apr. 2001.

If you’re a software development manager with experience leading the development of a three-tier distributed application for the World Wide Web, perhaps you’re about to move on to spearhead the construction of a WAP site. This article has reuse as its theme: I explain when you can reuse skillsets, infrastructure, and software components from the WWW site, and when you’re better off developing new skills, buying new infrastructure, or building new software.

“When is a Singleton not a Singleton,” JavaWorld, Jan. 2001.

Sometimes you implement the Singleton Design Pattern, but mysteriously find that more than one object of the class is instantiated. This article explains how that can happen and how to avoid it. The Sun Java Developer Connection reprinted the article, which was linked from the JDC front page for a while.

“Opaque Bodies, Transparent Envelopes,” XML-Journal, Oct. 2000.

Separating layers of abstraction by packaging a body of one layer in an envelope of another layer is one of the fundamental design principles in data transfer. This principle holds for XML just as for any data transfer format, but implementing a system that observes layer separation can be difficult. This article describes how to do it.

“So what is SO_KEEPALIVE?,” Dr. Dobb’s Journal, Sep. 2000.

Garbage collecting distributed resources requires mechanisms such as keep-alives, heartbeats, leases, and Are-You-There/I-Hear-You protocols. Interestingly, the keep-alive mechanism built into TCP/IP sockets is not really practical; for this reason, the JDK didn’t allow access to the SO_KEEPALIVE socket option until the recent release of JDK 1.3. I explain the problems with SO_KEEPALIVE and how to implement your own garbage collection mechanism for distributed resources.
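
To make the lease idea concrete, here is a minimal sketch of application-level expiry. It is an illustration written for this summary and not the mechanism from the article; the `LeaseTable` class, its method names, and the lease length are all hypothetical. Each heartbeat renews a resource's lease, and a periodic sweep reclaims anything whose lease has run out. Time is passed in explicitly so the logic is easy to follow.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeaseTable {
    // Expiry time in milliseconds, keyed by a hypothetical resource id.
    private final Map<String, Long> expiryByResource = new ConcurrentHashMap<>();
    private final long leaseMillis;

    LeaseTable(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Called when a client acquires a resource or sends a heartbeat for it.
    void renew(String resourceId, long nowMillis) {
        expiryByResource.put(resourceId, nowMillis + leaseMillis);
    }

    // Removes every resource whose lease has expired; returns how many were reclaimed.
    int reap(long nowMillis) {
        int reclaimed = 0;
        Iterator<Map.Entry<String, Long>> it = expiryByResource.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    boolean isLive(String resourceId) {
        return expiryByResource.containsKey(resourceId);
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable(1000);
        table.renew("session-a", 0);
        table.renew("session-b", 0);
        table.renew("session-a", 900); // heartbeat from a; b stays silent
        System.out.println("reclaimed: " + table.reap(1500));
        System.out.println("session-a live: " + table.isLive("session-a"));
    }
}
```

The server decides when a silent client is gone on its own schedule, without depending on how the operating system times out an idle TCP connection.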

“Collaborative Applications with the Java Shared Data Toolkit,” Dr. Dobb’s Journal, Feb. 2000.

I describe and review a toolkit for allowing distributed applications to share objects, and more generally discuss the challenges of managing distributed objects. The JSDT was an official product from Sun, although it never made it to the status of a Java extension. It implemented some interesting ideas for distributing objects. I enjoyed collaborating with the creator, and some of my suggestions (like one on failure detection) actually made it into the toolkit.

Recruiting and Career