Example Log Data Repository

Use any of these files to try out and explore process mining on your own.

Please Note: You must be logged into the cloud in order to download these files.

Download File: Description
COVID analyses: All Patients, By Gender, and By Pre-existing Condition.

The Wolfram Data Repository posted Patient Medical Data for Novel Coronavirus COVID-19. The data include age, gender, location, date of onset, symptoms, travel history, chronic diseases, and various dates associated with symptoms, hospital admission, discharge, or death. We created three separate process-minable event logs from these public data. Models such as these enable policy makers to perform complex optimizations of rapidly emerging medical technologies, optimize patient workflows, evaluate emerging best medical practices, model geographic spread, and run agent-based simulations that automatically estimate epidemiological parameters for different patient demographics. This Google slideshow reports our results. These visualizations compare process models between genders and between patients with and without pre-existing conditions.

ENRON Emails: IEEE XES format, CSV format.
Many think of processes as sequential, deliberate activities that sustain businesses and government agencies. From an ecosystem vantage, however, emergent processes exist and are discoverable. Emergent ecosystems form without human intention and may be especially influenceable. Tremendous intelligence is contained within semistructured and unstructured organizational data sources. Properly analyzed, these data provide government and private organizations with actionable management and risk mitigation insights. Data derived from this process model elucidate internal operations. This rendering of the process model is a prototype; arrow size and activity shading indicate relative values.

Apache Camel Emails: IEEE XES format, CSV format.
Organizational workflow modeling is becoming an increasingly important capability from both a security and a business productivity standpoint. These Apache Camel email event logs were created as part of a US government funded research project designed to further research into automated situation awareness of dynamically evolving events and the consequences of loss due to cybersecurity breaches. This rendering of the process model is a prototype; arrow size and activity shading indicate relative values.

Smart Agriculture/Crop Growth
Smart agriculture is a growing need. This dataset produces a temporal crop growth model of a hemp field. Multispectral drone image snapshots taken over a hemp field were analyzed with object identification AI, and relevant agricultural features were identified, such as areas with soil, areas with water, healthy plants, and stressed plants. Images were converted into an event log by aligning the geographies from one image to the next so that they coincide with the same patch of ground. Our process mining algorithm models the temporal growth process. Here is another rendering of the process model: ag grow model.

Space Defense Region (SDR):
United States, Russia, China, Japan

Radar Cross Section (RCS):
United States, Russia, China, Japan

Objects in space from four different countries may be examined from a process perspective using explainable artificial intelligence. For all countries, objects tend to remain predominantly in the same process activity state. Process activity state transitions (movement between orbital characteristic descriptive bins) are observed, however, suggesting intentional maneuver, object degradation, or other ecosystem behaviors. Historical satellite operational data were provided by the Space Strategies Center. The data were collected from the CELESTRAK website in Two-Line Element Set format and converted into event logs. Data represent United States, Russian, Chinese, and Japanese Space Defense Region (SDR) and Radar Cross Section (RCS) processes. The data span January 2012 to June 2013; observations for each space object were recorded approximately every two weeks (twice monthly), although this was not always the case.
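As a rough illustration of how binned observations can become an event log, the sketch below maps a numeric characteristic to a descriptive bin and treats each bin as a process activity state. It is not the conversion used to build these files: the bin edges, state labels, object identifiers, and column names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bin edges and state labels; the real bins are not given in the text.
EDGES = [0.1, 1.0]
LABELS = ["small", "medium", "large"]


def to_activity(value, edges=EDGES, labels=LABELS):
    """Map a numeric observation to the label of the bin it falls into."""
    return labels[bisect_right(edges, value)]


def build_event_log(observations):
    """Turn (object_id, iso_date, value) tuples into event log rows.

    Each space object is treated as one case (an assumption), and rows are
    ordered by case and then by observation date.
    """
    ordered = sorted(observations, key=lambda o: (o[0], o[1]))
    return [
        {"Case ID": obj, "Activity": to_activity(value), "Start Time": date}
        for obj, date, value in ordered
    ]


def count_transitions(rows):
    """Count, per case, how often consecutive observations change state."""
    changes, previous = {}, {}
    for row in rows:
        case = row["Case ID"]
        changes.setdefault(case, 0)
        if case in previous and previous[case] != row["Activity"]:
            changes[case] += 1
        previous[case] = row["Activity"]
    return changes


# Made-up observations for two hypothetical objects.
observations = [
    ("OBJ-1", "2012-01-01", 0.05),
    ("OBJ-1", "2012-01-15", 0.06),
    ("OBJ-1", "2012-02-01", 0.50),
    ("OBJ-2", "2012-01-01", 2.00),
]
rows = build_event_log(observations)
transitions = count_transitions(rows)
```

With these made-up values, the first object stays in one state for two observations and then changes once, while the second never changes, which mirrors the "mostly stable, occasional transition" pattern described above.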
Purchasing
Imagine you are the manager of a purchasing process and you are looking for ways to make it more efficient. You don't know exactly how, but you have the feeling that there is room for improvement. This log file contains the time-stamped activities for a purchasing process. For example, someone (the requestor) wants to buy a new computer. The requestor first needs a manager's approval to spend the money. The request then goes to the purchasing department, which looks for the best options. Next, the computer is ordered and delivered by the supplier, and eventually an invoice is sent and paid through the financial department. Here is a prototype rendering of the model, as well.

Auto Repair
In this example, you are interested in understanding the dynamics of your automobile repair process. Specifically, you want to know how you might provide better service and keep your valuable customers informed about the status of their vehicle. Additionally, you're interested in simulating labor changes or throughput adjustments before making changes to the ecosystem.

Call Center
As an executive in charge of customer relations, you're interested in understanding the dynamics of your call center. Where do you observe excessive steps, which might indicate inefficiency? What are some common traits associated with lengthy case resolutions? Are there opportunities to create data-driven KPIs that will help drive your operations in the right direction while satisfying customers?

Download CSV template
Use this CSV template to create an event log from your own data. Download the template to your offline computing environment, copy and paste your event log data in the proper format, and save the file. Then go to the software import screen to import the data and analyze your process. See Help for additional documentation. Before uploading, be sure the file is sorted by Case ID and then by Start Time. It is best to sort your data with a programming language such as Python; Microsoft Excel does not always sort data according to expected standards (for example, when values contain underscore characters).
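The sorting step can be sketched in Python with only the standard library. This is a minimal example under stated assumptions, not the software's required procedure: the column headers `Case ID` and `Start Time` and the timestamp format are guesses, so adjust them to match the actual template. Python compares strings by Unicode code point, which gives a deterministic order for characters such as underscores.

```python
import csv
import io
from datetime import datetime

# Assumed column headers and timestamp format; change these to match the template.
CASE_KEY = "Case ID"
TIME_KEY = "Start Time"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def sort_event_log(rows, case_key=CASE_KEY, time_key=TIME_KEY, time_format=TIME_FORMAT):
    """Return rows sorted by case ID, then by parsed start time."""
    return sorted(
        rows,
        key=lambda r: (r[case_key], datetime.strptime(r[time_key], time_format)),
    )


def sort_event_log_csv(src, dst):
    """Read a CSV event log from src and write the sorted log to dst.

    src and dst are open text file objects (or io.StringIO buffers).
    """
    reader = csv.DictReader(src)
    rows = sort_event_log(list(reader))
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory demonstration with made-up events.
sample = (
    "Case ID,Activity,Start Time\n"
    "B_2,Approve,2020-01-02 09:00:00\n"
    "A_1,Order,2020-01-03 10:00:00\n"
    "A_1,Request,2020-01-01 08:00:00\n"
)
output = io.StringIO()
sort_event_log_csv(io.StringIO(sample), output)
sorted_lines = output.getvalue().splitlines()
```

To sort a file on disk, open the input and output with `open(path, newline="")` and pass the two file objects to `sort_event_log_csv`. Parsing the timestamp before sorting avoids the misordering that plain text comparison would produce for formats such as `1/2/2020`.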