7.14.2009

Split the Data: Two are better than one.


Musings on massive (def: >100 MB) BIM datasets and a strategy for attribute-seek optimization.


In working with a massive Revit dataset (four 180+ MB files, one 120+ MB file, and one 80+ MB structural file, all compressed), I have had the opportunity to observe the impact of attribute seeking on productivity, and the inefficiency it breeds, in the context of a large (40+ person) team.

Not good.

5+ minute load times. Multi-minute lag on certain commands. View load times. Save times. Reload Latest. All of these add up to both actual and perceived inefficiencies in a BIM workflow. The central database model offers many benefits, but the question is how to improve on it.

While running an experiment on a Linux CAVE system, I had the opportunity to work with a researcher writing his own modeling application. Our dataset was a 1+ GB ASCII file, and in doing manual edits on it (ironically, in a homemade text editor), I had the chance to watch a search algorithm that broke the data into small batch sets, yielding a seek time faster than a standard Unix word-count pass over the same file.
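To make that concrete, here is a minimal sketch of the batching idea in Python. It assumes a line-oriented ASCII dataset where each record starts with an ID token; the function names are mine, not the researcher's. One pass over the file pays for a byte-offset index, and every lookup after that is a direct seek instead of a full scan.

```python
def build_offset_index(path):
    """One pass over the file: map each record ID to its byte offset.
    This is the batching cost, paid once up front."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            rec_id = line.split(b" ", 1)[0]
            index.setdefault(rec_id, offset)  # keep the first occurrence
            offset += len(line)
    return index

def seek_record(path, index, rec_id):
    """Every later lookup jumps straight to its record."""
    offset = index.get(rec_id)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline()
```

The point is not the dictionary; it is that once the data is addressable in small batches, a lookup only ever touches the chunk it needs.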

So, if it is possible to subdivide the dataset in a way that allows for localized seeking, is it possible to modularize the data in a way that prioritizes a more efficient user experience?

This is an engineering question. It is similar to current battery research: as batteries maxed out their ratio of weight to charge-hold time, engineers had to start dividing battery usage into a more efficient split based on the tasks being asked of it, extending lifespan while decreasing weight. If this same strategy is employed on a dataset, is it possible to separate graphic information from quick-access data from deep-access data?

I.e.: splitting apart the data when it is generated into clusters, so that as I access views, that information sits in a location optimized to speed graphic population and redraw on the GPU, with intelligent streaming and caching taking away the generation lag, without bogging anything down elsewhere in the dataset. This could also allow for a separation of types of data, representation vs. attributes, in a way that would let a structural package optimize the data it loads differently than an MEP package, which loads differently than an architectural package.
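A rough sketch of what that split might look like, again in Python. The partition names, the load profiles, and the element shape are my assumptions, not any vendor's format; the point is that each discipline's package only ever pulls the clusters it declares.

```python
from dataclasses import dataclass, field

@dataclass
class ElementPartitions:
    geometry: bytes = b""                      # GPU-bound mesh data for redraw
    quick: dict = field(default_factory=dict)  # hot attributes: id, level, type
    deep: dict = field(default_factory=dict)   # cold attributes: specs, history

# Hypothetical profiles: each package declares up front what it loads.
LOAD_PROFILES = {
    "architectural": ("geometry", "quick"),
    "structural":    ("geometry", "quick", "deep"),
    "mep":           ("quick", "deep"),
}

def load_element(store, elem_id, package):
    """Fetch only the clusters this package's profile asks for,
    leaving the rest of the dataset untouched."""
    element = store[elem_id]
    return {name: getattr(element, name) for name in LOAD_PROFILES[package]}

# Usage: structural pays for the deep analysis data; architectural never does.
store = {"beam-7": ElementPartitions(geometry=b"<mesh>",
                                     quick={"level": "L2"},
                                     deep={"analysis": "..."})}
print(load_element(store, "beam-7", "architectural"))
```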

Combined in a cloud scenario, an optimized database manager could be loaded and run in parallel to the BIM application. The dangerous side of this is that it would allow current modeling-only applications such as SketchUp to develop, in parallel, an attribute database system that plugs into their software, giving them an instant competitive entry to the game.
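A hedged sketch of that parallel arrangement, shrunk down to a worker thread and a queue standing in for the real service; every name here is hypothetical. The modeling app hands off a query and keeps redrawing; the slow attribute seek never touches the UI thread.

```python
import queue
import threading

class AttributeManager(threading.Thread):
    """Answers attribute queries off the modeling app's main thread,
    so a slow deep-data seek never blocks redraw or user input."""

    def __init__(self, attribute_db):
        super().__init__(daemon=True)
        self.db = attribute_db        # stand-in for the real attribute store
        self.requests = queue.Queue()

    def run(self):
        while True:
            elem_id, reply = self.requests.get()
            reply.put(self.db.get(elem_id, {}))  # the slow seek lives here

    def query_async(self, elem_id):
        """Non-blocking from the app's side; collect the result later."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((elem_id, reply))
        return reply

# Usage: the modeling app keeps drawing while the lookup runs in parallel.
manager = AttributeManager({"wall-42": {"fire_rating": "2hr"}})
manager.start()
pending = manager.query_async("wall-42")
# ... redraw, handle input ...
print(pending.get())  # {'fire_rating': '2hr'}
```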

This is what keeps life interesting. Carry on, diisssmissed!
