CEA Forum

Data interchange standards

Is there any existing work anywhere on defining data standards for growing data? If not, who wants to collaborate with me? :slightly_smiling_face:

I’m imagining something as simple as CSV with well-defined field names and data types, similar to Google’s General Transit Feed Specification (GTFS) for transit data.

We actually already have an open initiative for that. Please check out https://ceaod.github.io.

Review the current data format recommendations. And feel free to comment, provide suggestions, implement certain things, or contribute in other ways. Your contribution and leadership will be very welcome.

Oh, I didn’t know about this. Fantastic!!

I see that the initiative is about promoting CEA data sharing for researchers in general. However, to facilitate data consumption by machines (e.g. for autonomous applications) and also cross-dataset research, it seems like it would be more valuable if the data guidelines defined exact field names and data types for well-known columns. For example, Timestamp must be a string in RFC 3339 format; AmbientTemp_<SensorID> must be a float in Celsius. Otherwise the guidelines are more about data packaging than about the data itself.
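To make the idea concrete, here is a rough sketch of what a validator for such a spec might look like. Only the two example columns above are covered; the function names, the sensor ID pattern, and the rule that any other column is rejected are my own assumptions, not anything the current guidelines define. The timestamp check is also only an approximation of RFC 3339 built on Python's `datetime.fromisoformat`.

```python
import csv
import io
import re
from datetime import datetime

# Assumed pattern for sensor columns: AmbientTemp_<SensorID>, alphanumeric ID.
SENSOR_COLUMN = re.compile(r"^AmbientTemp_[A-Za-z0-9]+$")


def parse_timestamp(value):
    """Parse an RFC 3339-style timestamp; a UTC offset (or Z) is required."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return parsed


def validate(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []

    if "Timestamp" not in fields:
        errors.append("missing required column: Timestamp")
    for name in fields:
        if name != "Timestamp" and not SENSOR_COLUMN.match(name):
            errors.append(f"unknown column: {name}")

    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for name in fields:
            value = row.get(name) or ""
            try:
                if name == "Timestamp":
                    parse_timestamp(value)
                elif SENSOR_COLUMN.match(name):
                    float(value)  # degrees Celsius by convention
            except ValueError:
                errors.append(f"line {line_no}: invalid {name}: {value!r}")
    return errors


if __name__ == "__main__":
    sample = "Timestamp,AmbientTemp_A1\n2024-05-01T12:00:00Z,21.5\n"
    print(validate(sample))
```

A unit like Celsius cannot be checked from the value alone, so in this sketch it stays a documented convention attached to the column name.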

Also, since we’re already in the business of issuing data guidelines, I would simply insist on CSV format, for all the reasons already given on the webpage.

Perhaps after collecting more data from different sources, it will be possible to determine the most common fields and create a more formal specification. Though I suspect we already know enough to do this on some level. Should I go ahead and take a stab at this and do a PR?

Sounds good. A pull/merge request is actually the best way to suggest a change.

Also keep in mind that we should balance the ease of sharing data against the ease of consuming it. At this stage, we are erring on the side of data sharing, since the benefits for those sharing the data are less obvious than the benefits for those consuming it.