Stream: implementers
Topic: filenames as Specimen identifiers - bad smell?
Andrew Patterson (Aug 30 2017 at 00:52):
In our system we have Specimen resources that correspond to blood samples, which have traditional lab identifiers etc.
Once they go through a DNA sequencer, they end up represented as large gene files, both on the local sequencer computer and eventually remotely (e.g. in an S3 bucket). We need to store this filename information (the raw files themselves will never be FHIR resources; they are 10 GB+). Because the filenames are particular to the internal workings of our system, I was just going to add an extension to Specimen.
But it occurred to me last night that we already have a potential slot: additional Specimen identifiers with a custom identifier system. So system=http://mylab.com/localgenefile, value=abfgsd.fastq
Is this a horrifying thought, a blatant misuse of the identifier data type, the ramblings of a madman? Or is it considered fair game for identifiers?
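For comparison, here is a minimal sketch of the two options being weighed, written as FHIR JSON built from Python dicts. The system `http://mylab.com/localgenefile` and value `abfgsd.fastq` come from the post; the accession system, the "remote" filename system, the S3 key, and the extension URL are all hypothetical placeholders, not anything defined by FHIR or by the poster's lab.

```python
import json

LOCAL_FILE_SYSTEM = "http://mylab.com/localgenefile"  # from the post above

# Option A: filenames carried as extra Specimen.identifier entries,
# one identifier system per place the file lives.
specimen_with_identifier = {
    "resourceType": "Specimen",
    "identifier": [
        # Hypothetical traditional lab accession number.
        {"system": "http://mylab.com/accession", "value": "BLD-0001"},
        {"system": LOCAL_FILE_SYSTEM, "value": "abfgsd.fastq"},
        # Hypothetical system and value for the remote (S3) copy.
        {"system": "http://mylab.com/remotegenefile",
         "value": "s3://example-bucket/abfgsd.fastq"},
    ],
}

# Option B: the filename carried in a local extension instead.
# The extension URL is hypothetical; a real one would be defined
# by the lab's own StructureDefinition.
specimen_with_extension = {
    "resourceType": "Specimen",
    "identifier": [
        {"system": "http://mylab.com/accession", "value": "BLD-0001"},
    ],
    "extension": [
        {"url": "http://mylab.com/fhir/StructureDefinition/gene-file",
         "valueString": "abfgsd.fastq"},
    ],
}


def gene_filenames(specimen, system=LOCAL_FILE_SYSTEM):
    """Return every identifier value recorded under the given system."""
    return [ident["value"]
            for ident in specimen.get("identifier", [])
            if ident.get("system") == system]


print(json.dumps(specimen_with_identifier, indent=2))
```

With option A, any client that already searches Specimen by identifier can find the specimen from a filename; with option B, the filename is only meaningful to systems that know the extension.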
Grahame Grieve (Aug 30 2017 at 01:03):
File names share some characteristics with identifiers: they identify the file in a context. Whether they are suitable for use as identifiers depends on how they are managed. Note, in particular, the no re-use rule.
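One way to read the no re-use rule for filenames: once a (system, value) pair has identified a specimen, it must never be handed to a different specimen, even if the sequencer later writes a new run under the same name. The sketch below is a hypothetical in-memory registry that enforces this; it is an illustration of the management concern, not part of any FHIR API.

```python
class IdentifierRegistry:
    """Hypothetical registry: a (system, value) pair, once assigned to a
    specimen, is never reassigned to a different specimen."""

    def __init__(self):
        # (system, value) -> specimen id; entries are never deleted,
        # so a retired filename still cannot be reused.
        self._assigned = {}

    def assign(self, system, value, specimen_id):
        key = (system, value)
        owner = self._assigned.get(key)
        if owner is not None and owner != specimen_id:
            raise ValueError(f"{value!r} in {system} already identifies {owner}")
        self._assigned[key] = specimen_id


registry = IdentifierRegistry()
registry.assign("http://mylab.com/localgenefile", "abfgsd.fastq", "specimen-1")

# A later run that happens to reuse the same filename is rejected.
try:
    registry.assign("http://mylab.com/localgenefile", "abfgsd.fastq", "specimen-2")
except ValueError as err:
    print(err)
```

If filenames on the sequencer can repeat across runs, they would need some extra qualifier (run or date) before they could safely serve as identifiers.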
Richard Townley-O'Neill (Aug 30 2017 at 04:25):
Is the sequence resource relevant to what you are proposing? http://build.fhir.org/sequence.html
Andrew Patterson (Aug 30 2017 at 05:50):
Yes, but the Sequence is more likely a post-processing artifact. The main input files we are talking about here are raw 10 GB+ text files representing raw genome data, and they aren't going to go through our network/system at all. They will go directly from the lab to the cloud. So it's not important that they are represented in FHIR; we are just using the Specimen as the proxy for each one, and we need to know their filenames on every system they pass through.
Richard Townley-O'Neill (Aug 30 2017 at 06:21):
I did puzzle about putting the filename in identifier for Specimen, and in repository for Sequence. But that is probably correct.
Last updated: Apr 12 2022 at 19:14 UTC