Posted By: longzheng | Jul 10th @ 1:06 AM
page 2 of 2
Comments: 46 | Views: 845

But how do you deal with the fact that multiple files will have the same hash (basic rules of entropy)? It might start off ok, but eventually you need some way to say File X isn't the same as File Y, regardless of the fact they have the same hash.

blowdart
Peek-a-boo

Actually a more sensible option IMO, rather than using two algorithms, is to salt the hash with a unique value per file, say, a GUID. It depends what you think the hash is for: uniquely identifying a file, or guaranteeing it hasn't changed since it was uploaded.
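To make the salting idea concrete, here is a minimal sketch. It assumes SHA-256 and a random GUID stored alongside each upload; the function name `salted_file_id` is made up for illustration and is not from any existing service.

```python
import hashlib
import uuid

def salted_file_id(content: bytes, salt: uuid.UUID) -> str:
    """Hash the per-file salt followed by the content (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    h.update(salt.bytes)  # 16-byte GUID, unique per uploaded file
    h.update(content)
    return h.hexdigest()

content = b"same bytes uploaded twice"
salt_a, salt_b = uuid.uuid4(), uuid.uuid4()

id_a = salted_file_id(content, salt_a)
id_b = salted_file_id(content, salt_b)

# Different salts give different identifiers for identical content...
print(id_a != id_b)
# ...while recomputing with the stored salt still detects a changed file.
print(salted_file_id(content, salt_a) == id_a)
print(salted_file_id(content + b"!", salt_a) == id_a)
```

Note the trade-off: a salted hash works for "has this upload changed?", but it can no longer be used to look up an unknown file by content alone, because you need the salt first.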

But in order to do that, the hash needs to have as many bits as the file. Sure, from a security point of view it's very difficult to intentionally generate a file with the same hash, but if you're trying to uniquely identify a file from potentially everything in existence it becomes a bit of a problem. Especially when your child downloads XXX material as opposed to the latest Disney cartoon....

It's not an issue for GUIDs so much because (a) the time factor makes it unlikely you'd regenerate the same one, (b) even if you did generate the same one again, it's unlikely you'd also be using it for the same purpose.

littleguru
<3 Seattle

I still have a problem with identification. What if two files have the same hash and a bad guy wanted that to happen? You would probably need to have both files up. Now if you validate, you don't know which file it is.

The other thing is if you have two hashes for the same file. In that case a virus/malware could be identified as being something else because the calculated hash matches with one of these two hashes...

If I download a file from microsoft.com/download/ I'm sure that this one comes from Microsoft. There is little reason not to believe that. If I validate a file with your service I'm never sure, because it is the users who generate the data. There's a lot of noise going on... I'm not sure if something like that qualifies as a robust service!

exoteric
I : Next<I>

True. That would completely defeat any purpose.

exoteric
I : Next<I>

Just get the hash and find statements about the hash. If you don't find any conflicting statements on it or any negatives, then you may have some trust in it. If you do, then distrust it. No need to rely on one service's claims about a particular hash. Also, if you trust the provider, you could also have several users supply file metadata. I believe Bitzi has a tool that hashes and supplies metadata to the service. If so, then multiple reruns would provide statistical basis for trust. So you bind metadata to the context of a user. If a user is malicious, then all metadata from that user can be wiped out or hidden.
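A rough sketch of that rule, with statements bound to the user who made them. Everything here (the tuple layout, the `good`/`bad` labels, the `verdict` function) is hypothetical and only illustrates the idea; it is not how Bitzi or any other service actually works.

```python
# Each statement is (user, file_hash, label); labels are "good" or "bad".
statements = [
    ("alice", "h1", "good"),
    ("bob", "h1", "good"),
    ("mallory", "h1", "bad"),
]

def verdict(file_hash, statements, banned_users=frozenset()):
    """Trust only if there are statements and none of them is negative."""
    labels = [
        label
        for user, h, label in statements
        if h == file_hash and user not in banned_users  # wipe banned users' claims
    ]
    if not labels:
        return "unknown"
    if "bad" in labels:
        return "distrust"
    return "trust"

print(verdict("h1", statements))
print(verdict("h1", statements, banned_users={"mallory"}))
print(verdict("h2", statements))
```

Because every claim carries its user, hiding a malicious user's metadata is just a filter at query time rather than a cleanup job.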

Sven Groot
My name has 9 letters. Coincidence? I think not...

Using a combination of hash and file size would also decrease the odds of a collision, without needing extra hashes. If there is a collision there is no other option but to show all possible matches and let the user decide.
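Something like the following is what I have in mind, as a sketch only. The in-memory `index`, the choice of SHA-1, and the function names are assumptions for the example.

```python
import hashlib
from collections import defaultdict

# (hex digest, size in bytes) -> names of every file registered under that key
index = defaultdict(list)

def file_key(content: bytes):
    """Composite key: the hash plus the file size."""
    return hashlib.sha1(content).hexdigest(), len(content)

def register(name: str, content: bytes) -> None:
    index[file_key(content)].append(name)

def lookup(content: bytes):
    """Return all candidates; more than one means the user has to decide."""
    return list(index.get(file_key(content), []))

register("a.txt", b"hello")
register("copy-of-a.txt", b"hello")
register("b.txt", b"world")

print(lookup(b"hello"))
print(lookup(b"never uploaded"))
```

Two files only share a key if both the digest and the length match, and when that happens the lookup simply returns every match instead of guessing.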

littleguru
<3 Seattle

How hard will it be to compromise hashes, or file invalid hashes, or assign hashes to files they don't belong to?

blowdart
Peek-a-boo

Ah no, let's be strict. It's not impossible, it's just very, very improbable. There is a difference, and all hash algorithms will collide at some point; it's in their nature.
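To put a number on "very, very improbable", here is a small sketch using the standard birthday-bound approximation. It assumes the hash behaves like a uniformly random function, which says nothing about deliberate attacks; the function name is made up for the example.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_files * (num_files - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

# A billion files under a 160-bit hash (the size of SHA-1's output).
print(collision_probability(10**9, 160))
# A hundred thousand files under a 32-bit checksum, for contrast.
print(collision_probability(100_000, 32))
```

The result is never exactly zero for two or more files, which is the point: accidental collisions are possible in principle, just absurdly unlikely at 160 bits.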

Shareaza tried the same integration between http/torrent/emule/gnutella and other p2p networks: the program itself is amazing and has one of the best GUIs ever seen, however this sort of integration failed miserably and never took off.

I think it's already a lost battle. If you want to make a useful project, please make an adblocker for IE that doesn't suck.

TommyCarlier
I want my scalps!

Yesterday I watched a video of a presentation by Scott Hanselman and Phil Haack and Scott said something that is very appropriate here: “A system is very secure until it is not.”

You cannot tell that a system is 100% secure. It's secure until someone breaks it.

aL_
Rx ftw

isn't that how overnet/edonkey/emule works? there you have a hash representing the file and a bunch of chunks also with their own hashes, then you ask around (your peers/a central server) who has the hashes you want
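A toy version of the file-hash-plus-chunk-hashes scheme, just to show the shape of it. The 4-byte chunk size and SHA-256 are assumptions chosen to keep the example readable; they are not what the real networks use.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, illustrative value so the example is easy to follow

def chunk_hashes(content: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Hash every fixed-size chunk of the file separately."""
    return [
        hashlib.sha256(content[i:i + chunk_size]).hexdigest()
        for i in range(0, len(content), chunk_size)
    ]

def file_hash(hashes: list) -> str:
    """Identify the whole file by hashing the concatenated chunk hashes."""
    return hashlib.sha256("".join(hashes).encode("ascii")).hexdigest()

def bad_chunks(received: bytes, expected: list, chunk_size: int = CHUNK_SIZE) -> list:
    """Indices of chunks whose hash does not match what was advertised."""
    actual = chunk_hashes(received, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

original = b"abcdefghij"
expected = chunk_hashes(original)
corrupted = b"abcdXfghij"

print(file_hash(expected))              # the identifier you would ask peers for
print(bad_chunks(corrupted, expected))  # only these chunks need re-downloading
```

The per-chunk hashes are what let a client verify and re-request individual pieces from different peers instead of throwing away the whole download.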

exoteric
I : Next<I>

Yes but with these odds, who really cares. I like Sven's idea, I've had the same thought myself but not sure it's worth the extra bits.

blowdart
Peek-a-boo

True, but saying there's no possibility of collisions is 100% wrong.

blowdart
Peek-a-boo

SHA1 collisions are possible in 2^52 iterations right now. It's not easy, but it's doable.
