yammdb - just another .mmdb


Comments

  • Neoon OG
    edited June 2023

    PSA: I forgot to add the new locations before the run started; I will do a test run later to compensate for that.

  • Is it possible to provide a .mmdb file in a format like this: http://download.db-ip.com/free/dbip-city-lite-2023-06.mmdb.gz

  • @image_host said:
    Is it possible to provide a .mmdb file in a format like this: http://download.db-ip.com/free/dbip-city-lite-2023-06.mmdb.gz

    By that, I guess you mean other fields that we don't have filled in the .mmdb.
    The answer to this would be no, since I don't have data for it.

  • Neoon OG
    edited June 2023

    @someTom said:

    @Neoon said: I am too lazy and I don't wanna risk accuracy because I fucked up the mapping or there was a routing issue.

    ah, i see. so no triangulation but rather finding the closest datacenter.

    i imagine the latency of the ~three closest probes could be used to put the ip somewhere between them. but the calculation could of course be messed up by "non-straightforward" routing.

    I will include the geo coords from the closest 3, maybe 4.
    Just waiting for the current build to finish, so I can rebuild it with the coords.
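
    For anyone who wants to experiment with the idea: a rough sketch of latency-weighted placement between the closest probes (this is not the actual yammdb code; the probe coordinates and RTTs below are made-up example values).

    ```python
    # Illustrative sketch: estimate a position from the 3 closest probes by
    # weighting their coordinates with the inverse RTT, so closer probes pull
    # the estimate harder. All values below are made up.

    def weighted_position(probes):
        """probes: list of (lat, lon, rtt_ms) tuples; returns (lat, lon)."""
        weights = [1.0 / max(rtt, 0.1) for _, _, rtt in probes]
        total = sum(weights)
        lat = sum(p[0] * w for p, w in zip(probes, weights)) / total
        lon = sum(p[1] * w for p, w in zip(probes, weights)) / total
        return lat, lon

    # Hypothetical probes in Frankfurt, Paris and Amsterdam with example RTTs.
    print(weighted_position([(50.11, 8.68, 12.0), (48.85, 2.35, 9.0), (52.37, 4.90, 15.0)]))
    ```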

    Thanked by (2)FrankZ Not_Oles
  • @Neoon said: I will include the geo coords from the closest 3, maybe 4.

    cool, i will definitely try whether triangulation leads to anything useful.

  • @someTom said:

    @Neoon said: I will include the geo coords from the closest 3, maybe 4.

    cool, i will definitely try whether triangulation leads to anything useful.

    I had to rewrite some of my code and how I write the .mmdb to make it work.
    Does it work? Yes, however I already see potential issues.

    Will play around with the build for a bit.

  • The IP I queried is actually in France.
    However, with the 3 closest coords, no luck.

    Thanked by (1)tuc
  • The best and easiest way is probably just taking the latency and drawing a circle, instead of using triangulation.
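
    As a ballpark for the circle approach: light in fiber travels at roughly two-thirds of c, about 200 km per millisecond, so half the RTT bounds the distance. A tiny sketch (the constant and helper are illustrative, not project code):

    ```python
    # Illustrative only: turn a round-trip time into a maximum-distance radius.
    # Light in fiber covers roughly 200 km per millisecond (about 2/3 c), so the
    # one-way distance is bounded by (rtt_ms / 2) * 200 km.

    def max_distance_km(rtt_ms, km_per_ms=200.0):
        return (rtt_ms / 2.0) * km_per_ms

    # Example: a 14 ms RTT puts the target within roughly 1400 km of the probe.
    print(max_distance_km(14))  # 1400.0
    ```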

  • did you release the additional geo coords + latencies somewhere? because in the files from june 20 i can only see 1 pair of coordinates per range.

  • @someTom said:
    did you release the additional geo coords + latencies somewhere? because in the files from june 20 i can only see 1 pair of coordinates per range.

    I was only testing it locally, hence it was not published yet.
    Triggered a manual rebuild with the code changes, should be there now.

    However, I will probably remove it again at some point, since it does not seem to work that well.
    I put the coords under response.city.geoname_id
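
    If you want to inspect that field yourself, something like this should work with the maxminddb Python package (the file name and IP below are placeholders):

    ```python
    import maxminddb

    # Look up an address in the test build and print whatever was stored under
    # city.geoname_id (file name and IP address are placeholder examples).
    with maxminddb.open_database("yammdb.mmdb") as reader:
        record = reader.get("203.0.113.7")
        if record:
            print(record.get("city", {}).get("geoname_id"))
    ```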

  • i see you added 3 coordinates per range, but for triangulation the latencies of these locations would also be necessary.

  • @someTom said:
    i see you added 3 coordinates per range, but for triangulation the latencies of these locations would also be necessary.

    I will add the latencies on the next build.
    One of the probes went down; as soon as it is restored, the build shall begin.

  • The VPS which ran the probe was supposed to be restored on Friday, however it wasn't restored until yesterday.
    So I won't do an additional build, just the regular Friday build.

    Thanked by (1)FrankZ
  • Recently I added a preflight check, to find issues before the build begins.
    Dallas had issues, so the build was aborted.

    Restarted the build, the preflight now went through fine, and Zurich and Paris were added too.
    It's running now, a bit delayed today however.
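
    Conceptually the preflight just checks that every probe answers before any measurements start; a trivial sketch of that idea (the probe names, URLs and health endpoint are hypothetical, not the actual setup):

    ```python
    import urllib.request

    # Hypothetical preflight: ask every probe for a health response before the
    # build starts and abort the run if any of them fails to answer in time.
    PROBES = {
        "dallas": "http://probe-dallas.example/health",
        "zurich": "http://probe-zurich.example/health",
        "paris": "http://probe-paris.example/health",
    }

    def preflight(probes, timeout=5):
        failed = []
        for name, url in probes.items():
            try:
                urllib.request.urlopen(url, timeout=timeout)
            except OSError:
                failed.append(name)
        return failed

    broken = preflight(PROBES)
    if broken:
        raise SystemExit(f"Preflight failed, aborting build: {broken}")
    ```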

    Thanked by (1)FrankZ
  • The build process crashed, it ran out of memory:
    159360 Killed

    I had already optimized some parts of the code for lower memory usage, however not this part.
    It's rewritten now, which reduced the memory usage quite a lot.

    The data is fine, just the final .mmdb build process crashed.
    I ran a rebuild, so the build is now available.

    Regarding the latency data for the coordinates, I need to make more changes in the code for that.
    I did not have the time yet, will do though.

    Thanked by (2)FrankZ someTom
  • I removed Stockholm last week, due to instability.
    However, I recently added London and Mumbai.

    More will follow.

    Thanked by (1)MallocVoidstar
  • I also replaced the probe in Moscow with another one, since it seems that some ranges are not reachable from it.
    @someTom I will upload the raw latency data from each probe, on every build.

    I am going to focus on keeping the .mmdb as small and compact as it used to be.
    The next builds won't include the additional geo coordinates for possible triangulation anymore.

    If someone wants to build a modified .mmdb using the raw data from the probes, you can do so.
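
    As a starting point, something along these lines should work with the third-party mmdb_writer and netaddr packages (the CSV layout and field names here are assumptions for illustration; check the raw files for the real columns):

    ```python
    import csv

    from mmdb_writer import MMDBWriter  # third-party package
    from netaddr import IPSet

    # Assumed sketch: build a minimal .mmdb from per-prefix results. The CSV
    # layout (prefix,lat,lon) and the field names are invented for illustration.
    writer = MMDBWriter()
    with open("latency_results.csv") as fh:
        for prefix, lat, lon in csv.reader(fh):
            writer.insert_network(
                IPSet([prefix]),
                {"location": {"latitude": float(lat), "longitude": float(lon)}},
            )
    writer.to_db_file("custom.mmdb")
    ```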

  • Neoon OG
    edited October 2023

    Since I found differences yesterday, especially with Google, the build is done with around 2x more tests.
    Instead of 2.3 million tests we do 4.5 million; the .mmdb file is already compressed, however the .csv files will nearly double in size.

    The build should be done by Sunday; the recent Friday build got stuck due to a bunch of network issues.

    Thanked by (1)FrankZ
  • There is probably more delay on this.
    I adjusted the speed of the build, which apparently is hitting some limits.

    That caused 2 virtual servers to be "suspended"; I can only guess why, since I have no info yet.
    There is active monitoring on each of the probes, so I can ensure that CPU, memory and I/O stay within reasonable limits.

    Usually we never hit more than 30% CPU usage.
    Despite that, they got suspended, at about 15% CPU usage.

    My best guess would be the network: I increased the number of pings in a batch, which hits some kind of limit that is not mentioned anywhere, shrugs.

    I updated the software to remove probes if they have been unreachable for several minutes.
    This should prevent any further delays.

    This build won't include Warsaw, and only partially Singapore, depending on whether it gets suspended/stopped again.
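
    For context on the batching, a simplified sketch of what sending pings in fixed-size batches through fping could look like (the batch size and the parsing are illustrative, not the project's actual settings):

    ```python
    import subprocess

    # Illustration: ping targets in fixed-size batches via fping so a probe never
    # has too many ICMP requests in flight at once. fping -c1 -q prints one
    # summary line per host on stderr; batch size here is an arbitrary example.
    def ping_batches(targets, batch_size=100):
        results = {}
        for i in range(0, len(targets), batch_size):
            batch = targets[i:i + batch_size]
            proc = subprocess.run(["fping", "-c1", "-q", *batch],
                                  capture_output=True, text=True)
            for line in proc.stderr.splitlines():
                if " : " in line:
                    host, stats = line.split(" : ", 1)
                    results[host.strip()] = stats.strip()
        return results
    ```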

    Thanked by (1)FrankZ
  • Some info on the build delays.
    Recently I changed the software to do at least one measurement per /24: if the prefix is bigger than a /24, e.g. a /20, we slice it into /24s and try to carry out at least one measurement per sliced /24.

    This should increase accuracy and fixes some of the issues where a subnet was internally routed to different geographic locations.

    However, Google is Google, they like to slice specific ranges into /26s.
    A good example of this is the IP range used by the Google resolvers.

    Which, if you use the data right now, points you to America, but the range is also used in Europe.
    So I had to patch the software to allow individual slicing for specific ranges.

    The build is currently still running, it just started a few hours ago, and should be done by Sunday.
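
    The slicing itself is straightforward with Python's ipaddress module; a sketch of the per-range override idea (the override table is just an example, not the project's configuration):

    ```python
    import ipaddress

    # Slice a prefix into measurement targets: /24 by default, with an optional
    # finer per-range override (e.g. /26 for ranges Google splits up further).
    # The override table below is only an example.
    OVERRIDES = {"8.8.8.0/24": 26}

    def slice_prefix(prefix, default_prefixlen=24):
        net = ipaddress.ip_network(prefix)
        target = OVERRIDES.get(prefix, default_prefixlen)
        if net.prefixlen >= target:
            return [net]  # already small enough, measure it as-is
        return list(net.subnets(new_prefix=target))

    print(slice_prefix("203.0.112.0/20"))  # sixteen /24s
    print(slice_prefix("8.8.8.0/24"))      # four /26s via the override
    ```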

    Thanked by (1)FrankZ
  • Looks like I have no choice other than to also do IPv6,
    since I have data on every single IPv4 prefix that has at least one pingable IP.

    I am going to use that data to crosscheck whether a specific ASN has more than one geographic location.
    If it has, it gets ignored for now; the same goes for IPv6-only networks.

    For every other network, where all the weekly measured data points to a single geographic location, I assume the IPv6 prefixes originate from that same location.

    No idea how reliable this is, but for the beginning it should hopefully be good enough.
    If anyone got ideas on this, please lemme know.

    Scanning would be next on the list.
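
    A sketch of the crosscheck idea described above: group the IPv4 measurements by ASN and only carry a location over to that ASN's IPv6 prefixes if every IPv4 measurement agrees (the data layout and values are hypothetical):

    ```python
    from collections import defaultdict

    # Hypothetical illustration: measurements is a list of (asn, location) pairs
    # derived from the weekly IPv4 data. An ASN only gets a location for its
    # IPv6 prefixes if all of its IPv4 measurements point at the same place.
    def single_location_asns(measurements):
        by_asn = defaultdict(set)
        for asn, location in measurements:
            by_asn[asn].add(location)
        return {asn: locs.pop() for asn, locs in by_asn.items() if len(locs) == 1}

    # ASNs seen in several locations (and IPv6-only ASNs, which have no IPv4
    # entries at all) simply stay unmapped for now.
    example = [(64500, "Paris"), (64500, "Paris"), (64501, "Dallas"), (64501, "Tokyo")]
    print(single_location_asns(example))  # {64500: 'Paris'}
    ```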

    Thanked by (2)FrankZ epsilun
  • I decided to discontinue this project. Mainly, I was just curious and wanted to learn how to build my own full-fledged .mmdb based on latency data; however, I don't see myself running this in the long run.

    The raw masscan data will still be available weekly here: https://raw.serv.app
    However, I won't build any .mmdbs anymore.
    If you wanna build your own .mmdb, the code is available at https://github.com/Ne00n/latency-geolocator-4550

    Thanks to all the sponsors.

    /thread

    Thanked by (3)Brueggus sh97 Decicus